Re: [PATCH] Fix PR77450

2016-09-06 Thread Yvan Roux
Hi Richard,

On 6 September 2016 at 14:41, Richard Biener  wrote:
>
> The following fixes PR77450.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2016-09-06  Richard Biener  
>
> PR c/77450
> c-family/
> * c-common.c (c_common_mark_addressable_vec): Handle
> COMPOUND_LITERAL_EXPR.
>
> * c-c++-common/vector-subscript-7.c: Adjust.
> * c-c++-common/vector-subscript-8.c: New testcase.

This new testcase fails in our validation (ARM and x86 targets):

gcc/testsuite/c-c++-common/vector-subscript-8.c:8:17: error: lvalue
required as left operand of assignment
compiler exited with status 1
FAIL: c-c++-common/vector-subscript-8.c  -std=c++98 (test for excess errors)


> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c (revision 240004)
> +++ gcc/c-family/c-common.c (working copy)
> @@ -10918,7 +10918,9 @@ c_common_mark_addressable_vec (tree t)
>  {
>while (handled_component_p (t))
>  t = TREE_OPERAND (t, 0);
> -  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
> +  if (!VAR_P (t)
> +  && TREE_CODE (t) != PARM_DECL
> +  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
>  return;
>TREE_ADDRESSABLE (t) = 1;
>  }
> Index: gcc/testsuite/c-c++-common/vector-subscript-7.c
> ===
> --- gcc/testsuite/c-c++-common/vector-subscript-7.c (revision 240004)
> +++ gcc/testsuite/c-c++-common/vector-subscript-7.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-ccp1" } */
> +/* { dg-options "-O -fdump-tree-fre1" } */
>
>  typedef int v4si __attribute__ ((vector_size (16)));
>
> @@ -11,4 +11,4 @@ main (int argc, char** argv)
>return ((v4si){1, 2, 42, 0})[j];
>  }
>
> -/* { dg-final { scan-tree-dump "return 42;" "ccp1" } } */
> +/* { dg-final { scan-tree-dump "return 42;" "fre1" } } */
> Index: gcc/testsuite/c-c++-common/vector-subscript-8.c
> ===
> --- gcc/testsuite/c-c++-common/vector-subscript-8.c (revision 0)
> +++ gcc/testsuite/c-c++-common/vector-subscript-8.c (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +typedef int V __attribute__((vector_size(4)));
> +
> +void
> +foo(void)
> +{
> +  (V){ 0 }[0] = 0;
> +}


[patch committed SH] Fix build failure

2016-09-06 Thread Kaz Kojima
I've committed the attached patch to fix a build failure on SH
when compiling tree-profile.c:

../trunk/gcc/tree-profile.c: In function 'unsigned int tree_profiling()':
../trunk/gcc/tree-profile.c:560:12: error: 'TARGET_ATOMIC_ANY' was not declared 
in this scope

The target sets HAVE_atomic_compare_and_swapsi to TARGET_ATOMIC_ANY,
which is defined in sh-protos.h and thus can't be seen from tree-profile.c.
The patch simply moves the atomic model definitions from sh-protos.h
to sh.h and guards them with __cplusplus.

Regards,
kaz
--
2016-09-07  Kaz Kojima  

* config/sh/sh-protos.h (struct sh_atomic_model,
selected_atomic_model, TARGET_ATOMIC_ANY, TARGET_ATOMIC_STRICT,
TARGET_ATOMIC_SOFT_GUSA, TARGET_ATOMIC_HARD_LLCS,
TARGET_ATOMIC_SOFT_TCB, TARGET_ATOMIC_SOFT_IMASK): Move to...
* config/sh/sh.h (struct sh_atomic_model,
selected_atomic_model, TARGET_ATOMIC_ANY, TARGET_ATOMIC_STRICT,
TARGET_ATOMIC_SOFT_GUSA, TARGET_ATOMIC_HARD_LLCS,
TARGET_ATOMIC_SOFT_TCB, TARGET_ATOMIC_SOFT_IMASK): ...here.
Guard with __cplusplus.

diff --git a/config/sh/sh-protos.h b/config/sh/sh-protos.h
index fecbb88..f611dab 100644
--- a/config/sh/sh-protos.h
+++ b/config/sh/sh-protos.h
@@ -35,59 +35,6 @@ enum sh_function_kind {
   SFUNC_STATIC
 };
 
-/* Atomic model.  */
-struct sh_atomic_model
-{
-  enum enum_type
-  {
-none = 0,
-soft_gusa,
-hard_llcs,
-soft_tcb,
-soft_imask,
-
-num_models
-  };
-
-  /*  If strict is set, disallow mixing of different models, as it would
-  happen on SH4A.  */
-  bool strict;
-  enum_type type;
-
-  /* Name string as it was specified on the command line.  */
-  const char* name;
-
-  /* Name string as it is used in C/C++ defines.  */
-  const char* cdef_name;
-
-  /* GBR offset variable for TCB model.  */
-  int tcb_gbr_offset;
-};
-
-extern const sh_atomic_model& selected_atomic_model (void);
-
-/* Shortcuts to check the currently selected atomic model.  */
-#define TARGET_ATOMIC_ANY \
-  (selected_atomic_model ().type != sh_atomic_model::none)
-
-#define TARGET_ATOMIC_STRICT \
-  (selected_atomic_model ().strict)
-
-#define TARGET_ATOMIC_SOFT_GUSA \
-  (selected_atomic_model ().type == sh_atomic_model::soft_gusa)
-
-#define TARGET_ATOMIC_HARD_LLCS \
-  (selected_atomic_model ().type == sh_atomic_model::hard_llcs)
-
-#define TARGET_ATOMIC_SOFT_TCB \
-  (selected_atomic_model ().type == sh_atomic_model::soft_tcb)
-
-#define TARGET_ATOMIC_SOFT_TCB_GBR_OFFSET_RTX \
-  GEN_INT (selected_atomic_model ().tcb_gbr_offset)
-
-#define TARGET_ATOMIC_SOFT_IMASK \
-  (selected_atomic_model ().type == sh_atomic_model::soft_imask)
-
 #ifdef RTX_CODE
 extern rtx sh_fsca_sf2int (void);
 extern rtx sh_fsca_int2sf (void);
diff --git a/config/sh/sh.h b/config/sh/sh.h
index 0403616..25b6cee 100644
--- a/config/sh/sh.h
+++ b/config/sh/sh.h
@@ -333,6 +333,63 @@ extern enum sh_divide_strategy_e sh_div_strategy;
 #define SH_DIV_STRATEGY_DEFAULT SH_DIV_CALL_DIV1
 #endif
 
+#ifdef __cplusplus
+
+/* Atomic model.  */
+struct sh_atomic_model
+{
+  enum enum_type
+  {
+none = 0,
+soft_gusa,
+hard_llcs,
+soft_tcb,
+soft_imask,
+
+num_models
+  };
+
+  /*  If strict is set, disallow mixing of different models, as it would
+  happen on SH4A.  */
+  bool strict;
+  enum_type type;
+
+  /* Name string as it was specified on the command line.  */
+  const char* name;
+
+  /* Name string as it is used in C/C++ defines.  */
+  const char* cdef_name;
+
+  /* GBR offset variable for TCB model.  */
+  int tcb_gbr_offset;
+};
+
+extern const sh_atomic_model& selected_atomic_model (void);
+
+/* Shortcuts to check the currently selected atomic model.  */
+#define TARGET_ATOMIC_ANY \
+  (selected_atomic_model ().type != sh_atomic_model::none)
+
+#define TARGET_ATOMIC_STRICT \
+  (selected_atomic_model ().strict)
+
+#define TARGET_ATOMIC_SOFT_GUSA \
+  (selected_atomic_model ().type == sh_atomic_model::soft_gusa)
+
+#define TARGET_ATOMIC_HARD_LLCS \
+  (selected_atomic_model ().type == sh_atomic_model::hard_llcs)
+
+#define TARGET_ATOMIC_SOFT_TCB \
+  (selected_atomic_model ().type == sh_atomic_model::soft_tcb)
+
+#define TARGET_ATOMIC_SOFT_TCB_GBR_OFFSET_RTX \
+  GEN_INT (selected_atomic_model ().tcb_gbr_offset)
+
+#define TARGET_ATOMIC_SOFT_IMASK \
+  (selected_atomic_model ().type == sh_atomic_model::soft_imask)
+
+#endif // __cplusplus
+
 #define SUBTARGET_OVERRIDE_OPTIONS (void) 0
 
 


Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Kugan Vivekanandarajah
Hi Richard,

On 6 September 2016 at 19:08, Richard Biener  wrote:
> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> On 5 September 2016 at 17:57, Richard Biener  
>> wrote:
>>> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>>>  wrote:
 Hi All,

 While looking at the gcc source, I noticed that we are iterating over SSA
 variables from 0 to num_ssa_names in some cases and from 1 to num_ssa_names
 in others. It seems that variable 0 is always NULL_TREE.
>>>
>>> Yeah, that's artificial because we don't assign SSA name version 0 (for some
>>> unknown reason).
>>>
 But it can
 confuse people who are looking at it for the first time. Therefore it might
 be good to follow some consistent usage here.

 It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
 other cases. Here is an attempt to do this based on what is done in other
 places. Bootstrapped and regression tested on x86_64-linux-gnu with no
 new regressions. Is this OK?
>>>
>>> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
>>> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>>>
>>> Then, if you add an iterator why leave the name == NULL handling to 
>>> consumers?
>>> That looks odd.
>>>
>>> Then, SSA names are just in a vector thus why not re-use the vector 
>>> iterator?
>>>
>>> That is,
>>>
>>> #define FOR_EACH_SSA_NAME (name) \
>>>   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
>>>
>>> would be equivalent to your patch?
>>>
>>> Please also don't add new iterators that implicitely use 'cfun' but always 
>>> use
>>> one that passes it as explicit arg.
>>
>> I think defining FOR_EACH_SSA_NAME with the vector iterator is better. But
>> we will not be able to skip NULL ssa_names with that.
>
> Why?  Can't you simply do
>
>   #define FOR_EACH_SSA_NAME (name) \
> for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
>if (name)
>
> ?

Indeed.  I missed the if at the end :(.  Here is an updated patch.
Bootstrapped and regression tested on x86_64-linux-gnu with no new
regressions.
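
For illustration, a sketch of how the macro and a typical caller could end up
looking, assuming the explicit function argument Richard asked for (the exact
names and the finally committed form may differ):

/* Sketch only -- tree-ssanames.h-style macro taking the function
   explicitly; it skips the unused version 0 and released (NULL) names.  */
#define FOR_EACH_SSA_NAME(I, VAR, FN) \
  for (I = 1; SSANAMES (FN)->iterate (I, &VAR); ++I) \
    if (VAR)

/* Typical consumer, replacing an open-coded 0..num_ssa_names loop.  */
unsigned i;
tree name;
FOR_EACH_SSA_NAME (i, name, cfun)
  {
    if (virtual_operand_p (name))
      continue;
    /* ... process NAME ...  */
  }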

Thanks,
Kugan
>
>> I also added an
>> index variable to the macro so that there won't be any conflicts if the
>> index variable "i" (or whatever) is also defined in the loop.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>> regressions. Is this OK for trunk?
>>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2016-09-06  Kugan Vivekanandarajah  
>>
>> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
>> * cfgexpand.c (update_alias_info_with_stack_vars): Use
>> FOR_EACH_SSA_NAME to iterate over SSA variables.
>> (pass_expand::execute): Likewise.
>> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>> * tree-cfg.c (dump_function_to_file): Likewise.
>> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>> (update_ssa): Likewise.
>> * tree-ssa-alias.c (dump_alias_info): Likewise.
>> * tree-ssa-ccp.c (ccp_finalize): Likewise.
>> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>> (create_outofssa_var_map): Likewise.
>> (coalesce_ssa_name): Likewise.
>> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
>> * tree-ssa-pre.c (compute_avail): Likewise.
>> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
>> (scc_vn_restore_ssa_info): Likewise.
>> (free_scc_vn): Likewise.
>> (run_scc_vn): Likewise.
>> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>> * tree-ssa-ter.c (new_temp_expr_table): Likewise.
>> * tree-ssa-copy.c (fini_copy_prop): Likewise.
>> * tree-ssa.c (verify_ssa): Likewise.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
 Thanks,
 Kugan


 gcc/ChangeLog:

 2016-09-05  Kugan Vivekanandarajah  

 * tree-ssanames.h (ssa_iterator::ssa_iterator): New.
 (ssa_iterator::get): Likewise.
 (ssa_iterator::next): Likewise.
 (FOR_EACH_SSAVAR): Likewise.
 * cfgexpand.c (update_alias_info_with_stack_vars): Use
 FOR_EACH_SSAVAR to iterate over SSA variables.
 (pass_expand::execute): Likewise.
 * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
 * tree-cfg.c (dump_function_to_file): Likewise.
 * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
 (update_ssa): Likewise.
 * tree-ssa-alias.c (dump_alias_info): Likewise.
 * tree-ssa-ccp.c (ccp_finalize): Likewise.
 * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
 (create_outofssa_var_map): Likewise.
 (coalesce_ssa_name): Likewise.
 * tree-ssa-operands.c (dump_immediate_uses): Likewise.
 * tree-ssa-pre.c (compute_avail): Likewise.
 * tree-ssa-sccvn.c (init_scc_vn): Likewise.
 (scc_vn_restore_ssa_info): Likewise.
 (free_scc_vn): Likewise.
 (run_scc_vn): Likewise.
 * tree-ssa-structalias.c (compute_points_to_sets): Likewise.

Re: [PATCH GCC 9/9]Prove no-overflow in computation of LOOP_VINFO_NITERS and improve code generation

2016-09-06 Thread kugan

Hi Bin,

On 07/09/16 04:54, Bin Cheng wrote:

Hi,
LOOP_VINFO_NITERS is computed as LOOP_VINFO_NITERSM1 + 1, which could overflow 
in the loop niters' type.  The vectorizer needs to generate more code to compute 
the vectorized niters if overflow does happen.  However, for common loops there 
is no overflow in practice, so this patch tries to prove the no-overflow 
property and uses it to improve code generation.  At the moment, the no-overflow 
information comes either from loop niter analysis, or from the fact that we know 
the loop is peeled for a non-zero number of iterations in prologue peeling.  In 
the latter case, it doesn't matter whether the original LOOP_VINFO_NITERS 
overflows or not, because the computation LOOP_VINFO_NITERS - 
LOOP_VINFO_PEELING_FOR_ALIGNMENT cancels the overflow by underflow.
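
To make the overflow case concrete, a small sketch (mine, not from the patch)
of a loop where NITERSM1 fits the niters type but NITERSM1 + 1 wraps:

/* Sketch only: the IV type is 32-bit unsigned and the loop runs 2^32
   times, so nitersm1 == 0xffffffff fits, but nitersm1 + 1 wraps to 0.  */
void
touch_all (unsigned char *p)
{
  unsigned int i = 0;
  do
    p[i] ^= 1;
  while (++i != 0);
}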

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (loop_niters_no_overflow): New func.
(vect_transform_loop): Call loop_niters_no_overflow.  Pass the
no-overflow information to vect_do_peeling_for_loop_bound and
vect_gen_vector_loop_niters.


009-prove-no_overflow-for-vect-niters-20160902.txt


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0d37f55..2ef1f9b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6610,6 +6610,38 @@ vect_loop_kill_debug_uses (struct loop *loop, gimple 
*stmt)
 }
 }

+/* Given loop represented by LOOP_VINFO, return true if computation of
+   LOOP_VINFO_NITERS (= LOOP_VINFO_NITERSM1 + 1) doesn't overflow, false
+   otherwise.  */
+
+static bool
+loop_niters_no_overflow (loop_vec_info loop_vinfo)
+{
+  /* Constant case.  */
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+{
+  int cst_niters = LOOP_VINFO_INT_NITERS (loop_vinfo);


Wouldn't it truncate by assigning this to int?
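
For what it's worth, a standalone sketch of the truncation concern
(LOOP_VINFO_INT_NITERS yields a HOST_WIDE_INT-sized value in my reading; the
exact type is an assumption here):

/* Sketch only: assigning a wider iteration count to int silently
   truncates (on typical targets) before any later comparison.  */
#include <stdio.h>

int
main (void)
{
  long long niters_from_tree = 0x100000001LL; /* larger than INT_MAX */
  int cst_niters = niters_from_tree;          /* truncates to 1 */
  printf ("%d\n", cst_niters);
  return 0;
}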



+  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
+
+  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
+  if (wi::to_widest (cst_nitersm1) < cst_niters)


Shouldn't you do the addition and comparison in the type of the 
loop index instead of widest_int to see if that overflows?


Thanks,
Kugan


+   return true;
+}
+
+  widest_int max;
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  /* Check the upper bound of loop niters.  */
+  if (get_max_loop_iterations (loop, &max))
+{
+  tree type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  signop sgn = TYPE_SIGN (type);
+  widest_int type_max = widest_int::from (wi::max_value (type), sgn);
+  if (max < type_max)
+   return true;
+}
+  return false;
+}
+
 /* Function vect_transform_loop.

The analysis phase has determined that the loop is vectorizable.
@@ -6697,8 +6729,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   tree niters = vect_build_loop_niters (loop_vinfo);
   LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
   tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
+  bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
   vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
-  check_profitability, false);
+  check_profitability, niters_no_overflow);
   if (niters_vector == NULL_TREE)
 {
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
@@ -6707,7 +6740,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
   else
vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
-false);
+niters_no_overflow);
 }

   /* 1) Make sure the loop header has exactly two entries



Re: [PATCH, rs6000] Fix PR72827 (ada bootstrap failure)

2016-09-06 Thread Jeff Law

On 09/06/2016 04:14 PM, Segher Boessenkool wrote:

On Tue, Sep 06, 2016 at 03:24:24PM -0600, Jeff Law wrote:

On 08/31/2016 01:08 AM, Eric Botcazou wrote:

DSE should really detect this is happening and not do the wrong thing.
Maybe add an assert somewhere?  Much easier to debug, that way.


That sounds fragile, functions are allowed to fiddle with the frame
pointer in
the prologue or epilogue (but of course not in the body).  I think that
DSE is
not the only RTL pass which makes this assumption of invariant frame
pointer
in the body, it seems rather fundamental in the RTL middle-end.

That's my recollection as well -- I recall many patches flying by
through the years that assumed the frame pointer was invariant -- but
they were mostly (all?) in things that ran before we add the
prologue/epilogue to the INSN chain.


We could simply check if the frame pointer (or stack pointer) is changed
while RTX_FRAME_RELATED_P is not set?  Can that ever happen on "proper"
code?
Worth a try, though I'm not convinced we're setting RTX_FRAME_RELATED_P 
correctly, particularly for the legacy targets.

jeff


Re: [PATCH, rs6000] Fix PR72827 (ada bootstrap failure)

2016-09-06 Thread Segher Boessenkool
On Tue, Sep 06, 2016 at 03:24:24PM -0600, Jeff Law wrote:
> On 08/31/2016 01:08 AM, Eric Botcazou wrote:
> >>DSE should really detect this is happening and not do the wrong thing.
> >>Maybe add an assert somewhere?  Much easier to debug, that way.
> >
> >That sounds fragile, functions are allowed to fiddle with the frame 
> >pointer in
> >the prologue or epilogue (but of course not in the body).  I think that 
> >DSE is
> >not the only RTL pass which makes this assumption of invariant frame 
> >pointer
> >in the body, it seems rather fundamental in the RTL middle-end.
> That's my recollection as well -- I recall many patches flying by 
> through the years that assumed the frame pointer was invariant -- but 
> they were mostly (all?) in things that ran before we add the 
> prologue/epilogue to the INSN chain.

We could simply check if the frame pointer (or stack pointer) is changed
while RTX_FRAME_RELATED_P is not set?  Can that ever happen on "proper"
code?


Segher


[PATCHv2] Fix a few simple cases where -Wparentheses does not warn for omitted middle value

2016-09-06 Thread Bernd Edlinger
On 09/05/16 23:50, Bernd Edlinger wrote:
> Hi,
>
> I've noticed that there is already a -Wparentheses warning for code like
>
>   int y = x == 2 ?: 1
>
> => warning: the omitted middle operand in ?: will always be 'true',
> suggest explicit middle operand [-Wparentheses]
>
> But it is not emitted for code that uses bool, like:
>
> void foo(bool x)
> {
>int y = x ?: 1;
>
> and under C it is not emitted for compound expressions
> that end with a comparison, like:
>
>int y = (i++,i==1) ?: 1;
>
> C++ is OK, but does only miss to warn on the bool data type.
>
> The attached patch should fix these warnings.
>

Well, reg-testing showed that a few test cases were broken, which made me
aware of an issue with templates when the LHS of ?: is dependent.

In that case the type is not available at the template declaration,
and the warning cannot be generated at the declaration but only when
the template is instantiated.  The new patch fixes this, and also a
pre-existing issue, entered as PR 77496, when the type cannot be
implicitly converted to boolean.
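
For reference, a compile-only sketch (my own, not the added testcases) of the
three situations discussed; with -Wparentheses the last two should now warn as
well:

/* Sketch only; compile with -Wparentheses.  */
#include <stdbool.h>

int
f (bool x, int i)
{
  int a = (i == 2) ?: 1;       /* comparison LHS: warned before the patch */
  int b = x ?: 1;              /* bool LHS: newly warned */
  int c = (i++, i == 1) ?: 1;  /* compound expr ending in a comparison:
                                  newly warned in C */
  return a + b + c;
}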


Boot-strapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
gcc/c-family:
2016-09-06  Bernd Edlinger  

	PR c++/77496
	* c-common.c (warn_for_omitted_condop): Also warn for boolean data.

gcc/c:
2016-09-06  Bernd Edlinger  

	PR c++/77496
	* c-parser.c (c_parser_conditional_expression): Pass the rightmost
	COMPOUND_EXPR to warn_for_omitted_condop.

gcc/cp:
2016-09-06  Bernd Edlinger  

	PR c++/77496
	* call.c (build_conditional_expr_1): Call warn_for_omitted_condop.
	* class.c (instantiate_type): Look through the SAVE_EXPR.

gcc/testsuite:
2016-09-06  Bernd Edlinger  

	PR c++/77496
	* c-c++-common/warn-ommitted-condop.c: Add more test cases.
	* g++.dg/ext/pr77496.C: New test.
	* g++.dg/warn/pr77496.C: New test.


Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 240001)
+++ gcc/c/c-parser.c	(working copy)
@@ -6423,16 +6423,20 @@ c_parser_conditional_expression (c_parser *parser,
   if (c_parser_next_token_is (parser, CPP_COLON))
 {
   tree eptype = NULL_TREE;
+  tree e;
 
   middle_loc = c_parser_peek_token (parser)->location;
-  pedwarn (middle_loc, OPT_Wpedantic, 
+  pedwarn (middle_loc, OPT_Wpedantic,
 	   "ISO C forbids omitting the middle term of a ?: expression");
-  warn_for_omitted_condop (middle_loc, cond.value);
   if (TREE_CODE (cond.value) == EXCESS_PRECISION_EXPR)
 	{
 	  eptype = TREE_TYPE (cond.value);
 	  cond.value = TREE_OPERAND (cond.value, 0);
 	}
+  e = cond.value;
+  while (TREE_CODE (e) == COMPOUND_EXPR)
+	e = TREE_OPERAND (e, 1);
+  warn_for_omitted_condop (middle_loc, e);
   /* Make sure first operand is calculated only once.  */
   exp1.value = c_save_expr (default_conversion (cond.value));
   if (eptype)
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 240001)
+++ gcc/c-family/c-common.c	(working copy)
@@ -10613,17 +10613,21 @@ fold_offsetof (tree expr)
   return convert (size_type_node, fold_offsetof_1 (expr));
 }
 
-/* Warn for A ?: C expressions (with B omitted) where A is a boolean 
+/* Warn for A ?: C expressions (with B omitted) where A is a boolean
expression, because B will always be true. */
 
 void
-warn_for_omitted_condop (location_t location, tree cond) 
-{ 
-  if (truth_value_p (TREE_CODE (cond))) 
-  warning_at (location, OPT_Wparentheses, 
+warn_for_omitted_condop (location_t location, tree cond)
+{
+  /* In C++ template declarations it can happen that the type is dependent
+ and not yet known, thus TREE_TYPE (cond) == NULL_TREE.  */
+  if (truth_value_p (TREE_CODE (cond))
+  || (TREE_TYPE (cond) != NULL_TREE &&
+	  TREE_CODE (TREE_TYPE (cond)) == BOOLEAN_TYPE))
+  warning_at (location, OPT_Wparentheses,
 		"the omitted middle operand in ?: will always be %<true%>, "
 		"suggest explicit middle operand");
-} 
+}
 
 /* Give an error for storing into ARG, which is 'const'.  USE indicates
how ARG was being used.  */
Index: gcc/cp/call.c
===
--- gcc/cp/call.c	(revision 240001)
+++ gcc/cp/call.c	(working copy)
@@ -4653,9 +4653,12 @@ build_conditional_expr_1 (location_t loc, tree arg
   if (!arg2)
 {
   if (complain & tf_error)
-	pedwarn (loc, OPT_Wpedantic, 
+	pedwarn (loc, OPT_Wpedantic,
 		 "ISO C++ forbids omitting the middle term of a ?: expression");
 
+  if ((complain & tf_warning) && !truth_value_p (TREE_CODE (arg1)))
+	warn_for_omitted_condop (loc, arg1);
+
   /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
   if (lvalue_p (arg1))
 	arg2 = arg1 = cp_stabilize_reference (arg1);
Index: gcc/cp/class.c
===
--- gcc/cp/class.c	(revision 240001)
+++ gcc/cp/class.c	(working copy)
@@ -8262,7 +8262,12 @@ insta

Re: [PATCH] Move class substring_loc from c-family into gcc

2016-09-06 Thread Martin Sebor

On 09/03/2016 03:13 AM, Manuel López-Ibáñez wrote:

On 02/09/16 23:55, Martin Sebor wrote:

diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
index f839c74..bb0de4f 100644
--- a/gcc/substring-locations.h
+++ b/gcc/substring-locations.h
@@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not see
  #ifndef GCC_SUBSTRING_LOCATIONS_H
  #define GCC_SUBSTRING_LOCATIONS_H

+#include <cpplib.h>
+


Is this header file going to be used in the middle-end? If so, then it
is suspicious that it includes cpplib.h. Otherwise, perhaps it should
live in c-family/


I'm not sure if you meant this question for me or for David, but
I believe the main (only?) reason for this patch is to let the
-Wformat-length middle-end pass make use of the same range
location API as c-family/c-format does in the C/C++ front ends.
David, please correct me if I mischaracterized it.

Martin



I'm not complaining about substring-locations.c because libcpp is
already linked with everything else even for other non-C languages, like
Ada, but the above is leaking all cpplib.h into the rest of the
compiler, which defeats the purpose of this in coretypes.h

/* Provide forward struct declaration so that we don't have to include
all of cpplib.h whenever a random prototype includes a pointer.
Note that the cpp_reader and cpp_token typedefs remain part of
cpplib.h.  */

struct cpp_reader;
struct cpp_token;


Cheers,
 Manuel.









Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Jeff Law

On 09/05/2016 01:57 AM, Richard Biener wrote:

On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
 wrote:

Hi All,

While looking at the gcc source, I noticed that we are iterating over SSA
variables from 0 to num_ssa_names in some cases and from 1 to num_ssa_names
in others. It seems that variable 0 is always NULL_TREE.


Yeah, that's artificial because we don't assign SSA name version 0 (for some
unknown reason).
That might have been the global for aliasing purposes eons ago.  Or it 
might have been used for default defs early in the SSA development work. 
 I wouldn't be surprised if it doesn't need to be special anymore.


Jeff


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Jason Merrill wrote:

> The reason I care is that C++17 aligned new (wg21.link/p0035)
> specifies that for types that require more alignment than the usual
> operator new provides, the new-expression instead calls an operator
> new with an explicit alignment parameter.  MALLOC_ABI_ALIGNMENT
> sounded like exactly what I was looking for, but then I noticed that
> 'new long double' was going to the aligned new operator, which breaks
> older code that replaces operator new (without, of course, replacing
> the aligned variant).

I'd say that cxx_fundamental_alignment_p is the right thing to use here, 
but maybe you want an option to override the alignment threshold for 
aligned new (in case someone interposes a malloc that uses lower 
alignment, but still wants to allocate some objects with higher 
alignment).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Jason Merrill
On Tue, Sep 6, 2016 at 5:03 PM, Joseph Myers  wrote:
> On Tue, 6 Sep 2016, Jason Merrill wrote:
>
>> On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers  
>> wrote:
>> > GCC is supposed to support all mallocs that produce results aligned to at
>> > least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
>> > max_align_t).
>>
>> I've just been running into problems with MALLOC_ABI_ALIGNMENT being
>> smaller than max_align_t, which doesn't make sense to me; the C11
>> passage Florian quoted says that malloc needs to support all
>> fundamental alignments, i.e. max_align_t.
>
> The point is that GCC supports being used in nonconforming implementations
> as well as in conforming ones, and in nonconforming contexts people may
> e.g. interpose malloc with a version that yields sufficient alignment for
> common cases but may not use 16-byte alignment, or may not align small
> objects as much as larger ones (whereas ISO C requires all alignments,
> even 1-byte ones, be aligned as much as max_align_t).

MALLOC_ABI_ALIGNMENT is documented as "Alignment, in bits, a C
conformant malloc implementation has to provide," which doesn't sound
like it's intended to allow for non-conforming implementations.  And
doesn't a smaller default lead to lost optimization opportunities in
the typical case?

The reason I care is that C++17 aligned new (wg21.link/p0035)
specifies that for types that require more alignment than the usual
operator new provides, the new-expression instead calls an operator
new with an explicit alignment parameter.  MALLOC_ABI_ALIGNMENT
sounded like exactly what I was looking for, but then I noticed that
'new long double' was going to the aligned new operator, which breaks
older code that replaces operator new (without, of course, replacing
the aligned variant).

Jason


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Paul Eggert

On 09/06/2016 01:40 PM, Joseph Myers wrote:
Sounds like a defect in C11 to me - none of the examples of flexible 
array

members anticipate needing to add to the size to allow for tail padding
with unknown alignment requirements.


Yes, I would prefer calling it a defect, as most code I've seen dealing 
with flexible array members does not align the tail size. However, GCC + 
valgrind does take advantage of this "defect" and I would not be 
surprised if other picky implementations do too.


The C11 standard's examples are weird, in that their flexible members 
are typically arrays of double, where the alignment in practice is 
invariably no less than that of the containing structure so our problem 
does not occur. And for the only example that calls malloc and is 
directly on point (assuming a weird platform where doubles are 
unaligned), the associated commentary says that the flexible array 
member behaves "for most purposes" as if it had the natural size, i.e., 
all bets are off unless you read the entire standard carefully! It is 
almost as if the C11 authors knew about the problem but did not want to 
call the reader's attention to it (I doubt whether that occurred -- it's 
just that it reads that way).



C11's prohibition of using alignof on incomplete types.) This is why

Structures with flexible array members are not incomplete types.

Ah, right you are. Sorry, I confused C11 alignof with Gnulib alignof. On 
pre-C11 platforms, the Gnulib substitute alignof does not work on such 
structures, so code like Gnulib fts.c that is intended to be portable to 
pre-C11 compilers can't use that C11 feature. I have corrected the 
Gnulib manual's documentation of this limitation.




Re: [PATCH, rs6000] Fix PR72827 (ada bootstrap failure)

2016-09-06 Thread Jeff Law

On 08/31/2016 01:08 AM, Eric Botcazou wrote:

DSE should really detect this is happening and not do the wrong thing.
Maybe add an assert somewhere?  Much easier to debug, that way.


That sounds fragile, functions are allowed to fiddle with the frame pointer in
the prologue or epilogue (but of course not in the body).  I think that DSE is
not the only RTL pass which makes this assumption of invariant frame pointer
in the body, it seems rather fundamental in the RTL middle-end.
That's my recollection as well -- I recall many patches flying by 
through the years that assumed the frame pointer was invariant -- but 
they were mostly (all?) in things that ran before we add the 
prologue/epilogue to the INSN chain.


jeff



Re: [PATCH] Delete GCJ

2016-09-06 Thread Jeff Law

On 09/06/2016 03:08 AM, Jakub Jelinek wrote:

On Tue, Sep 06, 2016 at 11:06:36AM +0200, Richard Biener wrote:

On Mon, Sep 5, 2016 at 6:17 PM, Andrew Haley  wrote:

On 05/09/16 17:15, Richard Biener wrote:

On September 5, 2016 5:13:06 PM GMT+02:00, Andrew Haley  wrote:

As discussed.  I think I should ask a Global reviewer to approve this
one.  For obvious reasons I haven't included the diffs to the deleted
gcc/java and libjava directories.  The whole tree, post GCJ-deletion,
is at svn+ssh://gcc.gnu.org/svn/gcc/branches/gcj/gcj-deletion-branch
if anyone would like to try it.


Aren't there also Java-specific C++ frontend parts?


There certainly are, but deleting them without breaking anything else
is going to be rather delicate.  I'm trying to do this one step at a
time, rather cautiously.


Ok, that sounds reasonable.

You have my approval for this first part then.  Please wait until after the
GNU Cauldron to allow other global reviewers to object.


No objection from me.

No objection from me either (I'm guessing that's not a surprise).

jeff


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Jason Merrill wrote:

> On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers  wrote:
> > GCC is supposed to support all mallocs that produce results aligned to at
> > least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
> > max_align_t).
> 
> I've just been running into problems with MALLOC_ABI_ALIGNMENT being
> smaller than max_align_t, which doesn't make sense to me; the C11
> passage Florian quoted says that malloc needs to support all
> fundamental alignments, i.e. max_align_t.

The point is that GCC supports being used in nonconforming implementations 
as well as in conforming ones, and in nonconforming contexts people may 
e.g. interpose malloc with a version that yields sufficient alignment for 
common cases but may not use 16-byte alignment, or may not align small 
objects as much as larger ones (whereas ISO C requires all alignments, 
even 1-byte ones, be aligned as much as max_align_t).

Of course if users of such an interposed malloc try to allocate memory for 
_Float128 with it, they can expect that to break.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Jason Merrill
On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers  wrote:
> GCC is supposed to support all mallocs that produce results aligned to at
> least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
> max_align_t).

I've just been running into problems with MALLOC_ABI_ALIGNMENT being
smaller than max_align_t, which doesn't make sense to me; the C11
passage Florian quoted says that malloc needs to support all
fundamental alignments, i.e. max_align_t.

Jason


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Bernd Edlinger wrote:

> I think that for _Float128 to become a fundamental data type,
> it is *not* sufficient that gcc supports it alone, because glibc
> needs to support also all the necessary math library functions.

No, the definition of basic types for a freestanding implementation is 
nothing to do with whether some library facilities are available.  (And, 
in any case, the same issue applies for _Decimal128, for which the library 
facilities have been available in libdfp for many years.)  Nor is there 
any reason for increases in malloc alignment in glibc to depend on other 
changes.

> I have some doubt that it will work to change max_align_t's
> alignment only based on the gcc version; I mean,
> there should be a way to allow max_align_t to stay at
> 8 bytes if the existing glibc will not support 16 bytes.

max_align_t depending on the glibc version would be worse than a simple 
one-off change.

> macro, what the real malloc alignment will be.  And probably
> some builtin-macro, if gcc supports __float128, independently
> of the i386, that could probably also be helpful for writing
> portable applications.

I expect in future portable applications to use _Float128, for which 
FLT128* macros in <float.h> (or the corresponding built-in macros) can be 
used.  That doesn't allow for C++, but hopefully WG21 will decide at some 
point how best to support such types in C++ (maybe closer to how DFP is 
handled for C++).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Paul Eggert wrote:

> One way to correct the code is to increase malloc's argument up to a multiple
> of alignof(max_align_t). (One cannot portably use alignof(struct s) due to

Sounds like a defect in C11 to me - none of the examples of flexible array 
members anticipate needing to add to the size to allow for tail padding 
with unknown alignment requirements.

> C11's prohibition of using alignof on incomplete types.) This is why

Structures with flexible array members are not incomplete types.
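
For concreteness, a sketch (not from the thread) of the over-allocation idiom
Paul describes above, rounding the malloc request up to a multiple of
alignof (max_align_t):

/* Sketch only.  */
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct s { size_t len; char d[]; };  /* flexible array member */

struct s *
make_s (const char *str)
{
  size_t len = strlen (str) + 1;
  size_t sz = offsetof (struct s, d) + len;
  size_t a = alignof (max_align_t);
  struct s *p = malloc ((sz + a - 1) / a * a);  /* padded request */
  if (p)
    {
      p->len = len;
      memcpy (p->d, str, len);
    }
  return p;
}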

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, i386] Spellcheck hints for the i386 backend option handling (PR middle-end/77475, take 2, i386 part)

2016-09-06 Thread Jakub Jelinek
Hi!

On Mon, Sep 05, 2016 at 09:00:30PM +0200, Uros Bizjak wrote:
> OK as far as x86 target is concerned, but please allow a day for David
> to eventually comment on the implementation.

And here is the updated i386 patch, on top of the generic patch I've just
posted.  Compared to the last patch, this one now includes diagnostics,
uses the new opts-common.c helper and also stops using the perhaps fancy,
but absolutely l10n incompatible technique with prefix/suffix/sw variables
in various diagnostics - the words switch and attribute couldn't be
translated and would appear somewhere in a translated sequence.
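
To illustrate the translation problem being removed, a rough sketch (not the
exact i386.c diagnostics) of the before/after shape:

/* Before (sketch): the sentence is glued together from fragments such
   as prefix/suffix/sw at run time, so no complete, translatable string
   ever reaches the translators.  */
error ("bad value (%s) for %sarch=%s %s", arg, prefix, suffix, sw);

/* After (sketch): select one of two complete, translatable strings.  */
error (main_args_p
       ? G_("bad value %qs for %<-march=%> switch")
       : G_("bad value %qs for %<target(\"arch=\")%> attribute"),
       arg);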

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-05  Jakub Jelinek  

PR middle-end/77475
* config/i386/i386.c (ix86_parse_stringop_strategy_string): Simplify,
use %qs instead of %s where desirable, use argument instead of arg in
the diagnostic wording, add list of supported strategies and
spellcheck hint.
(ix86_option_override_internal): Emit target("m...") instead of
option("m...") in the diagnostic.  Use %qs instead of %s in invalid
-march/-mtune option diagnostic.  Add list of supported arches/tunings
and spellcheck hint.  Remove prefix, suffix and sw variables, use
main_args_p ? "..." : "..." in diagnostics to make translation
possible.

* gcc.target/i386/pr65990.c: Adjust expected diagnostics.
* gcc.dg/march-generic.c: Likewise.
* gcc.target/i386/spellcheck-options-1.c: New test.
* gcc.target/i386/spellcheck-options-2.c: New test.
* gcc.target/i386/spellcheck-options-3.c: New test.
* gcc.target/i386/spellcheck-options-4.c: New test.

--- gcc/config/i386/i386.c.jj   2016-09-06 16:55:35.524605779 +0200
+++ gcc/config/i386/i386.c  2016-09-06 19:45:03.126299652 +0200
@@ -4516,6 +4517,7 @@ ix86_parse_stringop_strategy_string (cha
   const struct stringop_algs *default_algs;
   stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
   char *curr_range_str, *next_range_str;
+  const char *opt = is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=";
   int i = 0, n = 0;
 
   if (is_memset)
@@ -4537,15 +4539,13 @@ ix86_parse_stringop_strategy_string (cha
   if (3 != sscanf (curr_range_str, "%20[^:]:%d:%10s",
alg_name, &maxs, align))
 {
-  error ("wrong arg %s to option %s", curr_range_str,
- is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+ error ("wrong argument %qs to option %qs", curr_range_str, opt);
   return;
 }
 
   if (n > 0 && (maxs < (input_ranges[n - 1].max + 1) && maxs != -1))
 {
-  error ("size ranges of option %s should be increasing",
- is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+ error ("size ranges of option %qs should be increasing", opt);
   return;
 }
 
@@ -4555,9 +4555,25 @@ ix86_parse_stringop_strategy_string (cha
 
   if (i == last_alg)
 {
-  error ("wrong stringop strategy name %s specified for option %s",
- alg_name,
- is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+ error ("wrong strategy name %qs specified for option %qs",
+alg_name, opt);
+
+ auto_vec <const char *> candidates;
+ for (i = 0; i < last_alg; i++)
+   if ((stringop_alg) i != rep_prefix_8_byte || TARGET_64BIT)
+ candidates.safe_push (stringop_alg_names[i]);
+
+ char *s;
+ const char *hint
+   = candidates_list_and_hint (alg_name, s, candidates);
+ if (hint)
+   inform (input_location,
+   "valid arguments to %qs are: %s; did you mean %qs?",
+   opt, s, hint);
+ else
+   inform (input_location, "valid arguments to %qs are: %s",
+   opt, s);
+ XDELETEVEC (s);
   return;
 }
 
@@ -4565,10 +4581,8 @@ ix86_parse_stringop_strategy_string (cha
  && !TARGET_64BIT)
{
  /* rep; movq isn't available in 32-bit code.  */
- error ("stringop strategy name %s specified for option %s "
-"not supported for 32-bit code",
- alg_name,
- is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
+ error ("strategy name %qs specified for option %qs "
+"not supported for 32-bit code", alg_name, opt);
  return;
}
 
@@ -4580,8 +4594,7 @@ ix86_parse_stringop_strategy_string (cha
 input_ranges[n].noalign = true;
   else
 {
-  error ("unknown alignment %s specified for option %s",
- align, is_memset ? "-mmemset_strategy=" : 
"-mmemcpy_strategy=");
+ error ("unknown alignment %qs specified for option %qs", align, opt);
   return;
 }
   n++;
@@ -4592,15 +4605,13 @@ ix86_parse_stringop_strategy_str

[PATCH, i386] Spellcheck hints for the i386 backend option handling (PR middle-end/77475, take 2, generic part)

2016-09-06 Thread Jakub Jelinek
Hi!

On Mon, Sep 05, 2016 at 09:59:03PM +0200, Jakub Jelinek wrote:
> Plus obviously the unrecognized_argument_error needs to be declared in some
> header file.
> 
> That said, for x86 -march/-mtune uses this is problematic, as it uses either
> %<-march=%> argument or target("march=") attribute wording there depending
> on whether it is a command line option or target attribute.  Though, it is
> not good this way for translation anyway.  Perhaps use XNEWVEC instead of
> XALLOCAVEC for the all options string, and have the helper function just
> return that + hint, inform by itself and then free the string?

Here is the generic part that does this (and it indeed simplifies quite a
lot the i386.c patch, while keeping the diagnostics in its hands for exact
wording etc.).
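
To make the intended calling convention concrete, a hedged sketch of a caller
(the cmdline_handle_error hunk below and the separate i386 patch do essentially
this; ARG, OPT and LOC stand for the caller's context):

/* Sketch only.  */
auto_vec <const char *> candidates;
candidates.safe_push ("foo");
candidates.safe_push ("bar");

char *s;
const char *hint = candidates_list_and_hint (arg, s, candidates);
if (hint)
  inform (loc, "valid arguments to %qs are: %s; did you mean %qs?",
	  opt, s, hint);
else
  inform (loc, "valid arguments to %qs are: %s", opt, s);
XDELETEVEC (s);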

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-06  Jakub Jelinek  
Manuel Lopez-Ibanez  

PR middle-end/77475
* opts.h (candidates_list_and_hint): Declare.
* opts-common.c (candidates_list_and_hint): New function.
(cmdline_handle_error): Use it.

--- gcc/opts.h.jj   2016-06-30 19:38:45.0 +0200
+++ gcc/opts.h  2016-09-06 18:05:50.836734668 +0200
@@ -419,5 +420,8 @@ extern const struct sanitizer_opts_s
 extern void add_misspelling_candidates (auto_vec *candidates,
const struct cl_option *option,
const char *base_option);
+extern const char *candidates_list_and_hint (const char *arg, char *&str,
+const auto_vec <const char *> &
+candidates);
 
 #endif
--- gcc/opts-common.c.jj2016-06-30 19:38:45.0 +0200
+++ gcc/opts-common.c   2016-09-06 19:36:22.339827679 +0200
@@ -1069,6 +1069,37 @@ generate_option_input_file (const char *
   decoded->errors = 0;
 }
 
+/* Helper function for listing valid choices and hint for misspelled
+   value.  CANDIDATES is a vector containing all valid strings,
+   STR is set to a heap allocated string that contains all those
+   strings concatenated, separated by spaces, and the return value
+   is the closest string from those to ARG, or NULL if nothing is
+   close enough.  */
+
+const char *
+candidates_list_and_hint (const char *arg, char *&str,
+ const auto_vec <const char *> &candidates)
+{
+  size_t len = 0;
+  int i;
+  const char *candidate;
+  char *p;
+
+  FOR_EACH_VEC_ELT (candidates, i, candidate)
+len += strlen (candidate) + 1;
+
+  str = p = XNEWVEC (char, len);
+  FOR_EACH_VEC_ELT (candidates, i, candidate)
+{
+  len = strlen (candidate);
+  memcpy (p, candidate, len);
+  p[len] = ' ';
+  p += len + 1;
+}
+  p[-1] = '\0';
+  return find_closest_string (arg, &candidates);
+}
+
 /* Perform diagnostics for read_cmdline_option and control_warning_option
functions.  Returns true if an error has been diagnosed.
LOC and LANG_MASK arguments like in read_cmdline_option.
@@ -1108,38 +1139,27 @@ cmdline_handle_error (location_t loc, co
 {
   const struct cl_enum *e = &cl_enums[option->var_enum];
   unsigned int i;
-  size_t len;
-  char *s, *p;
+  char *s;
 
   if (e->unknown_error)
error_at (loc, e->unknown_error, arg);
   else
error_at (loc, "unrecognized argument in option %qs", opt);
 
-  len = 0;
-  for (i = 0; e->values[i].arg != NULL; i++)
-   len += strlen (e->values[i].arg) + 1;
-
   auto_vec <const char *> candidates;
-  s = XALLOCAVEC (char, len);
-  p = s;
   for (i = 0; e->values[i].arg != NULL; i++)
{
  if (!enum_arg_ok_for_language (&e->values[i], lang_mask))
continue;
- size_t arglen = strlen (e->values[i].arg);
- memcpy (p, e->values[i].arg, arglen);
- p[arglen] = ' ';
- p += arglen + 1;
  candidates.safe_push (e->values[i].arg);
}
-  p[-1] = 0;
-  const char *hint = find_closest_string (arg, &candidates);
+  const char *hint = candidates_list_and_hint (arg, s, candidates);
   if (hint)
inform (loc, "valid arguments to %qs are: %s; did you mean %qs?",
option->opt_text, s, hint);
   else
inform (loc, "valid arguments to %qs are: %s", option->opt_text, s);
+  XDELETEVEC (s);
 
   return true;
 }


Jakub


[PATCH] -fsanitize=thread fixes (PR sanitizer/68260)

2016-09-06 Thread Jakub Jelinek
Hi!

As the PRs show, -fsanitize=thread hasn't been instrumenting __atomic_clear
and __atomic_test_and_set builtins (i.e. those that operate on bool always),
so we reported a data race on the testcase even when there wasn't one.
And, as the other testcase shows, there were various ICEs with
-fnon-call-exceptions -fsanitize=thread.
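
For context, a minimal sketch (mine, not the new testcases) of the kind of flag
usage that was previously reported as a race because these two builtins were
not instrumented:

/* Sketch only; compile with -fsanitize=thread.  */
#include <stdbool.h>

static bool flag;

void
lock (void)
{
  while (__atomic_test_and_set (&flag, __ATOMIC_ACQUIRE))
    ; /* spin */
}

void
unlock (void)
{
  __atomic_clear (&flag, __ATOMIC_RELEASE);
}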

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-09-06  Jakub Jelinek  

PR sanitizer/68260
* tsan.c: Include target.h and tree-eh.h.
(enum tsan_atomic_action): Add bool_clear and bool_test_and_set.
(BOOL_CLEAR, BOOL_TEST_AND_SET): Define.
(tsan_atomic_table): Add BUILT_IN_ATOMIC_CLEAR and
BUILT_IN_ATOMIC_TEST_AND_SET entries.
(instrument_builtin_call): Remove EH edges of replaced call stmts.
Handle bool_clear and bool_test_and_set.
(instrument_memory_accesses): Add RET argument, purge dead eh edges
at the end of bbs and set *RET in that case to TODO_cleanup_cfg.
(tsan_pass): Adjust instrument_memory_accesses caller, return ret.

* c-c++-common/tsan/pr68260.c: New test.
* g++.dg/tsan/pr68260.C: New test.

--- gcc/tsan.c.jj   2016-07-11 22:18:08.0 +0200
+++ gcc/tsan.c  2016-09-06 14:06:21.193642590 +0200
@@ -40,6 +40,8 @@ along with GCC; see the file COPYING3.
 #include "tsan.h"
 #include "asan.h"
 #include "builtins.h"
+#include "target.h"
+#include "tree-eh.h"
 
 /* Number of instrumented memory accesses in the current function.  */
 
@@ -240,7 +242,8 @@ instrument_expr (gimple_stmt_iterator gs
 enum tsan_atomic_action
 {
   check_last, add_seq_cst, add_acquire, weak_cas, strong_cas,
-  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst
+  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst,
+  bool_clear, bool_test_and_set
 };
 
 /* Table how to map sync/atomic builtins to their corresponding
@@ -274,6 +277,10 @@ static const struct tsan_map_atomic
   TRANSFORM (fcode, tsan_fcode, fetch_op, code)
 #define FETCH_OPS(fcode, tsan_fcode, code) \
   TRANSFORM (fcode, tsan_fcode, fetch_op_seq_cst, code)
+#define BOOL_CLEAR(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, bool_clear, ERROR_MARK)
+#define BOOL_TEST_AND_SET(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, bool_test_and_set, ERROR_MARK)
 
   CHECK_LAST (ATOMIC_LOAD_1, TSAN_ATOMIC8_LOAD),
   CHECK_LAST (ATOMIC_LOAD_2, TSAN_ATOMIC16_LOAD),
@@ -463,7 +470,11 @@ static const struct tsan_map_atomic
   LOCK_RELEASE (SYNC_LOCK_RELEASE_2, TSAN_ATOMIC16_STORE),
   LOCK_RELEASE (SYNC_LOCK_RELEASE_4, TSAN_ATOMIC32_STORE),
   LOCK_RELEASE (SYNC_LOCK_RELEASE_8, TSAN_ATOMIC64_STORE),
-  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE)
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE),
+
+  BOOL_CLEAR (ATOMIC_CLEAR, TSAN_ATOMIC8_STORE),
+
+  BOOL_TEST_AND_SET (ATOMIC_TEST_AND_SET, TSAN_ATOMIC8_EXCHANGE)
 };
 
 /* Instrument an atomic builtin.  */
@@ -493,6 +504,8 @@ instrument_builtin_call (gimple_stmt_ite
if (!tree_fits_uhwi_p (last_arg)
|| memmodel_base (tree_to_uhwi (last_arg)) >= MEMMODEL_LAST)
  return;
+   if (lookup_stmt_eh_lp (stmt))
+ remove_stmt_from_eh_lp (stmt);
gimple_call_set_fndecl (stmt, decl);
update_stmt (stmt);
if (tsan_atomic_table[i].action == fetch_op)
@@ -514,6 +527,8 @@ instrument_builtin_call (gimple_stmt_ite
   != add_acquire
   ? MEMMODEL_SEQ_CST
   : MEMMODEL_ACQUIRE);
+   if (lookup_stmt_eh_lp (stmt))
+ remove_stmt_from_eh_lp (stmt);
update_gimple_call (gsi, decl, num + 1, args[0], args[1], args[2]);
stmt = gsi_stmt (*gsi);
if (tsan_atomic_table[i].action == fetch_op_seq_cst)
@@ -561,6 +576,8 @@ instrument_builtin_call (gimple_stmt_ite
if (!tree_fits_uhwi_p (args[5])
|| memmodel_base (tree_to_uhwi (args[5])) >= MEMMODEL_LAST)
  return;
+   if (lookup_stmt_eh_lp (stmt))
+ remove_stmt_from_eh_lp (stmt);
update_gimple_call (gsi, decl, 5, args[0], args[1], args[2],
args[4], args[5]);
return;
@@ -584,6 +601,8 @@ instrument_builtin_call (gimple_stmt_ite
g = gimple_build_assign (t, args[1]);
gsi_insert_before (gsi, g, GSI_SAME_STMT);
lhs = gimple_call_lhs (stmt);
+   if (lookup_stmt_eh_lp (stmt))
+ remove_stmt_from_eh_lp (stmt);
update_gimple_call (gsi, decl, 5, args[0],
build_fold_addr_expr (t), args[2],
build_int_cst (NULL_TREE,
@@ -610,11 +629,66 @@ instrument_builtin_call (gimple_stmt_ite
gcc_assert (num == 1);
t = TYPE_ARG_TYPES (TREE_TYPE (decl));
t = TREE_VALUE (TREE_CHAIN 

Re: Debug algorithms

2016-09-06 Thread François Dumont

Hi

Any final decision regarding this patch?

Note that the __std_a namespace is optional; I can remove this change 
from the patch if you want.


François


On 18/07/2016 21:25, François Dumont wrote:

On 13/07/2016 19:45, Jonathan Wakely wrote:

On 22/06/16 22:05 +0200, François Dumont wrote:

Hi

   Here is, at last, the long-promised patch to introduce Debug 
algos similarly to Debug containers.


I'm trying to decide how much benefit this really gives us, and
whether the obfuscation to the code (even more namespaces involved,
and having to use __std_a:: instead of std::) is worth it. It also
means more code to maintain of course, with extra overloads.


   Why such an evolution:
- More flexibility, debug algos can be used explicitly without 
activating Debug mode.


Although nice in theory, I doubt this will get much usage in practice.

- Performance: Debug algos can get rid of the Debug layer on top of 
container iterators and invoke the normal algos. Operations on normal 
iterators are faster, and we also benefit from the algo 
specializations that sometimes exist for some container iterators 
(like std::deque's). Also, normal algos now use other normal 
algos, so Debug checks won't be done several times.
- It will be easier to implement new Debug checks without the 
limitation of doing so through some Debug macro


   To do so I introduced a new namespace __cxx1998_a used for the normal 
algos when Debug mode is active. I couldn't reuse __cxx1998 because, 
with the current implementation of Debug containers, __cxx1998 is exposed 
and because of ADL we could then have ambiguity between the Debug and 
normal versions of the same algos. I also introduced a __std_a 
namespace which controls the kind of algos used within the library, 
mostly for the containers' implementation details.
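
To make the namespace layering concrete, a rough sketch (mine, not the actual
libstdc++ declarations) of the structure described above:

// Sketch only: debug algos check, then forward to normal algos living
// in a separate namespace, so ADL cannot pick the wrong overload.
namespace __cxx1998_a          // normal algos when Debug mode is active
{
  template<typename _It, typename _Tp>
    _It
    find (_It __first, _It __last, const _Tp& __val)
    {
      for (; __first != __last; ++__first)
	if (*__first == __val)
	  break;
      return __first;
    }
}

namespace __gnu_debug          // debug algos
{
  template<typename _It, typename _Tp>
    _It
    find (_It __first, _It __last, const _Tp& __val)
    {
      // Real check would go here, e.g. __glibcxx_check_valid_range;
      // the real patch would also unwrap safe iterators before forwarding.
      return __cxx1998_a::find (__first, __last, __val);
    }
}

namespace __std_a = __gnu_debug;  // in Debug mode; otherwise it would
                                  // alias the normal-algo namespace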



I think I need to apply the patch locally and spend some time looking
at the new structure, to see what ends up calling what. I'm finding it
difficult to follow that just from reading the patch.


This is definitely more code to maintain, but I hope this code won't 
require much maintenance as it only hosts the Debug logic and not the 
algo logic itself. It relies on the normal algos for the algo logic.


You can see this patch as a way to clean up the normal mode too!

If the __cxx1998 namespace were perfectly encapsulated we could avoid 
__cxx1998_a, but for the moment the boundary between normal and debug 
mode is too tight.


François





Re: [PATCH 3/4][PR 71931] Fix libitm tests

2016-09-06 Thread Mike Stump
On Sep 6, 2016, at 11:13 AM, Szabolcs Nagy  wrote:
> 
> On 06/09/16 18:34, Mike Stump wrote:
>> On Sep 6, 2016, at 2:11 AM, Torvald Riegel  wrote:
>>> 
>>> On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
 Pass build time CC make var down to dejagnu so the sysroot
 is set correctly when gcc is built with --with-build-sysroot.
 
 libitm/
 2016-08-24  Szabolcs Nagy  
 
PR testsuite/71931
* configure.ac: Add AC_CONFIG_FILES.
* configure: Regenerated.
* testuite/Makefile.am: Add rule for libitm-test-support.exp.
* testuite/Makefile.in: Regenerated.
* testuite/libitm-test-support.exp.in: New.
* testuite/lib/libitm.exp (libitm_init): Use BUILD_CC.
 
>>> 
>>> I don't know enough about the build system to really review this.  If a
>>> similar patch has been ACKed and applied for libatomic (71931 states
>>> that both are affected), then I guess this is OK?
>> 
>> I was hoping that someone else might review it, but i can see why no one 
>> else can or wants to.
>> 
>> Sorry for the feet dragging, Ok.  If libatomic has the same problem, and a 
>> similar solutions works there, Ok to fix it as well.
>> 
> 
> the libatomic changes were not accepted:
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01771.html
> 
> i'll try a bit different approach
> (using automake's EXTRA_DEJAGNU_SITE_CONFIG so the
> CC setting is only done in the in-tree site.exp)

Oh, yeah.  Sorry, I did see that but didn't mean to ignore the concern raised.  
Please do work out install v non-install testing before it goes in.



[PATCH GCC 7/9]Skip loops iterating only 1 time in predictive commoning

2016-09-06 Thread Bin Cheng
Hi,
For loops which are bound to iterate only 1 time (thus the loop's latch doesn't 
roll), there is nothing for predictive commoning to do; this patch detects and 
skips these cases.  A test is also added in 
gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f for this.
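
A hedged example (mine, not the mgrid test) of the shape being skipped: the
bound guarantees the latch never rolls, so there is no cross-iteration reuse
for predictive commoning to exploit:

/* Sketch only: the body runs at most once, so the latch never rolls
   and get_max_loop_iterations_int () is expected to be 0.  */
void
f (double *a, const double *b, int n)
{
  for (int i = 0; i < n && i < 1; i++)
    a[i] = b[i] + b[i + 1];
}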

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-predcom.c (tree_predictive_commoning_loop): Skip loop that only
iterates 1 time.

gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gfortran.dg/vect/fast-math-mgrid-resid.f: New test string.

diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index fcf475d..88238f9 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -42,3 +42,4 @@ C
 ! { dg-final { scan-tree-dump-times "Executing predictive commoning without 
unrolling" 1 "pcom" { target lp64 } } }
 ! { dg-final { scan-tree-dump-times "Executing predictive commoning without 
unrolling" 2 "pcom" { target ia32 } } }
 ! { dg-final { scan-tree-dump-times "Predictive commoning failed: no suitable 
chains" 0 "pcom" } }
+! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, nothing to do" 
1 "pcom" } }
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 096365a..14d53c2 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2465,6 +2465,15 @@ tree_predictive_commoning_loop (struct loop *loop)
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Processing loop %d\n",  loop->num);
 
+  /* Nothing for predictive commoning if loop only iterates 1 time.  */
+  if (get_max_loop_iterations_int (loop) == 0)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Loop iterates only 1 time, nothing to do.\n");
+
+  return false;
+}
+
   /* Find the data references and split them into components according to their
  dependence relations.  */
   auto_vec loop_nest;


[PATCH GCC 4/9]Check niters for peeling for data access gaps in analyzer

2016-09-06 Thread Bin Cheng
Hi,
This patch checks whether the loop has enough niters for peeling for data access 
gaps in vect_analyze_loop_2; currently this check is done at the 
vect_transform_loop stage.  The problem is that the vectorizer may vectorize 
loops without enough iterations and generate a false guard on the vectorized 
loop.  Though the loop is successfully vectorized, it will never be executed, 
and most likely it will be removed during cfg-cleanup.  Examples can be found in 
the revised tests of this patch.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (vect_analyze_loop_2): Check and skip loop if it
has no enough iterations for LOOP_VINFO_PEELING_FOR_GAPS.

gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gcc.dg/vect/vect-98.c: Refine test case.
* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Ditto.
* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Ditto.
* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Ditto.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-98.c 
b/gcc/testsuite/gcc.dg/vect/vect-98.c
index 99256a7..2055cce 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-98.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-98.c
@@ -3,6 +3,7 @@
 #include 
 #include "tree-vect.h"
 
+#define M 8
 #define N 4
 #define DOT4( a, b )  ( a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3] )
 
@@ -11,15 +12,15 @@ int main1 (int ia[][N])
 {
   int i, j;
   int ib[N] = {0,3,6,9};
-  int ic[N][N];
+  int ic[M][M];
 
-  for (i = 0; i < N; i++)
+  for (i = 0; i < M; i++)
 {
ic[0][i] = DOT4 (ia[i], ib);
 }
 
   /* check results: */  
-  for (i = 0; i < N; i++)
+  for (i = 0; i < M; i++)
 {
if (ic[0][i] != DOT4 (ia[i], ib))
abort();
@@ -30,7 +31,8 @@ int main1 (int ia[][N])
 
 int main (void)
 { 
-  int ia[N][N] = {{1,2,3,4},{2,3,5,7},{2,4,6,8},{22,43,55,77}};
+  int ia[M][N] = {{1,2,3,4},{2,3,5,7},{2,4,6,8},{22,43,55,77},
+ {13,17,19,23},{29,31,37,41},{3,7,2,1},{4,9,8,3}};
 
   check_vect ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
index 42ed2b7..24c7cc3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
@@ -3,7 +3,7 @@
 #include 
 #include "tree-vect.h"
 
-#define N 16 
+#define N 24
 
 typedef struct {
unsigned char a;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
index dddce85..23cea24 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
@@ -3,7 +3,7 @@
 #include 
 #include "tree-vect.h"
 
-#define N 16 
+#define N 24
 
 typedef struct {
unsigned char a;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
index 6face14..1b36df5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
@@ -3,7 +3,7 @@
 #include 
 #include "tree-vect.h"
 
-#define N 16 
+#define N 24
 
 typedef struct {
unsigned char a;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 45e18af..03ece95 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2061,6 +2061,25 @@ start_over:
   return false;
 }
 
+  /* If epilog loop is required because of data accesses with gaps,
+ one additional iteration needs to be peeled.  Check if there is
+ enough iterations for vectorization.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+{
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
+
+  if (wi::to_widest (scalar_niters) < vf)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"loop has no enough iterations to support"
+" peeling for gaps.\n");
+ return false;
+   }
+}
+
   /* Analyze cost.  Decide if worth while to vectorize.  */
   int min_profitable_estimate, min_profitable_iters;
   vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,


[PATCH GCC 5/9]Put copied loop after its preheader and after the original loop's latch in basic block link list

2016-09-06 Thread Bin Cheng
Hi,
This simple patch changes slpeel_tree_duplicate_loop_to_edge_cfg by putting
the copied loop after its new preheader and after the original loop's latch
in the basic-block linked list.  It doesn't change the CFG at all, but makes
the CFG dump a little easier to read.  I assume this is good for basic block
reordering too?

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Put
duplicated loop after its preheader and after the original loop.diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 3a3b0bc..b749afa 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -776,7 +776,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
struct loop *scalar_loop, edge e)
 {
   struct loop *new_loop;
-  basic_block *new_bbs, *bbs;
+  basic_block *new_bbs, *bbs, *pbbs;
   bool at_exit;
   bool was_imm_dom;
   basic_block exit_dest;
@@ -792,12 +792,13 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
 scalar_loop = loop;
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
-  get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
+  pbbs = bbs + 1;
+  get_loop_body_with_size (scalar_loop, pbbs, scalar_loop->num_nodes);
   /* Allow duplication of outer loops.  */
   if (scalar_loop->inner)
 duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
-  if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
+  if (!can_copy_bbs_p (pbbs, scalar_loop->num_nodes))
 {
   free (bbs);
   return NULL;
@@ -817,15 +818,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
  pre-header unconditionally for this.  */
   basic_block preheader = split_edge (loop_preheader_edge (scalar_loop));
   edge entry_e = single_pred_edge (preheader);
-  bbs[scalar_loop->num_nodes] = preheader;
+  bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
   exit = single_exit (scalar_loop);
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
&exit, 1, &new_exit, NULL,
-   e->src, true);
+   at_exit ? loop->latch : e->src, true);
   exit = single_exit (loop);
-  basic_block new_preheader = new_bbs[scalar_loop->num_nodes];
+  basic_block new_preheader = new_bbs[0];
 
   add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL);
 


[PATCH GCC 3/9]Support rewriting non-lcssa phis for vars live outside of vect-loop

2016-09-06 Thread Bin Cheng
Hi,
Current implementation requires that variables live outside of vect-loop 
satisfying LCSSA form, this patch relaxes the restriction.  It keeps the old 
behavior for LCSSA PHI node by replacing use of live var with result of that 
PHI; for other uses of live var, it simply replaces all uses outside loop with 
the newly computed var.
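
For context, below is a trivial example of a value that is live after a
loop; previously vectorizable_live_operation assumed every such use outside
the loop went through a single-argument (LCSSA) PHI, while with this patch
other external uses are simply redirected to the newly computed value:

int
last_element (int *a, int n)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    last = a[i];
  return last;   /* use of the loop-live value outside the loop */
}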

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (vectorizable_live_operation): Support handling
for live variable outside loop but not in lcssa form.diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fa06505..45e18af 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6478,14 +6478,6 @@ vectorizable_live_operation (gimple *stmt,
: gimple_get_lhs (stmt);
   lhs_type = TREE_TYPE (lhs);
 
-  /* Find all uses of STMT outside the loop - there should be at least one.  */
-  auto_vec worklist;
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
-if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
-   && !is_gimple_debug (use_stmt))
-  worklist.safe_push (use_stmt);
-  gcc_assert (worklist.length () >= 1);
-
   bitsize = TYPE_SIZE (TREE_TYPE (vectype));
   vec_bitsize = TYPE_SIZE (vectype);
 
@@ -6536,12 +6528,24 @@ vectorizable_live_operation (gimple *stmt,
   if (stmts)
 gsi_insert_seq_on_edge_immediate (single_exit (loop), stmts);
 
-  /* Replace all uses of the USE_STMT in the worklist with the newly inserted
- statement.  */
-  while (!worklist.is_empty ())
+  /* Replace use of lhs with newly computed result.  If the use stmt is a
+ single arg PHI, just replace all uses of PHI result.  It's necessary
+ because lcssa PHI defining lhs may be before newly inserted stmt.  */
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
+   && !is_gimple_debug (use_stmt))
 {
-  use_stmt = worklist.pop ();
-  replace_uses_by (gimple_phi_result (use_stmt), new_tree);
+  if (gimple_code (use_stmt) == GIMPLE_PHI
+ && gimple_phi_num_args (use_stmt) == 1)
+   {
+ replace_uses_by (gimple_phi_result (use_stmt), new_tree);
+   }
+  else
+   {
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   SET_USE (use_p, new_tree);
+   }
   update_stmt (use_stmt);
 }
 


[PATCH GCC 9/9]Prove no-overflow in computation of LOOP_VINFO_NITERS and improve code generation

2016-09-06 Thread Bin Cheng
Hi,
LOOP_VINFO_NITERS is computed as LOOP_VINFO_NITERSM1 + 1, which could
overflow in the loop niters' type.  The vectorizer needs to generate extra
code computing the vectorized niters if overflow does happen.  However, for
common loops there is actually no overflow; this patch tries to prove the
no-overflow property and uses it to improve code generation.  At the moment,
the no-overflow information comes either from loop niter analysis, or from
the fact that we know the loop is peeled for a non-zero number of iterations
in prologue peeling.  In the latter case it doesn't matter whether the
original LOOP_VINFO_NITERS overflows or not, because the computation
LOOP_VINFO_NITERS - LOOP_VINFO_PEELING_FOR_ALIGNMENT cancels the overflow by
underflow.
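
A contrived example (purely illustrative, not from the patch) of the
overflow case: in the loop below the body executes exactly 2^32 times, so
nitersm1 is 0xffffffff and nitersm1 + 1 wraps to 0 in the 32-bit niters
type.  Whenever niter analysis can prove nitersm1 is smaller than the type's
maximum, no extra overflow handling code is needed:

void
fill (unsigned char *p)
{
  unsigned int i = 0;
  do
    p[i] = 0;
  while (i++ != ~0U);   /* 2^32 iterations: nitersm1 == 0xffffffff */
}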

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (loop_niters_no_overflow): New func.
(vect_transform_loop): Call loop_niters_no_overflow.  Pass the
no-overflow information to vect_do_peeling_for_loop_bound and
vect_gen_vector_loop_niters.diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0d37f55..2ef1f9b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6610,6 +6610,38 @@ vect_loop_kill_debug_uses (struct loop *loop, gimple 
*stmt)
 }
 }
 
+/* Given loop represented by LOOP_VINFO, return true if computation of
+   LOOP_VINFO_NITERS (= LOOP_VINFO_NITERSM1 + 1) doesn't overflow, false
+   otherwise.  */
+
+static bool
+loop_niters_no_overflow (loop_vec_info loop_vinfo)
+{
+  /* Constant case.  */
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+{
+  int cst_niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
+  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
+
+  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
+  if (wi::to_widest (cst_nitersm1) < cst_niters)
+   return true;
+}
+
+  widest_int max;
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  /* Check the upper bound of loop niters.  */
+  if (get_max_loop_iterations (loop, &max))
+{
+  tree type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  signop sgn = TYPE_SIGN (type);
+  widest_int type_max = widest_int::from (wi::max_value (type), sgn);
+  if (max < type_max)
+   return true;
+}
+  return false;
+}
+
 /* Function vect_transform_loop.
 
The analysis phase has determined that the loop is vectorizable.
@@ -6697,8 +6729,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   tree niters = vect_build_loop_niters (loop_vinfo);
   LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
   tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
+  bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
   vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
-  check_profitability, false);
+  check_profitability, niters_no_overflow);
   if (niters_vector == NULL_TREE)
 {
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
@@ -6707,7 +6740,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
   else
vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
-false);
+niters_no_overflow);
 }
 
   /* 1) Make sure the loop header has exactly two entries


[PATCH GCC 8/9]Adjust test case for CFG changes in vectorized loop

2016-09-06 Thread Bin Cheng
Hi,
After the CFG changes in the vectorizer, the epilog loop can now be
completely peeled, which changes the number of instructions that these tests
check.  This patch adjusts the related scan strings.

Thanks,
bin


gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gcc.target/i386/l_fma_float_1.c: Revise test.
* gcc.target/i386/l_fma_float_2.c: Ditto.
* gcc.target/i386/l_fma_float_3.c: Ditto.
* gcc.target/i386/l_fma_float_4.c: Ditto.
* gcc.target/i386/l_fma_float_5.c: Ditto.
* gcc.target/i386/l_fma_float_6.c: Ditto.
* gcc.target/i386/l_fma_double_1.c: Ditto.
* gcc.target/i386/l_fma_double_2.c: Ditto.
* gcc.target/i386/l_fma_double_3.c: Ditto.
* gcc.target/i386/l_fma_double_4.c: Ditto.
* gcc.target/i386/l_fma_double_5.c: Ditto.
* gcc.target/i386/l_fma_double_6.c: Ditto.diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c 
b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
index 94e512b..ea90a35 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof 
(double;
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c 
b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
index ffceab4..d604d57 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof 
(double;
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c 
b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
index cdb4d33..d1ac6a5 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof 
(double;
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c 
b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
index dda487e..58cd272 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof 
(double;
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times

[PATCH GCC 6/9]Simplify control flow graph for vectorized loop

2016-09-06 Thread Bin Cheng
Hi,
This is the main patch improving the control flow graph for the vectorized
loop.  It generally rewrites the loop peeling code in the vectorizer.  As
described in the patch, for a typical loop to be vectorized like:

   preheader:
 LOOP:
   header_bb:
 loop_body
 if (exit_loop_cond) goto exit_bb
        else goto header_bb
   exit_bb:

This patch peels prolog and epilog from the loop, adds guards skipping PROLOG 
and EPILOG for various conditions.  As a result, the changed CFG would look 
like:

   guard_bb_1:
 if (prefer_scalar_loop) goto merge_bb_1
        else goto guard_bb_2

   guard_bb_2:
 if (skip_prolog) goto merge_bb_2
 else goto prolog_preheader

   prolog_preheader:
 PROLOG:
   prolog_header_bb:
 prolog_body
 if (exit_prolog_cond) goto prolog_exit_bb
 else  goto prolog_header_bb
   prolog_exit_bb:

   merge_bb_2:

   vector_preheader:
 VECTOR LOOP:
   vector_header_bb:
 vector_body
 if (exit_vector_cond) goto vector_exit_bb
 else  goto vector_header_bb
   vector_exit_bb:

   guard_bb_3:
 if (skip_epilog) goto merge_bb_3
 else goto epilog_preheader

   merge_bb_1:

   epilog_preheader:
 EPILOG:
   epilog_header_bb:
 epilog_body
 if (exit_epilog_cond) goto merge_bb_3
 else  goto epilog_header_bb

   merge_bb_3:


Note this patch peels the prolog and epilog only when necessary, and adds
the different guard conditions/branches accordingly.  Also, the first
guard/branch could be further improved by merging it with loop versioning.

Before this patch, up to 4 branch instructions needed to be executed before
the vectorized loop was reached in the worst case; with this patch the
number is reduced to 2.  The patch also does better compile-time analysis to
avoid unnecessary peeling/branching.
From the implementation's point of view, the vectorizer needs to update
induction variables and iteration bounds along with the control flow
changes.  Unfortunately, it also becomes much harder to follow because the
slpeel_* functions update SSA by themselves, rather than using the
update_ssa interface.  This patch tries to factor the SSA/IV/niter-bound
changes out of the CFG changes.  This should make the implementation easier
to read, and I think it may be a step forward toward replacing the slpeel_*
functions with generic GIMPLE loop copy interfaces, as Richard suggested.
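
For readers less familiar with the peeling scheme, here is a compilable
source-level sketch of the control structure the diagram above corresponds
to; the variable names and guard conditions are illustrative only and not
taken from the patch:

void
loop_shape (int niters, int vf, int niters_prolog, int threshold,
            void (*scalar_iter) (int), void (*vector_iter) (int))
{
  int i = 0;
  if (niters >= threshold)                     /* guard_bb_1 */
    {
      if (niters_prolog != 0)                  /* guard_bb_2 */
        for (; i < niters_prolog; i++)         /* PROLOG */
          scalar_iter (i);
      int niters_vector = (niters - i) / vf;
      for (int j = 0; j < niters_vector; j++, i += vf)   /* VECTOR LOOP */
        vector_iter (i);
    }
  if (i < niters)                              /* guard_bb_3 */
    for (; i < niters; i++)                    /* EPILOG */
      scalar_iter (i);
}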

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release
adjust_vec automatically.
(slpeel_add_loop_guard): Remove param cond_expr_stmt_list.  Rename
param exit_bb to guard_to.
(slpeel_checking_verify_cfg_after_peeling):
(set_prologue_iterations):
(create_lcssa_for_virtual_phi): New func which is factored out from
slpeel_tree_peel_loop_to_edge.
(slpeel_tree_peel_loop_to_edge):
(iv_phi_p): New func.
(vect_can_advance_ivs_p): Call iv_phi_p.
(vect_update_ivs_after_vectorizer): Call iv_phi_p.  Directly insert
new gimple stmts in basic block.
(vect_do_peeling_for_loop_bound):
(vect_do_peeling_for_alignment):
(vect_gen_niters_for_prolog_loop): Rename to...
(vect_gen_prolog_loop_niters): ...Rename from.  Change parameters and
adjust implementation.
(vect_update_inits_of_drs): Fix code style issue.  Convert niters to
sizetype if necessary.
(vect_build_loop_niters): Move to here from tree-vect-loop.c.  Change
it to external function.
(vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New.
(vect_gen_vector_loop_niters_mult_vf): New.
(slpeel_update_phi_nodes_for_loops): New.
(slpeel_update_phi_nodes_for_guard1): Reimplement.
(find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement.
(slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New.
* tree-vect-loop.c (vect_build_loop_niters): Move to file
tree-vect-loop-manip.c
(vect_generate_tmps_on_preheader): Delete.
(vect_transform_loop): Rename vectorization_factor to vf.  Call
vect_do_peeling instead of vect_do_peeling-* functions.
* tree-vectorizer.h (vect_do_peeling): New decl.
(vect_build_loop_niters, vect_gen_vector_loop_niters): New decls.
(vect_do_peeling_for_loop_bound): Delete.
(vect_do_peeling_for_alignment): Delete.diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index b749afa..e566c39 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -188,8 +188,6 @@ adjust_vec_debug_stmts (void)
   adjust_debug_stmts_now (&adjust_vec.last ());
   adjust_vec.pop ();
 }
-
-  adjust_vec.release ();
 }
 
 /* Adjust any debug stmts th

[PATCH GCC 2/9]Add interface reseting original copy tables in cfg.c

2016-09-06 Thread Bin Cheng
Hi,
This simple patch adds an interface for resetting the original/copy tables
in cfg.c.  It will be used when rewriting the vect_do_peeling_* functions in
the vectorizer so that we don't need to release/allocate the tables between
prolog and epilog peeling.
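
For illustration, a rough usage sketch of how the new interface is intended
to be used (GCC-internal code, so not standalone; the peeling steps are
placeholders):

  initialize_original_copy_tables ();
  /* ... peel the prolog; the copied blocks are recorded via
     set_bb_original / set_bb_copy ... */
  reset_original_copy_tables ();   /* empty the maps, keep the allocation */
  /* ... peel the epilog, starting from fresh mappings ... */
  free_original_copy_tables ();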

Thanks,
bin

2016-09-01  Bin Cheng  

* cfg.c (reset_original_copy_tables): New func.
* cfg.h (reset_original_copy_tables): New decl.diff --git a/gcc/cfg.c b/gcc/cfg.c
index cab66c6..8e4a044 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -1066,6 +1066,15 @@ initialize_original_copy_tables (void)
   loop_copy = new hash_table (10);
 }
 
+void
+reset_original_copy_tables (void)
+{
+  gcc_assert (original_copy_bb_pool);
+  bb_original->empty ();
+  bb_copy->empty ();
+  loop_copy->empty ();
+}
+
 /* Free the data structures to maintain mapping between blocks and
its copies.  */
 void
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 6c8ba7e..ad935e3 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -108,6 +108,7 @@ extern void scale_bbs_frequencies_int (basic_block *, int, 
int, int);
 extern void scale_bbs_frequencies_gcov_type (basic_block *, int, gcov_type,
 gcov_type);
 extern void initialize_original_copy_tables (void);
+extern void reset_original_copy_tables (void);
 extern void free_original_copy_tables (void);
 extern void set_bb_original (basic_block, basic_block);
 extern basic_block get_bb_original (basic_block);


[PATCH GCC 1/9]Delete useless code in tree-vect-loop-manip.c

2016-09-06 Thread Bin Cheng
Hi,
This is a patch set generating a new control flow graph for the vectorized
loop and its peeling loops.  At the moment, the CFG for a vectorized loop is
complicated and sub-optimal.  The major issues are:
A) For both the prologue and the vectorized loop, it generates a
guard/branch before the loop checking if the following (prologue/vectorized)
loop should be skipped.  It also generates a guard/branch after the loop
checking if the next (vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by the target, it may
generate one additional if-statement (branch) setting the niters for the
prologue loop.
C) In the worst case, up to 4 branch instructions need to be executed before
the vectorized loop is entered.
D) For loops without enough niters, it checks and executes some
(niters_prologue) iterations with the prologue loop; then checks if the
remaining number of iterations (niters - niters_prologue) is enough for
vectorization; if not, it skips the vectorized loop and continues with the
epilogue loop.  This is bad since the vectorized loop won't be executed at
all after all the hassle (see the sketch below).
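
As a concrete (hypothetical) illustration of issue D: for a loop like the
one below, when n turns out to be small at run time, the prologue can
consume most of the iterations and the vectorized loop is then skipped
entirely, after all the setup work has already been done:

void
scale (float *restrict out, const float *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i] * 2.0f;
}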

This patch set improves the situation by merging the different checks, so
that only 2 branch instructions (which could be further reduced in
combination with loop versioning) are executed before the vectorized loop;
it does better compile-time analysis in order to avoid prologue/epilogue
peeling if possible; and it improves code generation in various ways
(overflow handling, generating short live ranges).  In terms of
implementation, it tries to factor the SSA-updating code out of the
CFG-changing code; I think this may help future work replacing slpeel_* with
a generic GIMPLE loop copier.

So far there are 9 patches in the set; patches [1-5] are small prerequisites
for the major change, which is done by patch 6.  Patches [7-9] are small
patches that either address test cases or improve code generation.  A final
bootstrap and test of the patch set is ongoing on x86_64 and AArch64.
Assuming there are no new failures (or that any will be fixed), any comments
on this?

This is the first patch, deleting useless code in tree-vect-loop-manip.c as
well as fixing an obvious code style issue.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 01d6bb1..3a3b0bc 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1003,9 +1003,9 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, 
const_edge e)
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
-  /* All loops have an outer scope; the only case loop->outer is NULL is 
for
- the function itself.  */
-  if (!loop_outer (loop)
+  /* All loops have an outer scope; the only case loop->outer is NULL is for
+ the function itself.  */
+  if (!loop_outer (loop)
   || loop->num_nodes != num_bb
   || !empty_block_p (loop->latch)
   || !single_exit (loop)
@@ -1786,7 +1786,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   struct loop *new_loop;
   edge update_e;
   basic_block preheader;
-  int loop_num;
   int max_iter;
   tree cond_expr = NULL_TREE;
   gimple_seq cond_expr_stmt_list = NULL;
@@ -1797,8 +1796,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
 
   initialize_original_copy_tables ();
 
-  loop_num  = loop->num;
-
   new_loop
 = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
 &ratio_mult_vf_name, ni_name, false,
@@ -1806,7 +1803,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
 cond_expr, cond_expr_stmt_list,
 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
   gcc_assert (new_loop);
-  gcc_assert (loop_num == loop->num);
   slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
 
   /* A guard that controls whether the new_loop is to be executed or skipped
@@ -2053,8 +2049,6 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, 
tree ni_name,
 
   initialize_original_copy_tables ();
 
-  gimple_seq stmts = NULL;
-  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
   niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo,
   ni_name,
   &bound);


Re: [PATCH 3/4][PR 71931] Fix libitm tests

2016-09-06 Thread Szabolcs Nagy
On 06/09/16 18:34, Mike Stump wrote:
> On Sep 6, 2016, at 2:11 AM, Torvald Riegel  wrote:
>>
>> On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
>>> Pass build time CC make var down to dejagnu so the sysroot
>>> is set correctly when gcc is built with --with-build-sysroot.
>>>
>>> libitm/
>>> 2016-08-24  Szabolcs Nagy  
>>>
>>> PR testsuite/71931
>>> * configure.ac: Add AC_CONFIG_FILES.
>>> * configure: Regenerated.
>>> * testuite/Makefile.am: Add rule for libitm-test-support.exp.
>>> * testuite/Makefile.in: Regenerated.
>>> * testuite/libitm-test-support.exp.in: New.
>>> * testuite/lib/libitm.exp (libitm_init): Use BUILD_CC.
>>>
>>
>> I don't know enough about the build system to really review this.  If a
>> similar patch has been ACKed and applied for libatomic (71931 states
>> that both are affected), then I guess this is OK?
> 
> I was hoping that someone else might review it, but i can see why no one else 
> can or wants to.
> 
> Sorry for the feet dragging, Ok.  If libatomic has the same problem, and a 
> similar solutions works there, Ok to fix it as well.
> 

The libatomic changes were not accepted:
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01771.html

I'll try a slightly different approach
(using automake's EXTRA_DEJAGNU_SITE_CONFIG so the
CC setting is only done in the in-tree site.exp).

> If someone wants to do a little work to make fortran libstdc++ and the other 
> two use the same mechanism, that would be nice.
> 

libstdc++ does something more complicated.



Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Bernd Edlinger
Hi,

I'd like to share some of my thoughts about the max_align_t,
if you don't mind.

I think that for _Float128 to become a fundamental data type,
it is *not* sufficient that gcc supports it alone, because glibc
also needs to support all the necessary math library functions.

When that is done, glibc can also increase the malloc alignment.
That will mean that programs built for glibc-2.24 will not
run on old glibc installations.

I think that glibc-2.24 will still have to support older
gcc versions, and should keep the malloc alignment fixed
even if an application is built using an old gcc version,
or am I wrong here (which is quite possible)?

I have some doubt that it will work to change max_align_t's
alignment based only on the gcc version; I mean, there should
be a way to keep max_align_t at 8 bytes if the existing glibc
will not support 16 bytes.

I *think* there should be some kind of handshake for
max_align_t that enables __float128 only when gcc supports it
*and* glibc supports it at the same time.

For example, maybe glibc could provide a header that defines a
macro saying what the real malloc alignment will be.  And
probably a builtin macro saying whether gcc supports
__float128, independently of i386; that could also be helpful
for writing portable applications.
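
A purely illustrative sketch of such a handshake (both macro names here are
hypothetical and exist in neither gcc nor glibc today):

#if defined (__GCC_HAS_FLOAT128__) && defined (__GLIBC_MALLOC_ALIGNMENT__) \
    && __GLIBC_MALLOC_ALIGNMENT__ >= 16
/* Only in this case would max_align_t (and code relying on it) assume
   16-byte alignment from malloc and include __float128 in the union.  */
#endif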



Bernd.


Re: [PING] Re: [PATCH, i386] Fix some warnings/errors that appear when enabling -Wnarrowing when building gcc

2016-09-06 Thread Eric Gallager
On 9/6/16, Uros Bizjak  wrote:
> On Tue, Sep 6, 2016 at 5:33 PM, Eric Gallager  wrote:
>> Ping? CC-ing an i386 maintainer since the patch mostly touches
>> i386-specific files. Also, to clarify, I say "warnings/errors" because
>> they start off as warnings in stage 1 but then become errors in stage
>> 2. Note also that my patch leaves out the part where I modify the
>> configure script to enable -Wnarrowing, because the rest of the code
>> isn't quite ready for that yet.
>
> You are probably referring to [1]? It looks OK, modulo:
>
> +DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~(0U))
>
> where parenthesis are not needed.
>
>
> Please resubmit the patch with a ChangeLog entry, as instructed in [2]
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02129.html
> [2] https://gcc.gnu.org/contribute.html#patches
>
> Uros.
>


Okay, reattached. Here's a ChangeLog entry to put in gcc/ChangeLog:

2016-09-06  Eric Gallager  

* config/i386/i386.c: Add 'U' suffix to constants to avoid
-Wnarrowing.
* config/i386/x86-tune.def: Likewise.
* opts.c: Likewise.


(Please also note that I don't have commit access.)
Thanks,
Eric
 gcc/config/i386/i386.c   | 60 ++--
 gcc/config/i386/x86-tune.def |  6 ++---
 gcc/opts.c   |  4 +--
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4531647..181fc39 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2162,45 +2162,45 @@ const struct processor_costs *ix86_tune_cost = 
&pentium_cost;
 const struct processor_costs *ix86_cost = &pentium_cost;
 
 /* Processor feature/optimization bitmasks.  */
-#define m_386 (1<

Re: [PATCH 3/4][PR 71931] Fix libitm tests

2016-09-06 Thread Mike Stump
On Sep 6, 2016, at 2:11 AM, Torvald Riegel  wrote:
> 
> On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
>> Pass build time CC make var down to dejagnu so the sysroot
>> is set correctly when gcc is built with --with-build-sysroot.
>> 
>> libitm/
>> 2016-08-24  Szabolcs Nagy  
>> 
>>  PR testsuite/71931
>>  * configure.ac: Add AC_CONFIG_FILES.
>>  * configure: Regenerated.
>>  * testuite/Makefile.am: Add rule for libitm-test-support.exp.
>>  * testuite/Makefile.in: Regenerated.
>>  * testuite/libitm-test-support.exp.in: New.
>>  * testuite/lib/libitm.exp (libitm_init): Use BUILD_CC.
>> 
> 
> I don't know enough about the build system to really review this.  If a
> similar patch has been ACKed and applied for libatomic (71931 states
> that both are affected), then I guess this is OK?

I was hoping that someone else might review it, but I can see why no one
else can or wants to.

Sorry for the foot dragging.  OK.  If libatomic has the same problem, and a
similar solution works there, it is OK to fix it as well.

If someone wants to do a little work to make fortran, libstdc++, and the
other two use the same mechanism, that would be nice.

Re: [PING] Re: [PATCH, i386] Fix some warnings/errors that appear when enabling -Wnarrowing when building gcc

2016-09-06 Thread Uros Bizjak
On Tue, Sep 6, 2016 at 5:33 PM, Eric Gallager  wrote:
> Ping? CC-ing an i386 maintainer since the patch mostly touches
> i386-specific files. Also, to clarify, I say "warnings/errors" because
> they start off as warnings in stage 1 but then become errors in stage
> 2. Note also that my patch leaves out the part where I modify the
> configure script to enable -Wnarrowing, because the rest of the code
> isn't quite ready for that yet.

You are probably referring to [1]? It looks OK, modulo:

+DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~(0U))

where parentheses are not needed.


Please resubmit the patch with a ChangeLog entry, as instructed in [2]

[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02129.html
[2] https://gcc.gnu.org/contribute.html#patches

Uros.

> On 8/31/16, Eric Gallager  wrote:
>> In https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01526.html I tried
>> enabling -Wnarrowing when building GCC and produced a log of the
>> resulting (uniq-ed) warnings/errors. The attached patch here fixes
>> some of them by using the 'U' suffix to make certain constants
>> unsigned so they don't become negative when applying the '~' operator
>> to them. After applying, there were still some narrowing issues
>> remaining that would have required modifying gcc/optc-gen.awk to fix
>> properly, but that looked too complicated so I'm avoiding it for now.
>> Still, at least by patching the files I did patch, I allowed bootstrap
>> to continue a little farther...
>>
>> Thanks,
>> Eric Gallager
>>


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-06 Thread Kyrill Tkachov


On 06/09/16 17:21, Jakub Jelinek wrote:

On Tue, Sep 06, 2016 at 04:59:23PM +0100, Kyrill Tkachov wrote:

On 06/09/16 16:32, Jakub Jelinek wrote:

On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:

The v3 of this patch addresses feedback I received on the version posted at [1].
The merged store buffer is now represented as a char array that we splat values 
onto with
native_encode_expr and native_interpret_expr. This allows us to merge anything 
that native_encode_expr
accepts, including floating point values and short vectors. So this version 
extends the functionality
of the previous one in that it handles floating point values as well.

The first phase of the algorithm that detects the contiguous stores is also 
slightly refactored according
to feedback to read more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than word size 
but I got worse results on aarch64.
MOVE_MAX there is 16 (because it has load/store register pair instructions) but 
the 128-bit immediates that we ended
synthesising were too complex. Perhaps the TImode immediate store RTL 
expansions could be improved, but for now
I've left the maximum merge size to be BITS_PER_WORD.

At least from playing with this kind of things in the RTL PR22141 patch,
I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
constants usually isn't a win (not just for speed optimized blocks but also for
-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
32-bit you need movabsq into some temporary register which has like 3 times 
worse
latency than normal store if I remember well, and then store it.

We could restrict the maximum width of the stores generated to 32 bits on 
x86_64.
I think this would need another parameter or target macro for the target to set.
Alternatively, is it a possibility for x86 to be a bit smarter in its DImode 
mov-immediate
expansion? For example break up the 64-bit movabsq immediate into two SImode 
immediates?

If you want a 64-bit store, you'd need to merge the two, and that would be
even more expensive.  It is a matter of say:
movl $0x12345678, (%rsp)
movl $0x09abcdef, 4(%rsp)
vs.
movabsq $0x09abcdef12345678, %rax
movq %rax, (%rsp)
vs.
movl $0x09abcdef, %eax
salq $32, %rax
orq $0x12345678, %rax
movq $rax, (%rsp)
etc.  Guess it needs to be benchmarked on contemporary CPUs, I'm pretty sure
the last sequence is the worst one.


I'd think the backend could expand to the 1st form given
a DImode move of 0x09abcdef12345678 to a memory destination?
Or restrict the maximum merged store size to 32 bits.
Or use rtx costs to iteratively reduce the size of the stores until
we reach the cheap narrower constants. I suspect the problem there would
be to weigh the benefit of using a narrower cheap constant against the
cost of adding more stores. At the GIMPLE level we don't really want to
be calling RTX costs on anything other than constants.




What alias set is used for the accesses if there are different alias sets
involved in between the merged stores?

As per https://gcc.gnu.org/ml/gcc/2016-06/msg00162.html the type used in those 
cases
would be ptr_type_node. See the get_type_for_merged_store function in the patch.

Richi knows this best.  I just wonder if e.g. all the stores go into fields
of the same structure it wouldn't be better if you need to punt use that
structure as the type rather than alias set 0.


I'm aware of that. The patch already has logic to avoid emitting unaligned 
accesses
for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the 
parameter
PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target to
forbid generation of unaligned stores by the pass altogether. Beyond that I'm 
not sure
how to behave more intelligently here. Any ideas?

Dunno, the heuristics was the main problem with my patch.  Generally, I'd
say there is a difference between cold and hot blocks, in cold ones perhaps
unaligned stores are more appropriate (if supported at all and not way too
slow), while in hot ones less desirable.


Could do. I'll look into it, thanks.


And, do you have some SPEC2k and/or SPEC2k6 numbers, for
  e.g. x86_64/i686/arm/aarch64/powerpc64le?

I did some benchmarking on aarch64 in the initial submission at
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
aarch64 showed some improvement and no regressions on x86_64.
I'll be rerunning the numbers on aarch64/x86_64/arm as the patch has expanded
in scope since then (handling more bitfields, floating point constants).
I just wanted to get this version out before the Cauldron for comments.


The RTL PR22141 changes weren't added mainly because it slowed down SPEC2k*
on powerpc.

Unfortunately I don't have access to SPEC on powerpc. Any help with 
testing/benchmarking
there would be very much appreciated.

I don't have SPEC set up on any arch, perhaps you can ask some of the IBM
folks to benchmark

Re: [PATCH][v3] GIMPLE store merging pass

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 04:59:23PM +0100, Kyrill Tkachov wrote:
> On 06/09/16 16:32, Jakub Jelinek wrote:
> >On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:
> >>The v3 of this patch addresses feedback I received on the version posted at 
> >>[1].
> >>The merged store buffer is now represented as a char array that we splat 
> >>values onto with
> >>native_encode_expr and native_interpret_expr. This allows us to merge 
> >>anything that native_encode_expr
> >>accepts, including floating point values and short vectors. So this version 
> >>extends the functionality
> >>of the previous one in that it handles floating point values as well.
> >>
> >>The first phase of the algorithm that detects the contiguous stores is also 
> >>slightly refactored according
> >>to feedback to read more fluently.
> >>
> >>Richi, I experimented with merging up to MOVE_MAX bytes rather than word 
> >>size but I got worse results on aarch64.
> >>MOVE_MAX there is 16 (because it has load/store register pair instructions) 
> >>but the 128-bit immediates that we ended
> >>synthesising were too complex. Perhaps the TImode immediate store RTL 
> >>expansions could be improved, but for now
> >>I've left the maximum merge size to be BITS_PER_WORD.
> >At least from playing with this kind of things in the RTL PR22141 patch,
> >I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
> >constants usually isn't a win (not just for speed optimized blocks but also 
> >for
> >-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
> >32-bit you need movabsq into some temporary register which has like 3 times 
> >worse
> >latency than normal store if I remember well, and then store it.
> 
> We could restrict the maximum width of the stores generated to 32 bits on 
> x86_64.
> I think this would need another parameter or target macro for the target to 
> set.
> Alternatively, is it a possibility for x86 to be a bit smarter in its DImode 
> mov-immediate
> expansion? For example break up the 64-bit movabsq immediate into two SImode 
> immediates?

If you want a 64-bit store, you'd need to merge the two, and that would be
even more expensive.  It is a matter of say:
movl $0x12345678, (%rsp)
movl $0x09abcdef, 4(%rsp)
vs.
movabsq $0x09abcdef12345678, %rax
movq %rax, (%rsp)
vs.
movl $0x09abcdef, %eax
salq $32, %rax
orq $0x12345678, %rax
movq $rax, (%rsp)
etc.  Guess it needs to be benchmarked on contemporary CPUs, I'm pretty sure
the last sequence is the worst one.

> >What alias set is used for the accesses if there are different alias sets
> >involved in between the merged stores?
> 
> As per https://gcc.gnu.org/ml/gcc/2016-06/msg00162.html the type used in 
> those cases
> would be ptr_type_node. See the get_type_for_merged_store function in the 
> patch.

Richi knows this best.  I just wonder if e.g. all the stores go into fields
of the same structure it wouldn't be better if you need to punt use that
structure as the type rather than alias set 0.

> I'm aware of that. The patch already has logic to avoid emitting unaligned 
> accesses
> for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the 
> parameter
> PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target to
> forbid generation of unaligned stores by the pass altogether. Beyond that I'm 
> not sure
> how to behave more intelligently here. Any ideas?

Dunno, the heuristics was the main problem with my patch.  Generally, I'd
say there is a difference between cold and hot blocks, in cold ones perhaps
unaligned stores are more appropriate (if supported at all and not way too
slow), while in hot ones less desirable.
> 
> >And, do you have some SPEC2k and/or SPEC2k6 numbers, for
> >  e.g. x86_64/i686/arm/aarch64/powerpc64le?
> 
> I did some benchmarking on aarch64 in the initial submission at
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
> aarch64 showed some improvement and no regressions on x86_64.
> I'll be rerunning the numbers on aarch64/x86_64/arm as the patch has expanded
> in scope since then (handling more bitfields, floating point constants).
> I just wanted to get this version out before the Cauldron for comments.
> 
> >The RTL PR22141 changes weren't added mainly because it slowed down SPEC2k*
> >on powerpc.
> 
> Unfortunately I don't have access to SPEC on powerpc. Any help with 
> testing/benchmarking
> there would be very much appreciated.

I don't have SPEC set up on any arch, perhaps you can ask some of the IBM
folks to benchmark it for you (that is what I've done back when writing the
patch)?

BTW, do you handle bitfield stores too?  Once you don't support just
constant stores, for bitfields but perhaps also for other adjacent
operations not just copying, but e.g. also bitwise operations (&,|,^,~)
on adjacent memory locations might be useful, or for bitfields even other
operations (e.g. addition if there are enough bits 

Re: [PATCH][v3] GIMPLE store merging pass

2016-09-06 Thread Kyrill Tkachov

Hi Jakub,

On 06/09/16 16:32, Jakub Jelinek wrote:

On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:

The v3 of this patch addresses feedback I received on the version posted at [1].
The merged store buffer is now represented as a char array that we splat values 
onto with
native_encode_expr and native_interpret_expr. This allows us to merge anything 
that native_encode_expr
accepts, including floating point values and short vectors. So this version 
extends the functionality
of the previous one in that it handles floating point values as well.

The first phase of the algorithm that detects the contiguous stores is also 
slightly refactored according
to feedback to read more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than word size 
but I got worse results on aarch64.
MOVE_MAX there is 16 (because it has load/store register pair instructions) but 
the 128-bit immediates that we ended
synthesising were too complex. Perhaps the TImode immediate store RTL 
expansions could be improved, but for now
I've left the maximum merge size to be BITS_PER_WORD.

At least from playing with this kind of things in the RTL PR22141 patch,
I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
constants usually isn't a win (not just for speed optimized blocks but also for
-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
32-bit you need movabsq into some temporary register which has like 3 times 
worse
latency than normal store if I remember well, and then store it.


We could restrict the maximum width of the stores generated to 32 bits on 
x86_64.
I think this would need another parameter or target macro for the target to set.
Alternatively, is it a possibility for x86 to be a bit smarter in its DImode 
mov-immediate
expansion? For example break up the 64-bit movabsq immediate into two SImode 
immediates?


   If it can
be CSEd and the same constant used multiple times in adjacent code perhaps.


Perhaps. From glancing at SPEC2006 generated code for aarch64 I didn't spot too 
many opportunities
for that though.


Various other targets have different costs for different constants,
so it would be nice if the pass considered that (computed RTX costs of those
constants and used that in some heuristics).


Could do. That could avoid creating too expensive immediates.


What alias set is used for the accesses if there are different alias sets
involved in between the merged stores?


As per https://gcc.gnu.org/ml/gcc/2016-06/msg00162.html the type used in those 
cases
would be ptr_type_node. See the get_type_for_merged_store function in the patch.


Also alignment can matter, even on non-strict alignment targets (speed vs.
-Os for that).


I'm aware of that. The patch already has logic to avoid emitting unaligned 
accesses
for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the 
parameter
PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target to
forbid generation of unaligned stores by the pass altogether. Beyond that I'm 
not sure
how to behave more intelligently here. Any ideas?


And, do you have some SPEC2k and/or SPEC2k6 numbers, for
  e.g. x86_64/i686/arm/aarch64/powerpc64le?


I did some benchmarking on aarch64 in the initial submission at
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
aarch64 showed some improvement and no regressions on x86_64.
I'll be rerunning the numbers on aarch64/x86_64/arm as the patch has expanded
in scope since then (handling more bitfields, floating point constants).
I just wanted to get this version out before the Cauldron for comments.


The RTL PR22141 changes weren't added mainly because it slowed down SPEC2k*
on powerpc.


Unfortunately I don't have access to SPEC on powerpc. Any help with 
testing/benchmarking
there would be very much appreciated.


Also, do you only handle constants or also the case where there is partial
or complete copying from some other memory, where it could be turned into
larger chunk loads + stores or __builtin_memcpy?


At the moment just constants. I hope in the future to extend it to perform more 
tricks
involving contiguous stores.
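
For reference, a minimal sketch (not taken from the patch or its tests) of
the kind of adjacent constant stores the pass currently targets:

struct s { unsigned char a, b, c, d; };

void
init (struct s *p)
{
  /* Four adjacent byte stores of constants; the pass can merge them into
     a single 32-bit store of the combined immediate (value depending on
     endianness).  */
  p->a = 1;
  p->b = 2;
  p->c = 3;
  p->d = 4;
}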


I've disabled the pass for PDP-endian targets as the merging code proved to be 
quite fiddly to get right for different
endiannesses and I didn't feel comfortable writing logic for BYTES_BIG_ENDIAN 
!= WORDS_BIG_ENDIAN targets without serious
testing capabilities. I hope that's ok (I note the bswap pass also doesn't try 
to do anything on such targets).

I think that is fine, it isn't the only pass that punts in this case.


Thanks for the comments and ideas,
Kyrill


Jakub





Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Paul Eggert

On 09/06/2016 08:16 AM, Joseph Myers wrote:

I don't think there's any such requirement in the case of flexible array
members; if you use malloc to allocate a structure with a flexible array
member, you can access as many trailing array elements as would fit within
the allocated size, whether or not that size is a multiple of either the
alignment of the structure, or the alignment of max_align_t.


Although I would prefer you to be correct, I'm afraid the standard
doesn't agree, as C11's wording appears to allow the GCC optimization in
question. (At least, draft N1570 does; I don't have the final standard.)
C11 section 6.7.2.1 paragraph 18 says:


"when a . (or ->) operator has a left operand that is (a pointer to) a 
structure with a flexible array member and the right operand names that 
member, it behaves as if that member were replaced with the longest 
array (with the same element type) that would not make the structure 
larger than the object being accessed"


Suppose 'short' size and alignment is 2, and the following code 
allocates and uses 7 bytes:


  struct s { short n; char a[]; } *p = malloc (offsetof (struct s, a) + 5);

  return &p->a[5];

A strict reading of C11 says this code is supposed to behave as if 'char 
a[];' were replaced by 'char a[4];'. It is not the 'char a[5]' that one 
might expect, because using 'char a[5];' would mean the size of struct s 
(after alignment) would be 8, whereas the size of the object being 
accessed is only 7. Hence the above return expression has a subscript error.


One way to correct the code is to increase malloc's argument up to a 
multiple of alignof(max_align_t). (One cannot portably use 
alignof(struct s) due to C11's prohibition of using alignof on 
incomplete types.) This is why gnulib/lib/fts.c uses max_align_t.
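
A rough sketch of that workaround (illustrative only, not the actual gnulib
code):

#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

struct s { short n; char a[]; };

/* Round the allocation up to a multiple of alignof (max_align_t) so that
   the C11 "longest array that fits" rule never truncates the flexible
   array member below the number of elements actually used.  */
static struct s *
alloc_s (size_t nelems)
{
  size_t sz = offsetof (struct s, a) + nelems;
  size_t a = alignof (max_align_t);
  return malloc ((sz + a - 1) / a * a);
}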




[PING] Re: [PATCH, i386] Fix some warnings/errors that appear when enabling -Wnarrowing when building gcc

2016-09-06 Thread Eric Gallager
Ping? CC-ing an i386 maintainer since the patch mostly touches
i386-specific files. Also, to clarify, I say "warnings/errors" because
they start off as warnings in stage 1 but then become errors in stage
2. Note also that my patch leaves out the part where I modify the
configure script to enable -Wnarrowing, because the rest of the code
isn't quite ready for that yet.

On 8/31/16, Eric Gallager  wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01526.html I tried
> enabling -Wnarrowing when building GCC and produced a log of the
> resulting (uniq-ed) warnings/errors. The attached patch here fixes
> some of them by using the 'U' suffix to make certain constants
> unsigned so they don't become negative when applying the '~' operator
> to them. After applying, there were still some narrowing issues
> remaining that would have required modifying gcc/optc-gen.awk to fix
> properly, but that looked too complicated so I'm avoiding it for now.
> Still, at least by patching the files I did patch, I allowed bootstrap
> to continue a little farther...
>
> Thanks,
> Eric Gallager
>


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:
> The v3 of this patch addresses feedback I received on the version posted at 
> [1].
> The merged store buffer is now represented as a char array that we splat 
> values onto with
> native_encode_expr and native_interpret_expr. This allows us to merge 
> anything that native_encode_expr
> accepts, including floating point values and short vectors. So this version 
> extends the functionality
> of the previous one in that it handles floating point values as well.
> 
> The first phase of the algorithm that detects the contiguous stores is also 
> slightly refactored according
> to feedback to read more fluently.
> 
> Richi, I experimented with merging up to MOVE_MAX bytes rather than word size 
> but I got worse results on aarch64.
> MOVE_MAX there is 16 (because it has load/store register pair instructions) 
> but the 128-bit immediates that we ended
> synthesising were too complex. Perhaps the TImode immediate store RTL 
> expansions could be improved, but for now
> I've left the maximum merge size to be BITS_PER_WORD.

At least from playing with this kind of things in the RTL PR22141 patch,
I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
constants usually isn't a win (not just for speed optimized blocks but also for
-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
32-bit you need movabsq into some temporary register which has like 3 times 
worse
latency than normal store if I remember well, and then store it.  If it can
be CSEd and the same constant used multiple times in adjacent code perhaps.
Various other targets have different costs for different constants,
so it would be nice if the pass considered that (computed RTX costs of those
constants and used that in some heuristics).
What alias set is used for the accesses if there are different alias sets
involved in between the merged stores?
Also alignment can matter, even on non-strict alignment targets (speed vs.
-Os for that).
And, do you have some SPEC2k and/or SPEC2k6 numbers, for
 e.g. x86_64/i686/arm/aarch64/powerpc64le?
The RTL PR22141 changes weren't added mainly because it slowed down SPEC2k*
on powerpc.
Also, do you only handle constants or also the case where there is partial
or complete copying from some other memory, where it could be turned into
larger chunk loads + stores or __builtin_memcpy?

> I've disabled the pass for PDP-endian targets as the merging code proved to 
> be quite fiddly to get right for different
> endiannesses and I didn't feel comfortable writing logic for BYTES_BIG_ENDIAN 
> != WORDS_BIG_ENDIAN targets without serious
> testing capabilities. I hope that's ok (I note the bswap pass also doesn't 
> try to do anything on such targets).

I think that is fine, it isn't the only pass that punts in this case.

Jakub


[RFC,PATCH] Using equivalences to help eliminate_regs_in_insn

2016-09-06 Thread Matthew Fortune
I've found a case of suboptimal code coming from LRA when we have a large
frame and the frame grows downwards, as on MIPS.  A good testcase for this
is the qsort benchmark from the automotive section of mibench [1].

Any frame access where the offset is not supported as an add-immediate
should benefit from this fix.  It will probably have little or no effect
when frames grow upwards.  The patch is just a prototype for now, as it has
had very little testing beyond looking like it does the right thing.

@Vladimir: The affected code is in eliminate_regs_in_insn where
there is a comment (history suggests you wrote it) that says:

  /* We allow one special case which happens to work on all machines we
 currently support: a single set with the source or a REG_EQUAL
 note being a PLUS of an eliminable register and a constant.  */

There is an implementation that optimises a single set but not one for
a REG_EQUAL. Do you have any recollection of this code and do you think
there was a reason you didn't implement the REG_EQUAL case? Either way
any advice on my approach is welcome.

Below is an example of RTL pre/post LRA both without and with my patch
applied:

== RTL after IRA ==
(insn 96 95 97 4 (set (reg:SI 299)
(const_int -1441792 [0xffea])) 310 {*movsi_internal}
 (expr_list:REG_EQUIV (const_int -1441792 [0xffea])
(nil)))
(insn 97 96 98 4 (set (reg:SI 298)
(plus:SI (reg:SI 299)
(const_int 1792 [0x700]))) 13 {*addsi3}
 (expr_list:REG_DEAD (reg:SI 299)
(expr_list:REG_EQUIV (const_int -144 [0xffea0700])
(nil
(insn 98 97 71 4 (set (reg:SI 282 [ ivtmp.21 ])
(plus:SI (reg/f:SI 78 $frame)
(reg:SI 298))) 13 {*addsi3}
 (expr_list:REG_DEAD (reg:SI 298)
(nil)))

== RTL after LRA ==

(insn 96 95 97 4 (set (reg:SI 16 $16 [299])
(const_int -1441792 [0xffea])) 310 {*movsi_internal}
 (expr_list:REG_EQUIV (const_int -1441792 [0xffea])
(nil)))
(insn 97 96 249 4 (set (reg:SI 16 $16 [298])
(plus:SI (reg:SI 16 $16 [299])
(const_int 1792 [0x700]))) 13 {*addsi3}
 (expr_list:REG_EQUIV (const_int -144 [0xffea0700])
(nil)))
(insn 249 97 250 4 (set (reg:SI 3 $3 [342])
(const_int 1376256 [0x15])) 310 {*movsi_internal}
 (nil))
(insn 250 249 289 4 (set (reg:SI 2 $2 [341])
(ior:SI (reg:SI 3 $3 [342])
(const_int 63776 [0xf920]))) 185 {*iorsi3}
 (expr_list:REG_EQUAL (const_int 1440032 [0x15f920])
(nil)))
(insn 289 250 251 4 (set (reg:SI 4 $4 [363])
(plus:SI (reg/f:SI 29 $sp)
(const_int 24 [0x18]))) 13 {*addsi3}
 (nil))
(insn 251 289 98 4 (set (reg:SI 2 $2 [341])
(plus:SI (reg:SI 2 $2 [341])
(reg:SI 4 $4 [363]))) 13 {*addsi3}
 (nil))
(insn 98 251 71 4 (set (reg:SI 16 $16 [orig:282 ivtmp.21 ] [282])
(plus:SI (reg:SI 2 $2 [341])
(reg:SI 16 $16 [298]))) 13 {*addsi3}
 (nil))

== new RTL after LRA ==

(note 96 95 97 4 NOTE_INSN_DELETED)
(note 97 96 98 4 NOTE_INSN_DELETED)
(insn 98 97 71 4 (set (reg:SI 16 $16 [orig:282 ivtmp.21 ] [282])
(plus:SI (reg/f:SI 29 $sp)
(const_int 48 [0x30]))) 13 {*addsi3}
 (nil))

The frame is 8 bytes smaller due to one less spill which is why
32+24 != 48.  Obviously the code size improvement when this triggers
is quite good. Build options: mips-mti-elf-gcc -Os -mmicromips
-mabi=32 -msoft-float -EL -mips32r2 -fstack-protector

The -fstack-protector forces the frame to grow downwards.

Any advice/comments welcome...

Thanks,
Matthew

[1] http://vhosts.eecs.umich.edu/mibench//source.html

gcc/

* lra-eliminations.c (remove_reg_equal_offset_note): Use
ira_reg_equiv to optimise offsets when eliminating the
frame pointer.

diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 08cc390..5424de6 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -918,7 +918,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, 
bool first_p,
   rtx substed_operand[MAX_RECOG_OPERANDS];
   rtx orig_operand[MAX_RECOG_OPERANDS];
   struct lra_elim_table *ep;
-  rtx plus_src, plus_cst_src;
+  rtx plus_src, plus_src_reg, plus_src_cst;
   lra_insn_recog_data_t id;
   struct lra_static_insn_data *static_id;
 
@@ -1006,7 +1006,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, 
bool first_p,
   /* We allow one special case which happens to work on all machines we
  currently support: a single set with the source or a REG_EQUAL
  note being a PLUS of an eliminable register and a constant.  */
-  plus_src = plus_cst_src = 0;
+  plus_src = plus_src_reg = plus_src_cst = 0;
   if (old_set && REG_P (SET_DEST (old_set)))
 {
   if (GET_CODE (SET_SRC (old_set)) == PLUS)
@@ -1014,24 +1014,59 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, 
bool first_p,
   /* First see if the source is of the form (

Re: [PATCH, i386] Fix ICE with a md builtin call (PR target/69255)

2016-09-06 Thread Uros Bizjak
On Tue, Sep 6, 2016 at 5:06 PM, Jakub Jelinek  wrote:
> On Mon, Sep 05, 2016 at 08:58:12PM +0200, Uros Bizjak wrote:
>> ... perhaps we can emit a warning here and expand the builtin as a
>> call? This way, we will mirror the case when builtin requires some
>> ISA, e.g.:
>>
>> void foo ()
>> {
>>   __builtin_ia32_stmxcsr();
>> }
>>
>> $ gcc -S -mno-sse dd.c
>> dd.c: In function ‘foo’:
>> dd.c:3:3: warning: implicit declaration of function
>> ‘__builtin_ia32_stmxcsr’ [-Wimplicit-function-declaration]
>>__builtin_ia32_stmxcsr();
>>^~
>
> I'm not sure it is a good idea to emit a warning, we are in the backend
> and we don't know what kind of warning/error the FE would emit if the
> builtin isn't implicitly prototyped.  E.g. C++ would certainly error out,
> C with -pedantic-errors would too, etc., it is simply too hard to replicate
> such behavior this late.

OK.

> Using expand_call works too (only tested on the testcases so far, but can test
> it fully), just had to add some expected warnings for the return and
> argument type -Wpsabi stuff.

Yes, this is what compiling testcase without using #pragmas warns.

> 2016-09-05  Jakub Jelinek  
>
> PR target/69255
> * config/i386/i386.c (ix86_expand_builtin): For builtin with
> unsupported or unknown ISA, use expand_call.
>
> * gcc.target/i386/pr69255-1.c: New test.
> * gcc.target/i386/pr69255-2.c: New test.
> * gcc.target/i386/pr69255-3.c: New test.

OK for mainline and release branches.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2016-09-05 19:27:03.287671306 +0200
> +++ gcc/config/i386/i386.c  2016-09-06 16:55:35.524605779 +0200
> @@ -36107,7 +36107,7 @@ ix86_expand_builtin (tree exp, rtx targe
>   error ("%qE needs isa option %s", fndecl, opts);
>   free (opts);
> }
> -  return const0_rtx;
> +  return expand_call (exp, target, ignore);
>  }
>
>switch (fcode)
> --- gcc/testsuite/gcc.target/i386/pr69255-1.c.jj2016-09-06 
> 16:54:01.793783159 +0200
> +++ gcc/testsuite/gcc.target/i386/pr69255-1.c   2016-09-06 16:59:53.317370510 
> +0200
> @@ -0,0 +1,17 @@
> +/* PR target/69255 */
> +/* { dg-do compile } */
> +/* { dg-options "-mno-avx512f" } */
> +
> +#pragma GCC target "avx512vl"
> +#pragma GCC target "no-avx512vl"
> +__attribute__ ((__vector_size__ (32))) long long a;
> +__attribute__ ((__vector_size__ (16))) int b;
> +
> +void
> +foo (const long long *p)
> +{
> +  a = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error "needs 
> isa option -m32 -mavx512vl" } */
> +}
> +
> +/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
> target *-*-* } 13 } */
> +/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" 
> { target *-*-* } 13 } */
> --- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj2016-09-06 
> 16:54:01.793783159 +0200
> +++ gcc/testsuite/gcc.target/i386/pr69255-2.c   2016-09-06 17:00:04.229233625 
> +0200
> @@ -0,0 +1,17 @@
> +/* PR target/69255 */
> +/* { dg-do compile } */
> +/* { dg-options "-mno-avx512f" } */
> +
> +#pragma GCC target "avx512vl"
> +#pragma GCC target ""
> +__attribute__ ((__vector_size__ (32))) long long a;
> +__attribute__ ((__vector_size__ (16))) int b;
> +
> +void
> +foo (const long long *p)
> +{
> +  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error 
> "needs isa option -m32 -mavx512vl" } */
> +}
> +
> +/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
> target *-*-* } 13 } */
> +/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" 
> { target *-*-* } 13 } */
> --- gcc/testsuite/gcc.target/i386/pr69255-3.c.jj2016-09-06 
> 16:54:01.794783146 +0200
> +++ gcc/testsuite/gcc.target/i386/pr69255-3.c   2016-09-06 17:00:14.915099574 
> +0200
> @@ -0,0 +1,17 @@
> +/* PR target/69255 */
> +/* { dg-do compile } */
> +/* { dg-options "-mno-avx512f" } */
> +
> +#pragma GCC target "avx512vl"
> +#pragma GCC target ""
> +__attribute__ ((__vector_size__ (32))) long long a;
> +__attribute__ ((__vector_size__ (16))) int b;
> +
> +void
> +foo (const long long *p, __attribute__ ((__vector_size__ (32))) long long *q)
> +{
> +  *q = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);   /* { dg-error "needs 
> isa option -m32 -mavx512vl" } */
> +}
> +
> +/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
> target *-*-* } 13 } */
> +/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" 
> { target *-*-* } 13 } */
>
>
> Jakub


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Florian Weimer wrote:

> So that's what ties the two things together.  I still don't like what's
> implied in PR1, that all object sizes have to be multiples of the
> fundamental alignment.

I don't think there's any such requirement in the case of flexible array 
members; if you use malloc to allocate a structure with a flexible array 
member, you can access as many trailing array elements as would fit within 
the allocated size, whether or not that size is a multiple of either the 
alignment of the structure, or the alignment of max_align_t.
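
For illustration, a minimal C sketch of that case (the struct and the sizes
here are made up, not taken from any particular package): the allocated size
need not be a multiple of _Alignof (max_align_t) for the trailing array
elements to be usable.

#include <stddef.h>
#include <stdlib.h>

struct item { int len; char name[]; };   /* flexible array member */

struct item *
make_item (size_t namelen)
{
  /* offsetof (struct item, name) + namelen + 1 is typically not a
     multiple of _Alignof (max_align_t), yet every byte of name[] that
     lies inside the allocation may be accessed.  */
  struct item *p = malloc (offsetof (struct item, name) + namelen + 1);
  if (p)
    p->len = (int) namelen;
  return p;
}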

> > Well, that's a conformance bug in the implementation as a whole.  The
> > nonconforming modes in question are still useful and it's useful for GCC
> > to support such mallocs.
> 
> PR1 shows that GCC does not want to support such mallocs (or even glibc's
> malloc).

GCC is supposed to support all mallocs that produce results aligned to at 
least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of 
max_align_t).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH][v3] GIMPLE store merging pass

2016-09-06 Thread Kyrill Tkachov

Hi all,

The v3 of this patch addresses feedback I received on the version posted at [1].
The merged store buffer is now represented as a char array that we splat values 
onto with
native_encode_expr and native_interpret_expr. This allows us to merge anything 
that native_encode_expr
accepts, including floating point values and short vectors. So this version 
extends the functionality
of the previous one in that it handles floating point values as well.
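
For illustration, here is a minimal plain-C sketch of the buffer idea (this is
not the GCC code; the memcpy calls merely stand in for native_encode_expr and
native_interpret_expr):

#include <stdint.h>
#include <string.h>

void
merged_store (uint16_t *dst, uint16_t a, uint16_t b)
{
  unsigned char buf[4];              /* the merged store buffer */
  memcpy (buf, &a, sizeof a);        /* "encode" the first value at offset 0 */
  memcpy (buf + 2, &b, sizeof b);    /* "encode" the second value at offset 2 */
  uint32_t wide;
  memcpy (&wide, buf, sizeof wide);  /* "interpret" the buffer as one wide value */
  memcpy (dst, &wide, sizeof wide);  /* a single 4-byte store replaces two */
}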

The first phase of the algorithm that detects the contiguous stores is also 
slightly refactored according
to feedback to read more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than word size 
but I got worse results on aarch64.
MOVE_MAX there is 16 (because it has load/store register pair instructions) but 
the 128-bit immediates that we ended up
synthesising were too complex. Perhaps the TImode immediate store RTL 
expansions could be improved, but for now
I've left the maximum merge size to be BITS_PER_WORD.

I've disabled the pass for PDP-endian targets as the merging code proved to be 
quite fiddly to get right for different
endiannesses and I didn't feel comfortable writing logic for BYTES_BIG_ENDIAN 
!= WORDS_BIG_ENDIAN targets without serious
testing capabilities. I hope that's ok (I note the bswap pass also doesn't try 
to do anything on such targets).

Tested on arm, aarch64, x86_64 and on big-endian arm and aarch64.

How does this version look?
Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01512.html

2016-09-06  Kyrylo Tkachov  

PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.

2016-09-06  Kyrylo Tkachov  
Jakub Jelinek  

PR middle-end/22141
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.target/aarch64/ldp_stp_4.c: Likewise.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.
* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5e3cbe5d736e2cf9429df989d9d95708b1243011..ca2415b79de60c9be44871777d57bcc8d5c07487 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1295,6 +1295,7 @@ OBJS = \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
+	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-streamer-in.o \
 	gimple-streamer-out.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 2004fbc7d0d7b1ca65d06cee0f7398f2c9e3fc64..d951ae7cd6fe8fdae17e17b8f5be7c636ea1986d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1452,6 +1452,10 @@ fstrict-volatile-bitfields
 Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
 Force bitfield accesses to match their type width.
 
+fstore-merging
+Common Var(flag_store_merging) Optimization
+Use the tree store merging pass.
+
 fguess-branch-probability
 Common Report Var(flag_guess_branch_prob) Optimization
 Enable guessing of branch probabilities.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 87da1f1c12b718fa63c9b89fdd8f85fbc6b54cb0..2e2a9fdd92733beb6948efc7a5c4b3fc79da73a9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -401,7 +401,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
--fstdarg-opt -fstrict-aliasing @gol
+-fstdarg-opt -fstore-merging -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
@@ -412,8 +412,8 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp -funconstrained-commons @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
+-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-math-optimizations -fun

Re: [PATCH, i386] Fix ICE with a md builtin call (PR target/69255)

2016-09-06 Thread Jakub Jelinek
On Mon, Sep 05, 2016 at 08:58:12PM +0200, Uros Bizjak wrote:
> ... perhaps we can emit a warning here and expand the builtin as a
> call? This way, we will mirror the case when builtin requires some
> ISA, e.g.:
> 
> void foo ()
> {
>   __builtin_ia32_stmxcsr();
> }
> 
> $ gcc -S -mno-sse dd.c
> dd.c: In function ‘foo’:
> dd.c:3:3: warning: implicit declaration of function
> ‘__builtin_ia32_stmxcsr’ [-Wimplicit-function-declaration]
>__builtin_ia32_stmxcsr();
>^~

I'm not sure it is a good idea to emit a warning, we are in the backend
and we don't know what kind of warning/error the FE would emit if the
builtin isn't implicitly prototyped.  E.g. C++ would certainly error out,
C with -pedantic-errors would too, etc., it is simply too hard to replicate
such behavior this late.

Using expand_call works too (only tested on the testcases so far, but can test
it fully), just had to add some expected warnings for the return and
argument type -Wpsabi stuff.

2016-09-05  Jakub Jelinek  

PR target/69255
* config/i386/i386.c (ix86_expand_builtin): For builtin with
unsupported or unknown ISA, use expand_call.

* gcc.target/i386/pr69255-1.c: New test.
* gcc.target/i386/pr69255-2.c: New test.
* gcc.target/i386/pr69255-3.c: New test.

--- gcc/config/i386/i386.c.jj   2016-09-05 19:27:03.287671306 +0200
+++ gcc/config/i386/i386.c  2016-09-06 16:55:35.524605779 +0200
@@ -36107,7 +36107,7 @@ ix86_expand_builtin (tree exp, rtx targe
  error ("%qE needs isa option %s", fndecl, opts);
  free (opts);
}
-  return const0_rtx;
+  return expand_call (exp, target, ignore);
 }
 
   switch (fcode)
--- gcc/testsuite/gcc.target/i386/pr69255-1.c.jj2016-09-06 
16:54:01.793783159 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-1.c   2016-09-06 16:59:53.317370510 
+0200
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-mno-avx512f" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target "no-avx512vl"
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p)
+{
+  a = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error "needs 
isa option -m32 -mavx512vl" } */
+}
+
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
--- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj2016-09-06 
16:54:01.793783159 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-2.c   2016-09-06 17:00:04.229233625 
+0200
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-mno-avx512f" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target ""
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p)
+{
+  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error 
"needs isa option -m32 -mavx512vl" } */
+}
+
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
--- gcc/testsuite/gcc.target/i386/pr69255-3.c.jj2016-09-06 
16:54:01.794783146 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-3.c   2016-09-06 17:00:14.915099574 
+0200
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-mno-avx512f" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target ""
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p, __attribute__ ((__vector_size__ (32))) long long *q)
+{
+  *q = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);   /* { dg-error "needs 
isa option -m32 -mavx512vl" } */
+}
+
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */


Jakub


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Florian Weimer

On 09/06/2016 01:25 PM, Joseph Myers wrote:

On Tue, 6 Sep 2016, Florian Weimer wrote:


Why aren't there any users?  The standard isn't very clear what the value of
_Alignof (max_align_t) actually means.  Does the phrase “all contexts” include
pointers returned by malloc?  Even if the requested size is smaller than the
fundamental alignment?  Did those who wrote the standard expect there to be
*any* relationship between malloc and max_align_t?


See my cleanup of the wording in DR#445
, which
is intended to reflect the intent and stay compatible with C99.  malloc
should be usable to allocate memory for any type from the standard library
headers, including max_align_t.


And the old text for malloc says:

“
The pointer returned if the allocation succeeds is suitably aligned so 
that it may be assigned to a pointer to any type of object with a 
fundamental alignment requirement and then used to access such an object 
or an array of such objects in the space allocated (until the space is 
explicitly deallocated).

”

So that's what ties the two things together.  I still don't like what's 
implied in PR1, that all object sizes have to be multiples of the 
fundamental alignment.



The existing situation is that most mallocs do not provide _Alignof
(max_align_t) alignment unconditionally.  But they can provide arbitrarily
large alignment with aligned_alloc/memalign-style interfaces.


Well, that's a conformance bug in the implementation as a whole.  The
nonconforming modes in question are still useful and it's useful for GCC
to support such mallocs.


PR1 shows that GCC does not want to support such mallocs (or even 
glibc's malloc).


Florian



Re: [DOC] Boolean -> boolean changes

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Marek Polacek wrote:

>   * doc/tm.texi: Likewise.

You can't modify tm.texi directly; you have to modify whichever is 
appropriate of target.def and tm.texi.in, then regenerate tm.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Florian Weimer

On 09/06/2016 01:59 PM, Bernd Schmidt wrote:

On 09/06/2016 11:11 AM, Florian Weimer wrote:


I think this change is only safe because nothing relies on it.
max_align_t is a committee invention with no actual users.


I tried to verify that and ran grep over all the packages in
/usr/portage/distfiles. That did get a number of matches, fairly obvious
and expected ones like gcc and glibc, and several more which match only
because gnulib seems to try to provide a default, but I also found a few
real uses:

grep-2.25/lib/fts.c:
/* Align the allocation size so that it works for FTSENT,
   so that trailing padding may be referenced by direct access
   to the flexible array members, without triggering undefined behavior
   by accessing bytes beyond the heap allocation.  This implicit access
   was seen for example with ISDOT() and GCC 5.1.1 at -O2.
   Do not use alignof (FTSENT) here, since C11 prohibits
   taking the alignment of a structure containing a flexible
   array member.  */
len += alignof (max_align_t) - 1;
len &= ~ (alignof (max_align_t) - 1);


The context looks like this:

  /*
   * The file name is a variable length array.  Allocate the FTSENT
   * structure and the file name in one chunk.
   */
  len = offsetof(FTSENT, fts_name) + namelen + 1;
  /* Align the allocation size so that it works for FTSENT,
 so that trailing padding may be referenced by direct access
 to the flexible array members, without triggering undefined behavior
 by accessing bytes beyond the heap allocation.  This implicit access
 was seen for example with ISDOT() and GCC 5.1.1 at -O2.
 Do not use alignof (FTSENT) here, since C11 prohibits
 taking the alignment of a structure containing a flexible
 array member.  */
  len += alignof (max_align_t) - 1;
  len &= ~ (alignof (max_align_t) - 1);
  if ((p = malloc(len)) == NULL)
  return (NULL);

This code was added to work around a GCC bug:

  

It's rather dubious that this fix is necessary or even correct.  Where 
would this stop?  Would


  char buf[23];
  memset (buf, 'X', sizeof buf);
  strndup (buf, 23);

have to allocate 32 bytes so that the object size is a multiple of 16 on 
x86_64 (where alignof (max_align_t) is 16)?



libvpx-v1.3.0/nestegg/halloc/src/halloc.c:
/*
 *  block control header
 */
typedef struct hblock
{
#ifndef NDEBUG
#define HH_MAGIC0x20040518L
long  magic;
#endif
hlist_item_t  siblings; /* 2 pointers */
hlist_head_t  children; /* 1 pointer  */
max_align_t   data[1];  /* not allocated, see below */

} hblock_t;


This is a false hit.  Upstream is here:

  https://github.com/apankrat/halloc

src/align.h has:

union max_align
{
char   c;
short  s;
long   l;
inti;
float  f;
double d;
void * v;
void (*q)(void);
};

typedef union max_align max_align_t;

I'm not sure if it is an attempt to emulate max_align_t, or if it is a 
name collision.  It's from 2011, so perhaps the latter.  It's certainly 
incorrect for x86_64.  It works by accident in the halloc context 
because the object header is four words large, so it preserves the 
alignment of the underlying allocator.


Florian


Re: [PATCH][RFC] Early LTO debug

2016-09-06 Thread Markus Trippelsdorf
On 2016.09.06 at 14:26 +0200, Richard Biener wrote:
> On Tue, 6 Sep 2016, Markus Trippelsdorf wrote:
> 
> > On 2016.09.06 at 13:17 +0200, Richard Biener wrote:
> > > 
> > > The following is an updated patch, mainly stripped out old unnecessary
> > > cruft and some fixes for an issue that arose when LTOing Firefox.
> > 
> > One minor issue that I've noticed is that the patch generates a lot of
> > empty *debugobj* files in $TMPDIR, e.g.:
> > 
> >  % echo 'int main(){}' | g++ -flto -o /dev/null -x c++ -
> >  % ls -l $TMPDIR
> > total 0
> > -rw---. 1 trippels trippels 0 Sep  6 12:11 ccenD5Tcdebugobj
> > -rw---. 1 trippels trippels 0 Sep  6 12:11 ccXzvE4udebugobjtem
> 
> Ah, make_temp_file actually _creates_ the files already...
> 
> Fixed with the patch below.

Thanks, it takes care of the debugobjtems, but still creates bogus
debugobjs.

-- 
Markus


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 03:45:09PM +0200, Martin Liška wrote:
> --- a/libgcc/libgcov-profiler.c
> +++ b/libgcc/libgcov-profiler.c
> @@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  .  */
>  
>  #include "libgcov.h"
> +#include "auto-target.h"
>  #if !defined(inhibit_libc)
>  
> +/* Detect whether target can support atomic update of profilers.  */
> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#define GCOV_SUPPORTS_ATOMIC 0
> +#endif
> +#endif

One more thing.  This is always compiled by gcc, so I think you don't want
to include auto-target.h; use __SIZEOF_LONG_LONG__ == 4 and
__SIZEOF_LONG_LONG__ == 8 tests instead of LONG_LONG_TYPE_SIZE <= 32
or LONG_LONG_TYPE_SIZE > 32.  Ok with that change.
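
For reference, a sketch of the guard with that change applied (assuming, as
elsewhere in libgcov, that gcov_type is long long):

#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
#define GCOV_SUPPORTS_ATOMIC 1
#elif __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
#define GCOV_SUPPORTS_ATOMIC 1
#else
#define GCOV_SUPPORTS_ATOMIC 0
#endif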

Jakub


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Martin Liška
On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
> sizeof (gcov_type) talks about the host gcov type, you want instead the
> target gcov type.  So
> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
> == 4 vs. 8).
> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
> 
>   Jakub

Thank you, Jakub, for helping me with that. I've used the TYPE_SIZE_UNIT macro.

Ready for trunk?
Martin
>From 744d1688fee0359314d87d948323f58fbca6172e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionaly enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 ++
 gcc/tree-profile.c| 35 +++
 libgcc/libgcov-profiler.c | 24 ++
 3 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..69b48e5 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,20 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
+#ifndef HAVE_sync_compare_and_swapdi
+#define HAVE_sync_compare_and_swapdi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapdi
+#define HAVE_atomic_compare_and_swapdi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +549,27 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+{
+  bool can_support = false;
+  unsigned HOST_WIDE_INT gcov_type_size
+	= tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
+  if (gcov_type_size == 4)
+	can_support
+	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  else if (gcov_type_size == 8)
+	can_support
+	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+
+  if (!can_support)
+  {
+	warning (0, "target does not support atomic profile update, "
+		 "single mode is selected");
+	flag_profile_update = PROFILE_UPDATE_SINGLE;
+  }
+}
+
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..887041f 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
 #include "libgcov.h"
+#include "auto-target.h"
 #if !defined(inhibit_libc)
 
+/* Detect whether target can support atomic update of profilers.  */
+#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval , then increases the
corresponding counter in COUNTERS.  If the VALUE is above or below
@@ -46,7 +58,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
-#ifdef L_gcov_interval_profiler_atomic
+#if defined(L_gcov_interval_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is in interval , then increases the
corresponding counter in COUNTERS.  If the VALUE is above or below
the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
@@ -80,7 +92,7 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gc

Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 03:13:09PM +0200, Martin Liška wrote:
> @@ -535,6 +549,27 @@ tree_profiling (void)
>  {
>struct cgraph_node *node;
>  
> +  /* Verify whether we can utilize atomic update operations.  */
> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
> +{
> +  bool can_support = false;
> +  if (sizeof (gcov_type) == 4)
> + can_support
> +   = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
> +  else if (sizeof (gcov_type) == 8)
> + can_support
> +   = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> +  else
> + gcc_unreachable ();

sizeof (gcov_type) talks about the host gcov type, you want instead the
target gcov type.  So
TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
== 4 vs. 8).
As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
And I wouldn't add gcc_unreachable, just warn for weirdo arches always.

Jakub


[PATCH][AArch64] Improve legitimize_address

2016-09-06 Thread Wilco Dijkstra
Improve aarch64_legitimize_address - avoid splitting the offset if it is
supported.  When we do split, take the mode size into account.  BLKmode
falls into the unaligned case but should be treated like LDP/STP.
This improves codesize slightly due to fewer base address calculations:

int f(int *p) { return p[5000] + p[7000]; }

Now generates:

f:
add x0, x0, 16384
ldr w1, [x0, 3616]
ldr w0, [x0, 11616]
add w0, w1, w0
ret

instead of:

f:
add x1, x0, 16384
add x0, x0, 24576
ldr w1, [x1, 3616]
ldr w0, [x0, 3424]
add w0, w1, w0
ret

OK for trunk?

ChangeLog:
2016-09-06  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_legitimize_address):
Avoid use of base_offset if offset already in range.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
   /* For offsets aren't a multiple of the access size, the limit is
 -256...255.  */
   else if (offset & (GET_MODE_SIZE (mode) - 1))
-   base_offset = (offset + 0x100) & ~0x1ff;
+   {
+ base_offset = (offset + 0x100) & ~0x1ff;
+
+ /* BLKmode typically uses LDP of X-registers.  */
+ if (mode == BLKmode)
+   base_offset = (offset + 512) & ~0x3ff;
+   }
+  /* Small negative offsets are supported.  */
+  else if (IN_RANGE (offset, -256, 0))
+   base_offset = 0;
+  /* Use 12-bit offset by access size.  */
   else
-   base_offset = offset & ~0xfff;
+   base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
 
   if (base_offset != 0)
{



Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Martin Liška
On 09/06/2016 02:51 PM, Jakub Jelinek wrote:
> On Tue, Sep 06, 2016 at 02:45:32PM +0200, Martin Liška wrote:
>> --- a/gcc/tree-profile.c
>> +++ b/gcc/tree-profile.c
>> @@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, 
>> unsigned tag, unsigned base)
>>gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>  }
>>  
>> +#ifndef HAVE_sync_compare_and_swapsi
>> +#define HAVE_sync_compare_and_swapsi 0
>> +#endif
>> +#ifndef HAVE_atomic_compare_and_swapsi
>> +#define HAVE_atomic_compare_and_swapsi 0
>> +#endif
>> +
>>  /* Profile all functions in the callgraph.  */
>>  
>>  static unsigned int
>> @@ -535,6 +542,16 @@ tree_profiling (void)
>>  {
>>struct cgraph_node *node;
>>  
>> +  /* Verify whether we can utilize atomic update operations.  */
>> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
>> +  && !HAVE_sync_compare_and_swapsi
>> +  && !HAVE_atomic_compare_and_swapsi)
> 
> This isn't in sync with:
> 
>> +/* Detect whether target can support atomic update of profilers.  */
>> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
>> +#define GCOV_SUPPORTS_ATOMIC 1
>> +#else
>> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
>> +#define GCOV_SUPPORTS_ATOMIC 1
>> +#else
>> +#define GCOV_SUPPORTS_ATOMIC 0
>> +#endif
>> +#endif
> 
> this.  Either you implement the poor man's 64-bit atomics with 32-bit cas
> and adjust the latter, or the former needs to look at the target's gcov type
> (long long always?) and depending on its size either test the HAVE_*si or
> HAVE_*di macros.
> 
>   Jakub
> 

Ok, thanks, this should be the proper patch, where I distinguish by
sizeof (gcov_type) and use the appropriate HAVE_*{s,d}i macros.

Ready for trunk?
Thanks,
Martin
>From 41bef1e975042071c973c3cb733a0e0d9a59fec6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionaly enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 ++
 gcc/tree-profile.c| 35 +++
 libgcc/libgcov-profiler.c | 24 ++
 3 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..a3e6dca 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,20 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
+#ifndef HAVE_sync_compare_and_swapdi
+#define HAVE_sync_compare_and_swapdi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapdi
+#define HAVE_atomic_compare_and_swapdi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +549,27 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+{
+  bool can_support = false;
+  if (sizeof (gcov_type) == 4)
+	can_support
+	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  else if (sizeof (gcov_type) == 8)
+	can_support
+	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  else
+	gcc_unreachable ();
+
+  if (!can_support)
+  {
+	warning (0, "target does not support atomic profile update, "
+		 "single mode is selected");
+	flag_profile_update = PROFILE_UPDATE_SINGLE;
+  }
+}
+
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..887041f 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME res

Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 02:45:32PM +0200, Martin Liška wrote:
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, unsigned 
> tag, unsigned base)
>gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>  }
>  
> +#ifndef HAVE_sync_compare_and_swapsi
> +#define HAVE_sync_compare_and_swapsi 0
> +#endif
> +#ifndef HAVE_atomic_compare_and_swapsi
> +#define HAVE_atomic_compare_and_swapsi 0
> +#endif
> +
>  /* Profile all functions in the callgraph.  */
>  
>  static unsigned int
> @@ -535,6 +542,16 @@ tree_profiling (void)
>  {
>struct cgraph_node *node;
>  
> +  /* Verify whether we can utilize atomic update operations.  */
> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
> +  && !HAVE_sync_compare_and_swapsi
> +  && !HAVE_atomic_compare_and_swapsi)

This isn't in sync with:

> +/* Detect whether target can support atomic update of profilers.  */
> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#define GCOV_SUPPORTS_ATOMIC 0
> +#endif
> +#endif

this.  Either you implement the poor man's 64-bit atomics with 32-bit cas
and adjust the latter, or the former needs to look at the target's gcov type
(long long always?) and depending on its size either test the HAVE_*si or
HAVE_*di macros.

Jakub


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Martin Liška
On 09/06/2016 02:38 PM, David Edelsohn wrote:
> On Tue, Sep 6, 2016 at 8:14 AM, Nathan Sidwell  wrote:
>> On 09/06/16 06:57, David Edelsohn wrote:
>>
>>> What about Jakub's comment in the PR?
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>>
>>
>> This needs addressing.  Can you clarify PPC behaviour, because I may have
>> misunderstood:
>>
>> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
>> in 32 bit mode.
>>
>> 2) PPC currently has 32 bit counters anyway.
> 
> The rs6000 port ABIs implement 64 bit "long long" type.  The current
> code uses LONG_LONG_TYPE_SIZE for the counters.  I assume that most
> ports don't support native 64 bit atomic operations in 32 bit ABI --
> PowerPC does not.
> 
> The previous code allowed gcov type to be overridden, but I don't
> think that it was 32 bit on most targets.
> 
>>
>> I had interpreted the comment to be implying #2, but now I'm not so sure.
>>
>>> The proposed patch seems wrong or at least incomplete.  The recent
>>> change is imposing a 64 bit DImode counter when producing 32 bit code.
>>> PowerPC does support 64 bit atomic operations in 32 bit mode.
>>
>>
>> I'm presuming you've missed a 'NOT' in that sentence.
> 
> Yes, I omitted a "NOT".
> 
> PowerPC64 has 64 bit atomics, but PowerPC32 subset only provides 32
> bit atomics in the ISA.
> 
> If the counters always should be 64 bit, then a poor-man's 64 bit
> atomic operation proposed by Jakub seems like a better solution.
> 
> Thanks, David

Hi David.

I sent the previous email before I read Jakub's comment.
I'm attaching a simplified patch (based on the comment), which works for the
i386 target. I'm testing it on the m68k target.

I prefer to have a single type for one config, the same as Nathan suggested.
I like the idea of the poor-man's atomics; I can make an incremental patch.

Martin

> 
>>
>>> Was there a design decision that profile counters always should be 64
>>> bits?  Either 32 bit targets won't support multi-threaded profiling or
>>> 32 bit targets can overflow the counter sooner.
>>
>>
>>
>>>  Which is worse?
>>> Which is more likely?
>>
>>
>> My initial thought is that it is probably awkward to support 2 different
>> sized counter types in the 'same' config.  I.e. 64-bit single-threaded
>> counters and 32-bit threaded counters.
>>
>> nathan

>From 44289abf2e3ecfb7e17c6f204b280af06bf20b0e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionaly enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 +++
 gcc/tree-profile.c| 17 +
 libgcc/libgcov-profiler.c | 24 +++-
 3 files changed, 43 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..8ce35be 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +542,16 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
+  && !HAVE_sync_compare_and_swapsi
+  && !HAVE_atomic_compare_and_swapsi)
+{
+  warning (0, "target does not support atomic profile update, "
+	   "single mode is selected");
+  flag_profile_update = PROFILE_UPDATE_SINGLE;
+}
+
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d.

[PATCH] Fix PR77479

2016-09-06 Thread Richard Biener

The following fixes PR77479.

Boosttrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-09-06  Richard Biener  

PR tree-optimization/77479
* tree-vrp.c (update_value_range): Extend overflow handling to
VARYING.

* gcc.dg/torture/pr77479.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 240004)
+++ gcc/tree-vrp.c  (working copy)
@@ -744,23 +745,29 @@ update_value_range (const_tree var, valu
   value_range_type rtype = get_range_info (var, &min, &max);
   if (rtype == VR_RANGE || rtype == VR_ANTI_RANGE)
{
- value_range nr;
- nr.type = rtype;
+ tree nr_min, nr_max;
  /* Range info on SSA names doesn't carry overflow information
 so make sure to preserve the overflow bit on the lattice.  */
- if (new_vr->type == VR_RANGE
- && is_negative_overflow_infinity (new_vr->min)
- && wi::eq_p (new_vr->min, min))
-   nr.min = new_vr->min;
- else
-   nr.min = wide_int_to_tree (TREE_TYPE (var), min);
- if (new_vr->type == VR_RANGE
- && is_positive_overflow_infinity (new_vr->max)
- && wi::eq_p (new_vr->max, max))
-   nr.max = new_vr->max;
- else
-   nr.max = wide_int_to_tree (TREE_TYPE (var), max);
- nr.equiv = NULL;
+ if (rtype == VR_RANGE
+ && needs_overflow_infinity (TREE_TYPE (var))
+ && (new_vr->type == VR_VARYING
+ || (new_vr->type == VR_RANGE
+ && is_negative_overflow_infinity (new_vr->min)))
+ && wi::eq_p (vrp_val_min (TREE_TYPE (var)), min))
+   nr_min = negative_overflow_infinity (TREE_TYPE (var));
+ else
+   nr_min = wide_int_to_tree (TREE_TYPE (var), min);
+ if (rtype == VR_RANGE
+ && needs_overflow_infinity (TREE_TYPE (var))
+ && (new_vr->type == VR_VARYING
+ || (new_vr->type == VR_RANGE
+ && is_positive_overflow_infinity (new_vr->max)))
+ && wi::eq_p (vrp_val_max (TREE_TYPE (var)), max))
+   nr_max = positive_overflow_infinity (TREE_TYPE (var));
+ else
+   nr_max = wide_int_to_tree (TREE_TYPE (var), max);
+ value_range nr = VR_INITIALIZER;
+ set_and_canonicalize_value_range (&nr, rtype, nr_min, nr_max, NULL);
  vrp_intersect_ranges (new_vr, &nr);
}
 }
Index: gcc/testsuite/gcc.dg/torture/pr77479.c
===
--- gcc/testsuite/gcc.dg/torture/pr77479.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr77479.c  (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fstrict-overflow -ftree-vrp" } */
+
+void
+vr (int of, unsigned char bw)
+{
+  int d1;
+  int lm = 0;
+
+  for (d1 = 0; d1 < 3; ++d1)
+{
+  const int vl = 2;
+
+  while (bw < vl)
+   {
+   }
+  if (bw != vl)
+   lm -= vl;
+}
+  while (++of < 1)
+{
+  lm /= bw;
+  of += lm;
+}
+}


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread David Edelsohn
On Tue, Sep 6, 2016 at 8:26 AM, Jakub Jelinek  wrote:
> On Tue, Sep 06, 2016 at 08:14:58AM -0400, Nathan Sidwell wrote:
>> On 09/06/16 06:57, David Edelsohn wrote:
>>
>> >What about Jakub's comment in the PR?
>> >
>> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>>
>> This needs addressing.  Can you clarify PPC behaviour, because I may have
>> misunderstood:
>>
>> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
>> in 32 bit mode.
>>
>> 2) PPC currently has 32 bit counters anyway.
>>
>> I had interpreted the comment to be implying #2, but now I'm not so sure.
>
> Aren't the counters 64-bit everywhere?
> Even with 32-bit atomics, as the only operation needed is addition, can't it
> be implemented anyway?  Instead of:
>   __atomic_fetch_add_8 (&val, 1, __ATOMIC_RELAXED);
> one could for e.g. little endian:
>   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
> __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

Is this correct for either endianness?

- David

> There is a small risk that e.g. SIGTERM happens in between the two and the
> counters are written into the file without the upper half being bumped, but
> compared to the non-atomic updates it is much less likely (and, if you
> actually never overflow the counters, it won't be an issue anyway).
>
> Jakub


[PATCH] Fix PR77450

2016-09-06 Thread Richard Biener

The following fixes PR77450.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-09-06  Richard Biener  

PR c/77450
c-family/
* c-common.c (c_common_mark_addressable_vec): Handle
COMPOUND_LITERAL_EXPR.

* c-c++-common/vector-subscript-7.c: Adjust.
* c-c++-common/vector-subscript-8.c: New testcase.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 240004)
+++ gcc/c-family/c-common.c (working copy)
@@ -10918,7 +10918,9 @@ c_common_mark_addressable_vec (tree t)
 {   
   while (handled_component_p (t))
 t = TREE_OPERAND (t, 0);
-  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
+  if (!VAR_P (t)
+  && TREE_CODE (t) != PARM_DECL
+  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
 return;
   TREE_ADDRESSABLE (t) = 1;
 }
Index: gcc/testsuite/c-c++-common/vector-subscript-7.c
===
--- gcc/testsuite/c-c++-common/vector-subscript-7.c (revision 240004)
+++ gcc/testsuite/c-c++-common/vector-subscript-7.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp1" } */
+/* { dg-options "-O -fdump-tree-fre1" } */
 
 typedef int v4si __attribute__ ((vector_size (16)));
 
@@ -11,4 +11,4 @@ main (int argc, char** argv)
   return ((v4si){1, 2, 42, 0})[j];
 }
 
-/* { dg-final { scan-tree-dump "return 42;" "ccp1" } } */
+/* { dg-final { scan-tree-dump "return 42;" "fre1" } } */
Index: gcc/testsuite/c-c++-common/vector-subscript-8.c
===
--- gcc/testsuite/c-c++-common/vector-subscript-8.c (revision 0)
+++ gcc/testsuite/c-c++-common/vector-subscript-8.c (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+typedef int V __attribute__((vector_size(4)));
+
+void
+foo(void)
+{
+  (V){ 0 }[0] = 0;
+}


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread David Edelsohn
On Tue, Sep 6, 2016 at 8:14 AM, Nathan Sidwell  wrote:
> On 09/06/16 06:57, David Edelsohn wrote:
>
>> What about Jakub's comment in the PR?
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>
>
> This needs addressing.  Can you clarify PPC behaviour, because I may have
> misunderstood:
>
> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
> in 32 bit mode.
>
> 2) PPC currently has 32 bit counters anyway.

The rs6000 port ABIs implement 64 bit "long long" type.  The current
code uses LONG_LONG_TYPE_SIZE for the counters.  I assume that most
ports don't support native 64 bit atomic operations in 32 bit ABI --
PowerPC does not.

The previous code allowed gcov type to be overridden, but I don't
think that it was 32 bit on most targets.

>
> I had interpreted the comment to be implying #2, but now I'm not so sure.
>
>> The proposed patch seems wrong or at least incomplete.  The recent
>> change is imposing a 64 bit DImode counter when producing 32 bit code.
>> PowerPC does support 64 bit atomic operations in 32 bit mode.
>
>
> I'm presuming you've missed a 'NOT' in that sentence.

Yes, I omitted a "NOT".

PowerPC64 has 64 bit atomics, but PowerPC32 subset only provides 32
bit atomics in the ISA.

If the counters always should be 64 bit, then a poor-man's 64 bit
atomic operation proposed by Jakub seems like a better solution.

Thanks, David

>
>> Was there a design decision that profile counters always should be 64
>> bits?  Either 32 bit targets won't support multi-threaded profiling or
>> 32 bit targets can overflow the counter sooner.
>
>
>
>>  Which is worse?
>> Which is more likely?
>
>
> My initial thought is that it is probably awkward to support 2 different
> sized counter types in the 'same' config.  I.e. 64-bit single-threaded
> counters and 32-bit threaded counters.
>
> nathan


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 08:14:58AM -0400, Nathan Sidwell wrote:
> On 09/06/16 06:57, David Edelsohn wrote:
> 
> >What about Jakub's comment in the PR?
> >
> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
> 
> This needs addressing.  Can you clarify PPC behaviour, because I may have
> misunderstood:
> 
> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
> in 32 bit mode.
> 
> 2) PPC currently has 32 bit counters anyway.
> 
> I had interpreted the comment to be implying #2, but now I'm not so sure.

Aren't the counters 64-bit everywhere?
Even with 32-bit atomics, as the only operation needed is addition, can't it
be implemented anyway?  Instead of:
  __atomic_fetch_add_8 (&val, 1, __ATOMIC_RELAXED);
one could for e.g. little endian:
  if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
__atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
There is a small risk that e.g. SIGTERM happens in between the two and the
counters are written into the file without the upper half being bumped, but
compared to the non-atomic updates it is much less likely (and, if you
actually never overflow the counters, it won't be an issue anyway).
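
Spelled out as a self-contained sketch (little endian assumed; this is only an
illustration of the idea, not libgcov code):

#include <stdint.h>

static void
counter_add_one (uint64_t *val)
{
  uint32_t *halves = (uint32_t *) val;   /* low word first on little endian */
  /* Bump the low word; if it wrapped around to 0, carry into the high word.  */
  if (__atomic_add_fetch (&halves[0], 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add (&halves[1], 1, __ATOMIC_RELAXED);
}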

Jakub


Re: [PATCH][RFC] Early LTO debug

2016-09-06 Thread Richard Biener
On Tue, 6 Sep 2016, Markus Trippelsdorf wrote:

> On 2016.09.06 at 13:17 +0200, Richard Biener wrote:
> > 
> > The following is an updated patch, mainly stripped out old unnecessary
> > cruft and some fixes for an issue that arose when LTOing Firefox.
> 
> One minor issue that I've noticed is that the patch generates a lot of
> empty *debugobj* files in $TMPDIR, e.g.:
> 
>  % echo 'int main(){}' | g++ -flto -o /dev/null -x c++ -
>  % ls -l $TMPDIR
> total 0
> -rw---. 1 trippels trippels 0 Sep  6 12:11 ccenD5Tcdebugobj
> -rw---. 1 trippels trippels 0 Sep  6 12:11 ccXzvE4udebugobjtem

Ah, make_temp_file actually _creates_ the files already...

Fixed with the patch below.

> The new patch builds LLVM fine with "-flto -g".

Great.

Richard.

Index: early-lto-debug/gcc/lto-wrapper.c
===
--- early-lto-debug.orig/gcc/lto-wrapper.c  2016-09-06 14:26:29.233142490 
+0200
+++ early-lto-debug/gcc/lto-wrapper.c   2016-09-06 14:25:44.316618249 +0200
@@ -943,9 +943,10 @@ find_and_merge_options (int fd, off_t fi
   return true;
 }
 
-int
-debug_objcopy (const char *infile, const char *outfile)
+const char *
+debug_objcopy (const char *infile)
 {
+  const char *outfile;
   const char *errmsg;
   int err;
 
@@ -966,12 +967,12 @@ debug_objcopy (const char *infile, const
 }
   int infd = open (infile, O_RDONLY);
   if (infd == -1)
-return 1;
+return NULL;
   simple_object_read *inobj = simple_object_start_read (infd, inoff,
"__GNU_LTO",
&errmsg, &err);
   if (!inobj)
-return 1;
+return NULL;
 
   off_t off, len;
   if (simple_object_find_section (inobj, ".gnu.debuglto_.debug_info",
@@ -982,17 +983,21 @@ debug_objcopy (const char *infile, const
 
   simple_object_release_read (inobj);
   close (infd);
-  return 1;
+  return NULL;
 }
 
+  outfile = make_temp_file ("debugobjtem");
   errmsg = simple_object_copy_lto_debug_sections (inobj, outfile, &err);
   if (errmsg)
-fatal_error (0, "%s: %s\n", errmsg, xstrerror (err));
+{
+  unlink_if_ordinary (outfile);
+  fatal_error (0, "%s: %s\n", errmsg, xstrerror (err));
+}
 
   simple_object_release_read (inobj);
   close (infd);
 
-  return 0;
+  return outfile;
 }
 
 
@@ -1401,8 +1406,8 @@ cont1:
   if (! skip_debug)
 for (i = 0; i < ltoobj_argc; ++i)
   {
-   char *tem = make_temp_file ("debugobjtem");
-   if (!debug_objcopy (ltoobj_argv[i], tem))
+   const char *tem;
+   if ((tem = debug_objcopy (ltoobj_argv[i])))
  {
obstack_ptr_grow (&argv_obstack, tem);
n_debugobj++;


Re: [PATCH][RFC] Early LTO debug

2016-09-06 Thread Markus Trippelsdorf
On 2016.09.06 at 13:17 +0200, Richard Biener wrote:
> 
> The following is an updated patch, mainly stripped out old unnecessary
> cruft and some fixes for an issue that arose when LTOing Firefox.

One minor issue that I've noticed is that the patch generates a lot of
empty *debugobj* files in $TMPDIR, e.g.:

 % echo 'int main(){}' | g++ -flto -o /dev/null -x c++ -
 % ls -l $TMPDIR
total 0
-rw---. 1 trippels trippels 0 Sep  6 12:11 ccenD5Tcdebugobj
-rw---. 1 trippels trippels 0 Sep  6 12:11 ccXzvE4udebugobjtem

The new patch builds LLVM fine with "-flto -g".

-- 
Markus


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Nathan Sidwell

On 09/06/16 06:57, David Edelsohn wrote:


What about Jakub's comment in the PR?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6


This needs addressing.  Can you clarify PPC behaviour, because I may have 
misunderstood:


1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops in 
32 bit mode.


2) PPC currently has 32 bit counters anyway.

I had interpreted the comment to be implying #2, but now I'm not so sure.


The proposed patch seems wrong or at least incomplete.  The recent
change is imposing a 64 bit DImode counter when producing 32 bit code.
PowerPC does support 64 bit atomic operations in 32 bit mode.


I'm presuming you've missed a 'NOT' in that sentence.


Was there a design decision that profile counters always should be 64
bits?  Either 32 bit targets won't support multi-threaded profiling or
32 bit targets can overflow the counter sooner.




 Which is worse?
Which is more likely?


My initial thought is that it is probably awkward to support 2 different sized 
counter types in the 'same' config.  I.e. 64-bit single-threaded counters and 
32-bit threaded counters.


nathan


Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-09-06 Thread Thomas Schwinge
Hi!

On Mon, 29 Aug 2016 15:46:47 +0800, Chung-Lin Tang  
wrote:
> this patch is a port of some changes from gomp-4_0-branch,
> including adding additional map type handling in OpenACC enter/exit data
> directives, and some pointer set handling changes. Updated
> testsuite case are also included.
> 
> Tested on trunk to ensure no regressions, is this okay for trunk?

> 2016-08-29  Cesar Philippidis  
> Thomas Schwinge  
> Chung-Lin Tang  

Maybe I'm misremembering, but I can't remember having been involved in
this.  ;-)

> libgomp/
> * oacc-parallel.c (find_pset): Adjust and rename from...
> (find_pointer): ...this function.
> (GOACC_enter_exit_data): Handle GOMP_MAP_TO and GOMP_MAP_ALLOC,
> adjust find_pointer calls into find_pset, adjust pointer map handling,
> add acc_is_present guards to calls to gomp_acc_insert_pointer and
> gomp_acc_remove_pointer.

> --- oacc-parallel.c   (revision 239814)
> +++ oacc-parallel.c   (working copy)
> @@ -38,15 +38,23 @@
>  #include 
>  #include 
>  
> +/* Returns the number of mappings associated with the pointer or pset. PSET
> +   have three mappings, whereas pointer have two.  */
> +
>  static int
> -find_pset (int pos, size_t mapnum, unsigned short *kinds)
> +find_pointer (int pos, size_t mapnum, unsigned short *kinds)
>  {
>if (pos + 1 >= mapnum)
>  return 0;
>  
>unsigned char kind = kinds[pos+1] & 0xff;
>  
> -  return kind == GOMP_MAP_TO_PSET;
> +  if (kind == GOMP_MAP_TO_PSET)
> +return 3;
> +  else if (kind == GOMP_MAP_POINTER)
> +return 2;
> +
> +  return 0;
>  }

I'm still confused about that find_pset/find_pointer handling.  Why is
that required?  Essentially, that means that GOACC_enter_exit_data is
skipping over some mappings, right?  If yes, why do the front ends
(Fortran only?) then emit these mappings to begin with, if we're then
ignoring them in the runtime?
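
For reference, this is how I read the comment added in the patch; the
GOMP_MAP_* names are the real constants from include/gomp-constants.h, the
layout sketch itself is only illustrative:

/* Map-kind sequences as seen by GOACC_enter_exit_data:

   scalar or statically sized array (single data clause):
     kinds[i]   = data clause (e.g. GOMP_MAP_TO)          -> returns 0

   C dynamically allocated array or subarray:
     kinds[i]   = data clause (e.g. GOMP_MAP_TO)
     kinds[i+1] = GOMP_MAP_POINTER                         -> returns 2

   Fortran allocatable array or subarray:
     kinds[i]   = data clause (e.g. GOMP_MAP_TO)
     kinds[i+1] = GOMP_MAP_TO_PSET  (array descriptor)
     kinds[i+2] = GOMP_MAP_POINTER                         -> returns 3  */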

> @@ -298,7 +306,9 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>  
>if (kind == GOMP_MAP_FORCE_ALLOC
> || kind == GOMP_MAP_FORCE_PRESENT
> -   || kind == GOMP_MAP_FORCE_TO)
> +   || kind == GOMP_MAP_FORCE_TO
> +   || kind == GOMP_MAP_TO
> +   || kind == GOMP_MAP_ALLOC)
>   {
> data_enter = true;
> break;
> @@ -312,31 +322,39 @@ GOACC_enter_exit_data (int device, size_t mapnum,
> kind);
>  }
>  
> +  /* In c, non-pointers and arrays are represented by a single data clause.
> + Dynamically allocated arrays and subarrays are represented by a data
> + clause followed by an internal GOMP_MAP_POINTER.
> +
> + In fortran, scalars and not allocated arrays are represented by a
> + single data clause. Allocated arrays and subarrays have three mappings:
> + 1) the original data clause, 2) a PSET 3) a pointer to the array data.
> +  */
> +
>if (data_enter)
>  {
>for (i = 0; i < mapnum; i++)
>   {
> unsigned char kind = kinds[i] & 0xff;
>  
> -   /* Scan for PSETs.  */
> -   int psets = find_pset (i, mapnum, kinds);
> +   /* Scan for pointers and PSETs.  */
> +   int pointer = find_pointer (i, mapnum, kinds);
>  
> -   if (!psets)
> +   if (!pointer)
>   {
> switch (kind)
>   {
> - case GOMP_MAP_POINTER:
> -   gomp_acc_insert_pointer (1, &hostaddrs[i], &sizes[i],
> - &kinds[i]);
> + case GOMP_MAP_ALLOC:
> +   acc_present_or_create (hostaddrs[i], sizes[i]);
> break;
>   case GOMP_MAP_FORCE_ALLOC:
> acc_create (hostaddrs[i], sizes[i]);
> break;
> - case GOMP_MAP_FORCE_PRESENT:
> + case GOMP_MAP_TO:
> acc_present_or_copyin (hostaddrs[i], sizes[i]);
> break;
>   case GOMP_MAP_FORCE_TO:
> -   acc_present_or_copyin (hostaddrs[i], sizes[i]);
> +   acc_copyin (hostaddrs[i], sizes[i]);
> break;
>   default:
> gomp_fatal (" GOACC_enter_exit_data UNHANDLED kind 
> 0x%.2x",
> @@ -346,12 +364,16 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   }
> else
>   {
> -   gomp_acc_insert_pointer (3, &hostaddrs[i], &sizes[i], &kinds[i]);
> +   if (!acc_is_present (hostaddrs[i], sizes[i]))
> + {
> +   gomp_acc_insert_pointer (pointer, &hostaddrs[i],
> +&sizes[i], &kinds[i]);
> + }
> /* Increment 'i' by two because OpenACC requires fortran
>arrays to be contiguous, so each PSET is associated with
>one of MAP_FORCE_ALLOC/MAP_FORCE_PRESET/MAP_FORCE_TO, and
>one MAP_POINTER.  */
> -   i += 2;
> +   i += pointer - 1;
>   }
>   }
>  }
> @@ -360,19 +382,15 @@ GOACC

Re: [DOC] Boolean -> boolean changes

2016-09-06 Thread Bernd Schmidt

On 09/06/2016 11:19 AM, Marek Polacek wrote:

As discussed ,
this patch makes the Boolean -> boolean change in our doc/ directory.

Ok for trunk?


Yes, thanks.


Bernd


[Patch, testsuite, avr] Skip gcc.dg/Wno-frame-address.c for avr

2016-09-06 Thread Senthil Kumar Selvaraj
Hi,

  The below patch adds avr to the list of targets excluded for
  Wno-frame-address.c. Like the other excluded targets, the avr backend
  only supports the builtin for the current stack frame.

  Committed to trunk.

Regards
Senthil

gcc/testsuite/ChangeLog:

2016-09-06  Senthil Kumar Selvaraj  

* gcc.dg/Wno-frame-address.c: Skip for avr-*-*.


Index: gcc/testsuite/gcc.dg/Wno-frame-address.c
===
--- gcc/testsuite/gcc.dg/Wno-frame-address.c(revision 240004)
+++ gcc/testsuite/gcc.dg/Wno-frame-address.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-skip-if "Cannot access arbitrary stack frames" { arm*-*-* hppa*-*-* 
ia64-*-* visium-*-* } } */
+/* { dg-skip-if "Cannot access arbitrary stack frames" { arm*-*-* avr-*-* 
hppa*-*-* ia64-*-* visium-*-* } } */
 /* { dg-options "-Werror" } */
 /* { dg-additional-options "-mbackchain" { target { s390*-*-* } } } */
 


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Bernd Schmidt

On 09/06/2016 11:11 AM, Florian Weimer wrote:


I think this change is only safe because nothing relies on it.
max_align_t is a committee invention with no actual users.


I tried to verify that and ran grep over all the packages in 
/usr/portage/distfiles. That did get a number of matches, fairly obvious 
and expected ones like gcc and glibc, and several more which match only 
because gnulib seems to try to provide a default, but I also found a few 
real uses:


grep-2.25/lib/fts.c:
/* Align the allocation size so that it works for FTSENT,
   so that trailing padding may be referenced by direct access
   to the flexible array members, without triggering undefined 
behavior
   by accessing bytes beyond the heap allocation.  This 
implicit access

   was seen for example with ISDOT() and GCC 5.1.1 at -O2.
   Do not use alignof (FTSENT) here, since C11 prohibits
   taking the alignment of a structure containing a flexible
   array member.  */
len += alignof (max_align_t) - 1;
len &= ~ (alignof (max_align_t) - 1);


libvpx-v1.3.0/nestegg/halloc/src/halloc.c:
/*
 *  block control header
 */
typedef struct hblock
{
#ifndef NDEBUG
#define HH_MAGIC0x20040518L
long  magic;
#endif
hlist_item_t  siblings; /* 2 pointers */
hlist_head_t  children; /* 1 pointer  */
max_align_t   data[1];  /* not allocated, see below */

} hblock_t;


Bernd


Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-09-06 Thread Chung-Lin Tang
Ping.

On 2016/8/29 03:46 PM, Chung-Lin Tang wrote:
> Hi Jakub,
> this patch is a port of some changes from gomp-4_0-branch,
> including adding additional map type handling in OpenACC enter/exit data
> directives, and some pointer set handling changes. Updated
> testsuite cases are also included.
> 
> Tested on trunk to ensure no regressions, is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2016-08-29  Cesar Philippidis  
> Thomas Schwinge  
> Chung-Lin Tang  
> 
> libgomp/
> * oacc-parallel.c (find_pset): Adjust and rename from...
> (find_pointer): ...this function.
> (GOACC_enter_exit_data): Handle GOMP_MAP_TO and GOMP_MAP_ALLOC,
> adjust find_pointer calls into find_pset, adjust pointer map handling,
> add acc_is_present guards to calls to gomp_acc_insert_pointer and
> gomp_acc_remove_pointer.
> 
> * testsuite/libgomp.oacc-c-c++-common/data-2.c: Update test.
> * testsuite/libgomp.oacc-c-c++-common/enter-data.c: New test.
> * testsuite/libgomp.oacc-fortran/data-2.f90: Update test.
> 



Re: [PATCH][AArch64] Improve stack adjustment

2016-09-06 Thread Wilco Dijkstra
 
ping
    
Richard Earnshaw wrote:
> I see you've added a default argument for your new parameter.  I think
> doing that is fine, but I have two comments about how we might use that
> in this case.
> 
> Firstly, if this parameter is suitable for having a default value, then
> I think the preceding one should also be treated in the same way.
> 
> Secondly, I think (due to these parameters being BOOL with no useful
> context to make it clear which is which) that having wrapper functions
> (inlined, of course) that describe the intended behaviour more clearly
> would be useful.  So, defining
> 
> static inline void
> aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> {
>    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> }
> 
> would make it clearer at the call point than having a lot of true and
> false parameters scattered round the code.
> 
> Alternatively we might remove all the default parameters and require
> wrappers like the above to make it more explicit which API is intended -
> this might make more sense if not all combinations make sense.

OK here is the new version using simple wrapper functions and no
default parameters:

ChangeLog:
2016-08-10  Wilco Dijkstra  

gcc/
    * config/aarch64/aarch64.c (aarch64_add_constant_internal):
    Add extra argument to allow emitting the move immediate.
    Use add/sub with positive immediate.
    (aarch64_add_constant): Add inline function.
    (aarch64_add_sp): Likewise.
    (aarch64_sub_sp): Likewise.
    (aarch64_expand_prologue): Call aarch64_sub_sp.
    (aarch64_expand_epilogue): Call aarch64_add_sp.
    Decide when to leave out move.
    (aarch64_output_mi_thunk): Call aarch64_add_constant.

testsuite/
    * gcc.target/aarch64/test_frame_17.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b8536175a84b76f8c2939e61f1379ae279b20d43..5a25fce17785af9f9dc12e0f2a9609af09af0b35
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1941,15 +1941,21 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
-   intermediate value if necessary.
+   intermediate value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
 
-   This function is sometimes used to adjust the stack pointer, so we must
-   ensure that it can never cause transient stack deallocation by writing an
-   invalid value into REGNUM.  */
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
 
 static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_add_constant_internal (machine_mode mode, int regnum, int scratchreg,
+  HOST_WIDE_INT delta, bool frame_related_p,
+  bool emit_move_imm)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
@@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, 
int scratchreg,
   return;
 }
 
-  /* We need two add/sub instructions, each one performing part of the
- calculation.  Don't do this if the addend can be loaded into register with
- a single instruction, in that case we prefer a move to a scratch register
- following by an addition.  */
-  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
+  /* We need two add/sub instructions, each one perform part of the
+ addition/subtraction, but don't this if the addend can be loaded into
+ register by single instruction, in that case we prefer a move to scratch
+ register following by addition.  */
+  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))
 {
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
@@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, int 
scratchreg,
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (emit_move_imm)
+    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+ : gen_add2_insn (this_rtx, scratch_rtx));
   if (frame_related_p)
 {
   RTX_FRAME_RELATED_P (insn) = frame_related_p;
@@ -1995,6 +2003,27 @@ aarch64_add_constant (

Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Florian Weimer wrote:

> Why aren't there any users?  The standard isn't very clear what the value of
> _Alignof (max_align_t) actually means.  Does the phrase “all contexts” include
> pointers returned by malloc?  Even if the requested size is smaller than the
> fundamental alignment?  Did those who wrote the standard expect there to be
> *any* relationship between malloc and max_align_t?

See my cleanup of the wording in DR#445 
, which 
is intended to reflect the intent and stay compatible with C99.  malloc 
should be usable to allocate memory for any type from the standard library 
headers, including max_align_t.

> The existing situation is that most mallocs to do not provide _Alignof
> (max_align_t) alignment unconditionally.  But they can provide arbitrarily
> large alignment with aligned_alloc/memalign-style interfaces.

Well, that's a conformance bug in the implementation as a whole.  The 
nonconforming modes in question are still useful and it's useful for GCC 
to support such mallocs.

> has to remain interposable).  If there is no relationship to malloc, GCC
> essentially supports unbounded alignment on many targets.  How is it possible
> to have a meaningful definition of max_align_t under such circumstances?

max_align_t only covers fundamental alignments, not necessarily all 
alignments that are supported.
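
A minimal sketch of that distinction, using only standard C11 interfaces
(assuming a C11 libc that provides aligned_alloc):

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* Fundamental alignment: malloc only has to return storage aligned to
     _Alignof (max_align_t), whatever that works out to on the target.  */
  printf ("max_align_t alignment: %zu\n", alignof (max_align_t));
  void *p = malloc (32);

  /* Extended alignment: anything larger has to be requested explicitly,
     e.g. with aligned_alloc; max_align_t says nothing about it.  */
  void *q = aligned_alloc (64, 64);

  free (p);
  free (q);
  return 0;
}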

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Richard Biener wrote:

> > On 32-bit x86, max_align_t is currently 8-byte aligned, but
> > _Decimal128 and _Float128 are 16-byte aligned, so the alignment of
> > max_align_t needs to increase to meet the standard requirements for
> > these types.
> 
> As doubles on 32bit x86 are 4 byte aligned I suppose the choice could have
> been to make those types 8 byte aligned to avoid changes of max_align_t?
> (with the obvious eventual performance penalty of needing to do unaligned
> loads/stores to xmm registers).

However, the choice made some years ago (and documented in HJ's ABI 
document) was 16-byte alignment.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][RFC] Early LTO debug

2016-09-06 Thread Richard Biener
On Fri, 26 Aug 2016, Richard Biener wrote:

> 
> The patch below is my current development state of Early LTO debug lumped
> into one big patch (no changelog, sorry).  It contains previously posted
> 
>   https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01089.html
> 
> which extends dwarf2asm.h and adjusts a target hook which means the
> patch carries the usual tm.texi burden.
> 
> From previous iterations the patch has removed most of the dwarf2out
> refactoring to make it easier to review.  It does get away with
> dwarf2out_abstract_function (we have that by means of early debug
> for all functions) and it moves quite a bunch of stuff from late
> dwarf processing to early.  For actual submission I will try to
> factor out parts of the latter at least.
> 
> 
> So the main idea is to emit the DWARF tree we have at 
> dwarf2out_early_finish into "Early LTO DWARF" sections (using
> .gnu.debuglto_ section prefixes, not .gnu.lto_ because GNU as
> spits out warnings about bogus section flags used for 
> .gnu.lto_.debug_str).  If we produce fat LTO objects the full
> DWARF info will be emitted alongside regular assembly again at
> dwarf2out_finish time.  When streaming out LTO bytecode we
> record for each interesting tree (decls at the moment) a reference
> to the "early DIE" generated via a $symbol + offset pair.  There
> is a single symbol emitted for each early debug-info section.
> 
> At WPA time those references are simply carried through.
> 
> At LTRANS time those references are read back in and dwarf2out
> is told about the existence of the early DIE for tree X and
> it creates a shadow DIE for that, simply referring to the early
> DIE via a symbolic reference.  At dwarf2out_finish time we then
> annotate the early DIE by means of the shadow DIE which has
> a DW_AT_abstract_origin referring to the early DIE.  This mostly
> adds location information to DIEs.
> 
> 
> On the driver side things get a little complicated as we need to
> funnel through the "Early LTO DWARF" sections to the final link,
> presenting them as regular debug info sections.  To make a smooth
> transition possible I chose to write an Early LTO DWARF copier
> in the simple-object framework which basically takes all the
> early LTO DWARF sections and writes them into a new object file,
> renaming them on-the-fly.  I implemented this for ELF only so far
> and the API needs to be changed to reflect the fact that this is
> really only suited for use as LTO early DWARF section copier
> (tried to make it a little more generic but that didn't work
> out very well).  So for each LTO input file you get a LTO early
> debug object which are then partially linked to a single LTO
> early debug object (we could simply omit that step, it doesn't
> save very much due to the fact that GNU ld with partial linking
> doesn't optimize mergeable string sections...).  This object
> is then fed back as LTO link output to the linker.
> 
> Ideally the linker plugin API would be extended to allow claiming
> only a subset of sections and allow telling the linker to rename
> sections that we didn't claim.  That would avoid some I/O and
> disk-space usage.
> 
> 
> The main motivation for this is to increase the quality of
> debug information you get when using LTO and the most promising
> part of testing is
> 
> === libstdc++ tests ===
>  
>  Running target unix/-flto/-g/
>  FAIL: 23_containers/deque/cons/clear_allocator.cc execution test
> -FAIL: libstdc++-prettyprinters/48362.cc print t1
> -FAIL: libstdc++-prettyprinters/48362.cc print t2
> -FAIL: libstdc++-prettyprinters/cxx11.cc print efl
> ...
> -FAIL: libstdc++-prettyprinters/whatis.cc whatis unord2_holder
> -FAIL: libstdc++-prettyprinters/whatis.cc whatis ustring_holder
>  
> === libstdc++ Summary for unix/-flto/-g/ ===
>  
>  # of expected passes   11423
> -# of unexpected failures   140
> +# of unexpected failures   1
>  # of expected failures 65
> -# of unsupported tests 234
> +# of unsupported tests 243
> 
> which means all libstdc++ pretty printer tests now pass!
> 
> We get a few guality improvements (mind you, unconditionally added
> -flto -g):
> 
> -FAIL: gcc.dg/guality/inline-params.c   -O1  execution test
> +XPASS: gcc.dg/guality/inline-params.c   -O2  execution test
>  XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> +XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> +XPASS: gcc.dg/guality/inline-params.c   -O3 -g  execution test
> +XPASS: gcc.dg/guality/inline-params.c   -Os  execution test
> 
> but also the issue that gdb doesn't cope with the way we represent
> VLAs with the new scheme:
> 
> +FAIL: gcc.dg/guality/vla-1.c   -O0  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O0  line 24 sizeof (a) == 17 * sizeof 
> (short)
> +FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
> +FAIL: gcc

Re: Implement C _FloatN, _FloatNx types [version 6]

2016-09-06 Thread Joseph Myers
On Tue, 6 Sep 2016, Richard Biener wrote:

> Note that this makes it maybe a good idea to only enable _FloatXX support
> for targets that have explicitly done so once we near the GCC 7 release.

I don't think that's a good idea; it's a recipe for most targets never 
supporting the new types when in fact they'd work fine.  (Every case other 
than possibly _Float32 variable arguments will just work as long as the 
back end uses machine modes to determine argument passing for scalar types 
and assuming the ABI chosen is the same as for the corresponding standard 
types.  32-bit powerpc is one of the most complicated variable-arguments 
ABIs.  And since there are no printf formats for the new types, passing 
them in variable arguments is an unusual case anyway.)

Architecture maintainers where there are externally maintained 
architecture definitions are of course welcome to work on getting an ABI 
for these types properly documented.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread David Edelsohn
On Tue, Sep 6, 2016 at 6:45 AM, Martin Liška  wrote:

>>> Proper fix consists of 2 parts:
>>> a) compiler emission must verify that -fprofile-update=atomic is doable for 
>>> a given target; it's done
>>> via a new function can_generate_atomic_builtin
>>> b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the 
>>> target; that requires configure
>>> support and if the target is not capable to expand these, we must 
>>> conditionally remove all gcov_.*profiler_atomic
>>> functions from libgcov.a.
>>
>> I'm fine with the coverage-specific changes, but the new hooks etc are not 
>> something I can approve.
>>
>> gcc/ChangeLog:
>>
>> 2016-08-12  Martin Liska  
>>
>> * optabs.c (can_generate_atomic_builtin): New function.
>> * optabs.h (can_generate_atomic_builtin): Declare the function.
>> Need GWM or similar review
>>
>> * tree-profile.c (tree_profiling):  Detect whether target can use
>> -fprofile-update=atomic.
>> ok
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-08-12  Martin Liska  
>>
>> * gcc.dg/profile-update-warning.c: New test.
>> ok
>>
>> libgcc/ChangeLog:
>>
>> 2016-08-16  Martin Liska  
>>
>> * acinclude.m4: New file.
>> * config.in: New macro defines.
>> * configure: Regenerated.
>> * configure.ac: Detect atomic operations.
>> need GWM or similar review
>>
>> * libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
>> conditionally enable/disable *_atomic functions.
>> OK.
>>
>> nathan
>
> Hi Nathan.
>
> Thanks for review, I'm CCing Jakub and Richard for the review.
> The patch should fix very similar issue spotted on AIX target by David.

What about Jakub's comment in the PR?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6

The proposed patch seems wrong or at least incomplete.  The recent
change is imposing a 64 bit DImode counter when producing 32 bit code.
PowerPC does support 64 bit atomic operations in 32 bit mode.

Was there a design decision that profile counters always should be 64
bits?  Either 32 bit targets won't support multi-threaded profiling or
32 bit targets can overflow the counter sooner.  Which is worse?
Which is more likely?

Thanks, David


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-06 Thread Martin Liška
On 08/16/2016 04:30 PM, Nathan Sidwell wrote:
> On 08/16/16 08:55, Martin Liška wrote:
>> Hello.
>>
>> As reported in [1], m68k has been broken since I installed the patch. Reason 
>> is that the target
>> does not support atomic operations (add, or) for a mode of gcov_type. 
>> Because of that, we see
>> undefined symbols.
>>
>> Proper fix consists of 2 parts:
>> a) compiler emission must verify that -fprofile-update=atomic is doable for 
>> a given target; it's done
>> via a new function can_generate_atomic_builtin
>> b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the 
>> target; that requires configure
>> support and if the target is not capable to expand these, we must 
>> conditionally remove all gcov_.*profiler_atomic
>> functions from libgcov.a.
> 
> I'm fine with the coverage-specific changes, but the new hooks etc are not 
> something I can approve.
> 
> gcc/ChangeLog:
> 
> 2016-08-12  Martin Liska  
> 
> * optabs.c (can_generate_atomic_builtin): New function.
> * optabs.h (can_generate_atomic_builtin): Declare the function.
> Need GWM or similar review
> 
> * tree-profile.c (tree_profiling):  Detect whether target can use
> -fprofile-update=atomic.
> ok
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-08-12  Martin Liska  
> 
> * gcc.dg/profile-update-warning.c: New test.
> ok
> 
> libgcc/ChangeLog:
> 
> 2016-08-16  Martin Liska  
> 
> * acinclude.m4: New file.
> * config.in: New macro defines.
> * configure: Regenerated.
> * configure.ac: Detect atomic operations.
> need GWM or similar review
> 
> * libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
> conditionally enable/disable *_atomic functions.
> OK.
> 
> nathan

Hi Nathan.

Thanks for review, I'm CCing Jakub and Richard for the review.
The patch should fix very similar issue spotted on AIX target by David.

Thanks,
Martin


Re: [PATCH] Fix location of command line backend reported issues (PR middle-end/77475)

2016-09-06 Thread Christophe Lyon
On 6 September 2016 at 12:12, Jakub Jelinek  wrote:
> On Tue, Sep 06, 2016 at 12:07:47PM +0200, Christophe Lyon wrote:
>> On 5 September 2016 at 19:20, Jakub Jelinek  wrote:
>> > Hi!
>> >
>> > While it would be perhaps nice to pass explicit location_t in the target
>> > option handling code, there are hundreds of error/warning/sorry calls
>> > in lots of backends, and lots of those routines are used not just
>> > for the process_options time (i.e. command line options), but also for
>> > pragma GCC target and target option handling, so at least for the time 
>> > being
>> > I think it is easier to just use UNKNOWN_LOCATION for the command line
>> > option diagnostics.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> > 2016-09-05  Jakub Jelinek  
>> >
>> > PR middle-end/77475
>> > * toplev.c (process_options): Temporarily set input_location
>> > to UNKNOWN_LOCATION around targetm.target_option.override () call.
>> >
>>
>> This patch caused regressions on aarch64. The following tests now fail:
>>   gcc.target/aarch64/arch-diagnostics-1.c  (test for errors, line 1)
>>   gcc.target/aarch64/arch-diagnostics-1.c (test for excess errors)
>>   gcc.target/aarch64/arch-diagnostics-2.c  (test for errors, line 1)
>>   gcc.target/aarch64/arch-diagnostics-2.c (test for excess errors)
>>   gcc.target/aarch64/cpu-diagnostics-1.c  (test for errors, line 1)
>>   gcc.target/aarch64/cpu-diagnostics-1.c (test for excess errors)
>>   gcc.target/aarch64/cpu-diagnostics-2.c  (test for errors, line 1)
>>   gcc.target/aarch64/cpu-diagnostics-2.c (test for excess errors)
>>   gcc.target/aarch64/cpu-diagnostics-3.c  (test for errors, line 1)
>>   gcc.target/aarch64/cpu-diagnostics-3.c (test for excess errors)
>>   gcc.target/aarch64/cpu-diagnostics-4.c  (test for errors, line 1)
>>   gcc.target/aarch64/cpu-diagnostics-4.c (test for excess errors)
>
> Those tests need adjustments, not to expect such errors on line 1, but on
> line 0.
>
> I think following untested patch should fix that:
>
Indeed it does, thanks!

Christophe

> --- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c.jj2012-10-23 
> 19:54:58.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c   2016-09-06 
> 12:10:32.241560531 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-options "-O2 -march=dummy" } */
>
>  void f ()
> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c.jj 2016-05-18 
> 10:59:49.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c2016-09-06 
> 12:11:08.170110003 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>  /* { dg-options "-O2 -mcpu=cortex-a53+no" } */
>
> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c.jj 2016-05-18 
> 10:59:49.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c2016-09-06 
> 12:11:01.698191158 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>  /* { dg-options "-O2 -mcpu=dummy" } */
>
> --- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c.jj2012-10-23 
> 19:54:58.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c   2016-09-06 
> 12:10:39.737466536 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-options "-O2 -march=+dummy" } */
>
>  void f ()
> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c.jj 2016-05-18 
> 10:59:49.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c2016-09-06 
> 12:11:13.628041564 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "invalid feature" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "invalid feature" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>  /* { dg-options "-O2 -mcpu=cortex-a53+dummy" } */
>
> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c.jj 2016-05-18 
> 10:59:49.0 +0200
> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c2016-09-06 
> 12:11:18.44898 +0200
> @@ -1,4 +1,4 @@
> -/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
> +/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>  /* { dg-options "-O2 -mcpu=+dummy" } */
>
>
>
> Jakub


Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Richard Biener
On Tue, Sep 6, 2016 at 12:07 PM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
>
> On 6 September 2016 at 19:57, Richard Biener  
> wrote:
>> On Tue, Sep 6, 2016 at 11:33 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 6 September 2016 at 19:08, Richard Biener  
>>> wrote:
 On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
  wrote:
> Hi Richard,
>
> On 5 September 2016 at 17:57, Richard Biener  
> wrote:
>> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi All,
>>>
>>> While looking at gcc source, I noticed that we are iterating over SSA
>>> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
>>> in others. It seems that variable 0 is always NULL TREE.
>>
>> Yeah, that's artificial because we don't assign SSA name version 0 (for 
>> some
>> unknown reason).
>>
>>> But it can
>>> confuse people who are looking for the first time. Therefore It might
>>> be good to follow some consistent usage here.
>>>
>>> It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
>>> other cases. Here is an attempt to do this based on what is done in other
>>> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
>>> new regressions. is this OK?
>>
>> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
>> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>>
>> Then, if you add an iterator why leave the name == NULL handling to 
>> consumers?
>> That looks odd.
>>
>> Then, SSA names are just in a vector thus why not re-use the vector 
>> iterator?
>>
>> That is,
>>
>> #define FOR_EACH_SSA_NAME (name) \
>>   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
>>
>> would be equivalent to your patch?
>>
>> Please also don't add new iterators that implicitely use 'cfun' but 
>> always use
>> one that passes it as explicit arg.
>
> I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
> we will not be able to skip NULL ssa_names with that.

 Why?  Can't you simply do

   #define FOR_EACH_SSA_NAME (name) \
 for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
if (name)

 ?
>>>
>>> For example, with the test patch where loop body also defines i which
>>> is giving error:
>>>
>>> ../../trunk/gcc/tree-ssa-ter.c: In function ‘temp_expr_table*
>>> new_temp_expr_table(var_map)’:
>>> ../../trunk/gcc/tree-ssa-ter.c:206:11: error: redeclaration of ‘int i’
>>>int i;
>>>^
>>> In file included from ../../trunk/gcc/ssa.h:29:0,
>>>  from ../../trunk/gcc/tree-ssa-ter.c:28:
>>> ../../trunk/gcc/tree-ssanames.h:66:17: note: ‘unsigned int i’
>>> previously declared here
>>>for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)
>>>  ^
>>> ../../trunk/gcc/tree-ssa-ter.c:204:3: note: in expansion of macro
>>> ‘FOR_EACH_SSA_NAME’
>>>FOR_EACH_SSA_NAME (name)
>>>^
>>> Makefile:1097: recipe for target 'tree-ssa-ter.o' failed
>>>
>>> Am I missing something here ?
>>
>> Well, my comment was not about passing 'i' to the macro but about
>> not being able to skip NULL names.  I just copied & pasted my
>> earlier macro.
>
> Sorry, I misunderstood your comment.
>
> I have:
>
> +#define FOR_EACH_SSA_NAME(i, VAR) \
> +  for (i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)
>
> We will skip the first (0 index) if that is what you are referring here.
>
> I also thought that you wanted to skip any NULL_TREEs after that too.
> After looking at your previous comment, I don’t think you wanted that
> anyway.

I _did_ want that, otherwise my suggestion to use

 #define FOR_EACH_SSA_NAME(i, VAR) \
   for (i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i) \
  if (VAR)

would have made no sense.  Btw, please use capital i and put uses
in parens when not lvalues (see FOR_EACH_VEC_ELT).  As I said
I'd like 'cfun' to be explicit, thus

#define FOR_EACH_SSA_NAME(I, VAR, FN) \
  for (I = 1; SSANAMES (FN)->iterate ((I), &(VAR)); ++(I)) \
if (VAR)
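
For illustration, a hypothetical consumer of that final form would look
roughly like this (the dump loop is made up; print_generic_expr and TDF_SLIM
are the usual tree pretty-printer bits, and cfun is simply whatever function
the pass is working on):

  /* Sketch only: visit every live SSA name in cfun.  Version 0 is never
     used and released names leave NULL slots in the vector; the macro's
     trailing 'if (VAR)' skips both.  */
  unsigned i;
  tree name;

  FOR_EACH_SSA_NAME (i, name, cfun)
    if (dump_file)
      print_generic_expr (dump_file, name, TDF_SLIM);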

Thanks,
Richard.

> Therefore I think I am doing what you wanted in my last patch.
>
> Thanks,
> Kugan
>
>
>> Richard.
>>
>>> Thanks,
>>> Kugan
>>>

> I also added
> index variable to the macro so that there want be any conflicts if the
> index variable "i" (or whatever) is also defined in the loop.
>
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions. Is this OK for trunk?
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-09-06  Kugan Vivekanandarajah  
>
> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
> * cfgexpand.c (update_alias_info_with_stack_vars): Use
> FOR_EACH_SSA_NAME to iterate over SSA variables.
> (pass_expand::execute): Likewi

Re: [libstdc++-v3] Fix dg-require before dg-run directives in testsuite.

2016-09-06 Thread Jonathan Wakely

On 02/09/16 14:17 +0100, Matthew Wahab wrote:

Hello,

Tests in the libstdc++-v3 testsuite were recently changed to use { dg-do
.. { target c++11 } } instead of using a dg-options to set -std
(https://gcc.gnu.org/ml/libstdc++/2016-08/msg00102.html). As a
consequence, several tests were left with directive lists that had
dg-require preceding dg-do. This meant that some tests were run when
they should have been skipped as unsupported. In particular, most of the
23_algorithms/*/complexity.cc tests became FAIL on aarch64-none-elf

This patch adjusts the tests so that the dg-requires come after the
dg-do directives. This makes the tests that were previously
FAIL/UNRESOLVED become UNSUPPORTED. It also makes UNSUPPORTED some tests
that were PASS.

The tests that move from FAIL/UNRESOLVED to UNSUPPORTED:

- 22_locale/locale/cons/unicode.cc.
- 25_algorithms/{pop_heap,push_heap,sort_heap}/complexity.c.

From PASS to UNSUPPORTED:

- 23_containers/*/debug/60499.c
- 23_containers/vector/debug/52433.c

Tested by running the testsuite for cross-compiled aarch64-none-elf.

Ok for trunk?


Yes, thanks for cleaning this up.




Re: [PATCH] Fix location of command line backend reported issues (PR middle-end/77475)

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 12:07:47PM +0200, Christophe Lyon wrote:
> On 5 September 2016 at 19:20, Jakub Jelinek  wrote:
> > Hi!
> >
> > While it would be perhaps nice to pass explicit location_t in the target
> > option handling code, there are hundreds of error/warning/sorry calls
> > in lots of backends, and lots of those routines are used not just
> > for the process_options time (i.e. command line options), but also for
> > pragma GCC target and target option handling, so at least for the time being
> > I think it is easier to just use UNKNOWN_LOCATION for the command line
> > option diagnostics.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-09-05  Jakub Jelinek  
> >
> > PR middle-end/77475
> > * toplev.c (process_options): Temporarily set input_location
> > to UNKNOWN_LOCATION around targetm.target_option.override () call.
> >
> 
> This patch caused regressions on aarch64. The following tests now fail:
>   gcc.target/aarch64/arch-diagnostics-1.c  (test for errors, line 1)
>   gcc.target/aarch64/arch-diagnostics-1.c (test for excess errors)
>   gcc.target/aarch64/arch-diagnostics-2.c  (test for errors, line 1)
>   gcc.target/aarch64/arch-diagnostics-2.c (test for excess errors)
>   gcc.target/aarch64/cpu-diagnostics-1.c  (test for errors, line 1)
>   gcc.target/aarch64/cpu-diagnostics-1.c (test for excess errors)
>   gcc.target/aarch64/cpu-diagnostics-2.c  (test for errors, line 1)
>   gcc.target/aarch64/cpu-diagnostics-2.c (test for excess errors)
>   gcc.target/aarch64/cpu-diagnostics-3.c  (test for errors, line 1)
>   gcc.target/aarch64/cpu-diagnostics-3.c (test for excess errors)
>   gcc.target/aarch64/cpu-diagnostics-4.c  (test for errors, line 1)
>   gcc.target/aarch64/cpu-diagnostics-4.c (test for excess errors)

Those tests need adjustments, not to expect such errors on line 1, but on
line 0.

I think following untested patch should fix that:

--- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c.jj2012-10-23 
19:54:58.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c   2016-09-06 
12:10:32.241560531 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
+/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-options "-O2 -march=dummy" } */
 
 void f ()
--- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c.jj 2016-05-18 
10:59:49.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c2016-09-06 
12:11:08.170110003 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
+/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
 /* { dg-options "-O2 -mcpu=cortex-a53+no" } */
 
--- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c.jj 2016-05-18 
10:59:49.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c2016-09-06 
12:11:01.698191158 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
+/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
 /* { dg-options "-O2 -mcpu=dummy" } */
 
--- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c.jj2012-10-23 
19:54:58.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c   2016-09-06 
12:10:39.737466536 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
+/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-options "-O2 -march=+dummy" } */
 
 void f ()
--- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c.jj 2016-05-18 
10:59:49.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c2016-09-06 
12:11:13.628041564 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "invalid feature" "" {target "aarch64*-*-*" } } */
+/* { dg-error "invalid feature" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
 /* { dg-options "-O2 -mcpu=cortex-a53+dummy" } */
 
--- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c.jj 2016-05-18 
10:59:49.0 +0200
+++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c2016-09-06 
12:11:18.44898 +0200
@@ -1,4 +1,4 @@
-/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
+/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
 /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
 /* { dg-options "-O2 -mcpu=+dummy" } */
 


Jakub


Re: [PATCH] Fix location of command line backend reported issues (PR middle-end/77475)

2016-09-06 Thread Christophe Lyon
On 5 September 2016 at 19:20, Jakub Jelinek  wrote:
> Hi!
>
> While it would be perhaps nice to pass explicit location_t in the target
> option handling code, there are hundreds of error/warning/sorry calls
> in lots of backends, and lots of those routines are used not just
> for the process_options time (i.e. command line options), but also for
> pragma GCC target and target option handling, so at least for the time being
> I think it is easier to just use UNKNOWN_LOCATION for the command line
> option diagnostics.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-09-05  Jakub Jelinek  
>
> PR middle-end/77475
> * toplev.c (process_options): Temporarily set input_location
> to UNKNOWN_LOCATION around targetm.target_option.override () call.
>

Hi,

This patch caused regressions on aarch64. The following tests now fail:
  gcc.target/aarch64/arch-diagnostics-1.c  (test for errors, line 1)
  gcc.target/aarch64/arch-diagnostics-1.c (test for excess errors)
  gcc.target/aarch64/arch-diagnostics-2.c  (test for errors, line 1)
  gcc.target/aarch64/arch-diagnostics-2.c (test for excess errors)
  gcc.target/aarch64/cpu-diagnostics-1.c  (test for errors, line 1)
  gcc.target/aarch64/cpu-diagnostics-1.c (test for excess errors)
  gcc.target/aarch64/cpu-diagnostics-2.c  (test for errors, line 1)
  gcc.target/aarch64/cpu-diagnostics-2.c (test for excess errors)
  gcc.target/aarch64/cpu-diagnostics-3.c  (test for errors, line 1)
  gcc.target/aarch64/cpu-diagnostics-3.c (test for excess errors)
  gcc.target/aarch64/cpu-diagnostics-4.c  (test for errors, line 1)
  gcc.target/aarch64/cpu-diagnostics-4.c (test for excess errors)

Christophe


> --- gcc/toplev.c.jj 2016-09-03 11:18:55.0 +0200
> +++ gcc/toplev.c2016-09-05 15:05:19.819995470 +0200
> @@ -1220,7 +1220,10 @@ process_options (void)
>no_backend = lang_hooks.post_options (&main_input_filename);
>
>/* Some machines may reject certain combinations of options.  */
> +  location_t saved_location = input_location;
> +  input_location = UNKNOWN_LOCATION;
>targetm.target_option.override ();
> +  input_location = saved_location;
>
>if (flag_diagnostics_generate_patch)
>global_dc->edit_context_ptr = new edit_context ();
>
> Jakub


Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Kugan Vivekanandarajah
Hi Richard,


On 6 September 2016 at 19:57, Richard Biener  wrote:
> On Tue, Sep 6, 2016 at 11:33 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> On 6 September 2016 at 19:08, Richard Biener  
>> wrote:
>>> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>>>  wrote:
 Hi Richard,

 On 5 September 2016 at 17:57, Richard Biener  
 wrote:
> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi All,
>>
>> While looking at gcc source, I noticed that we are iterating over SSA
>> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
>> in others. It seems that variable 0 is always NULL TREE.
>
> Yeah, that's artificial because we don't assign SSA name version 0 (for 
> some
> unknown reason).
>
>> But it can
>> confuse people who are looking for the first time. Therefore It might
>> be good to follow some consistent usage here.
>>
>> It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
>> other cases. Here is an attempt to do this based on what is done in other
>> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
>> new regressions. is this OK?
>
> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>
> Then, if you add an iterator why leave the name == NULL handling to 
> consumers?
> That looks odd.
>
> Then, SSA names are just in a vector thus why not re-use the vector 
> iterator?
>
> That is,
>
> #define FOR_EACH_SSA_NAME (name) \
>   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
>
> would be equivalent to your patch?
>
> Please also don't add new iterators that implicitely use 'cfun' but 
> always use
> one that passes it as explicit arg.

 I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
 we will not be able to skip NULL ssa_names with that.
>>>
>>> Why?  Can't you simply do
>>>
>>>   #define FOR_EACH_SSA_NAME (name) \
>>> for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
>>>if (name)
>>>
>>> ?
>>
>> For example, with the test patch where loop body also defines i which
>> is giving error:
>>
>> ../../trunk/gcc/tree-ssa-ter.c: In function ‘temp_expr_table*
>> new_temp_expr_table(var_map)’:
>> ../../trunk/gcc/tree-ssa-ter.c:206:11: error: redeclaration of ‘int i’
>>int i;
>>^
>> In file included from ../../trunk/gcc/ssa.h:29:0,
>>  from ../../trunk/gcc/tree-ssa-ter.c:28:
>> ../../trunk/gcc/tree-ssanames.h:66:17: note: ‘unsigned int i’
>> previously declared here
>>for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)
>>  ^
>> ../../trunk/gcc/tree-ssa-ter.c:204:3: note: in expansion of macro
>> ‘FOR_EACH_SSA_NAME’
>>FOR_EACH_SSA_NAME (name)
>>^
>> Makefile:1097: recipe for target 'tree-ssa-ter.o' failed
>>
>> Am I missing something here ?
>
> Well, my comment was not about passing 'i' to the macro but about
> not being able to skip NULL names.  I just copied & pasted my
> earlier macro.

Sorry, I misunderstood your comment.

I have:

+#define FOR_EACH_SSA_NAME(i, VAR) \
+  for (i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)

We will skip the first (0 index) if that is what you are referring here.

I also thought that you wanted to skip any NULL_TREEs after that too.
After looking at your previous comment, I don’t think you wanted that
anyway.

Therefore I think I am doing what you wanted in my last patch.

Thanks,
Kugan


> Richard.
>
>> Thanks,
>> Kugan
>>
>>>
 I also added
 index variable to the macro so that there won't be any conflicts if the
 index variable "i" (or whatever) is also defined in the loop.

 Bootstrapped and regression tested on x86_64-linux-gnu with no new
 regressions. Is this OK for trunk?

 Thanks,
 Kugan


 gcc/ChangeLog:

 2016-09-06  Kugan Vivekanandarajah  

 * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
 * cfgexpand.c (update_alias_info_with_stack_vars): Use
 FOR_EACH_SSA_NAME to iterate over SSA variables.
 (pass_expand::execute): Likewise.
 * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
 * tree-cfg.c (dump_function_to_file): Likewise.
 * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
 (update_ssa): Likewise.
 * tree-ssa-alias.c (dump_alias_info): Likewise.
 * tree-ssa-ccp.c (ccp_finalize): Likewise.
 * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
 (create_outofssa_var_map): Likewise.
 (coalesce_ssa_name): Likewise.
 * tree-ssa-operands.c (dump_immediate_uses): Likewise.
 * tree-ssa-pre.c (compute_avail): Likewise.
 * tree-ssa-sccvn.c (init_scc_vn): Likewise.
 (scc_vn

Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Richard Biener
On Tue, Sep 6, 2016 at 11:33 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 6 September 2016 at 19:08, Richard Biener  
> wrote:
>> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 5 September 2016 at 17:57, Richard Biener  
>>> wrote:
 On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
  wrote:
> Hi All,
>
> While looking at gcc source, I noticed that we are iterating over SSA
> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
> in others. It seems that variable 0 is always NULL TREE.

 Yeah, that's artificial because we don't assign SSA name version 0 (for 
 some
 unknown reason).

> But it can
> confuse people who are looking for the first time. Therefore It might
> be good to follow some consistent usage here.
>
> It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
> other cases. Here is an attempt to do this based on what is done in other
> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
> new regressions. is this OK?

 First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
 as SSAVAR might be confused with iterating over SSA_NAME_VAR.

 Then, if you add an iterator why leave the name == NULL handling to 
 consumers?
 That looks odd.

 Then, SSA names are just in a vector thus why not re-use the vector 
 iterator?

 That is,

 #define FOR_EACH_SSA_NAME (name) \
   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)

 would be equivalent to your patch?

 Please also don't add new iterators that implicitely use 'cfun' but always 
 use
 one that passes it as explicit arg.
>>>
>>> I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
>>> we will not be able to skip NULL ssa_names with that.
>>
>> Why?  Can't you simply do
>>
>>   #define FOR_EACH_SSA_NAME (name) \
>> for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
>>if (name)
>>
>> ?
>
> For example, with the test patch where loop body also defines i which
> is giving error:
>
> ../../trunk/gcc/tree-ssa-ter.c: In function ‘temp_expr_table*
> new_temp_expr_table(var_map)’:
> ../../trunk/gcc/tree-ssa-ter.c:206:11: error: redeclaration of ‘int i’
>int i;
>^
> In file included from ../../trunk/gcc/ssa.h:29:0,
>  from ../../trunk/gcc/tree-ssa-ter.c:28:
> ../../trunk/gcc/tree-ssanames.h:66:17: note: ‘unsigned int i’
> previously declared here
>for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)
>  ^
> ../../trunk/gcc/tree-ssa-ter.c:204:3: note: in expansion of macro
> ‘FOR_EACH_SSA_NAME’
>FOR_EACH_SSA_NAME (name)
>^
> Makefile:1097: recipe for target 'tree-ssa-ter.o' failed
>
> Am I missing something here ?

Well, my comment was not about passing 'i' to the macro but about
not being able to skip NULL names.  I just copied & pasted my
earlier macro.

Richard.

> Thanks,
> Kugan
>
>>
>>> I also added
>>> index variable to the macro so that there won't be any conflicts if the
>>> index variable "i" (or whatever) is also defined in the loop.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>>> regressions. Is this OK for trunk?
>>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-09-06  Kugan Vivekanandarajah  
>>>
>>> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
>>> * cfgexpand.c (update_alias_info_with_stack_vars): Use
>>> FOR_EACH_SSA_NAME to iterate over SSA variables.
>>> (pass_expand::execute): Likewise.
>>> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>>> * tree-cfg.c (dump_function_to_file): Likewise.
>>> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>>> (update_ssa): Likewise.
>>> * tree-ssa-alias.c (dump_alias_info): Likewise.
>>> * tree-ssa-ccp.c (ccp_finalize): Likewise.
>>> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>>> (create_outofssa_var_map): Likewise.
>>> (coalesce_ssa_name): Likewise.
>>> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
>>> * tree-ssa-pre.c (compute_avail): Likewise.
>>> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
>>> (scc_vn_restore_ssa_info): Likewise.
>>> (free_scc_vn): Likwise.
>>> (run_scc_vn): Likewise.
>>> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>>> * tree-ssa-ter.c (new_temp_expr_table): Likewise.
>>> * tree-ssa-copy.c (fini_copy_prop): Likewise.
>>> * tree-ssa.c (verify_ssa): Likewise.
>>>

 Thanks,
 Richard.


> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-09-05  Kugan Vivekanandarajah  
>
> * tree-ssanames.h (ssa_iterator::ssa_iterator): New.
> (ssa_iterator::get): Likewise.
> (ssa_iterator::next): Likewise.
> (FOR_EACH_SSAVAR):

Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Kugan Vivekanandarajah
Hi Richard,

On 6 September 2016 at 19:08, Richard Biener  wrote:
> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> On 5 September 2016 at 17:57, Richard Biener  
>> wrote:
>>> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>>>  wrote:
 Hi All,

 While looking at gcc source, I noticed that we are iterating over SSA
 variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
 in others. It seems that variable 0 is always NULL TREE.
>>>
>>> Yeah, that's artificial because we don't assign SSA name version 0 (for some
>>> unknown reason).
>>>
 But it can
 confuse people who are looking for the first time. Therefore It might
 be good to follow some consistent usage here.

 It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
 other cases. Here is an attempt to do this based on what is done in other
 places. Bootstrapped and regression tested on X86_64-linux-gnu with no
 new regressions. is this OK?
>>>
>>> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
>>> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>>>
>>> Then, if you add an iterator why leave the name == NULL handling to 
>>> consumers?
>>> That looks odd.
>>>
>>> Then, SSA names are just in a vector thus why not re-use the vector 
>>> iterator?
>>>
>>> That is,
>>>
>>> #define FOR_EACH_SSA_NAME (name) \
>>>   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
>>>
>>> would be equivalent to your patch?
>>>
>>> Please also don't add new iterators that implicitely use 'cfun' but always 
>>> use
>>> one that passes it as explicit arg.
>>
>> I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
>> we will not be able to skip NULL ssa_names with that.
>
> Why?  Can't you simply do
>
>   #define FOR_EACH_SSA_NAME (name) \
> for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
>if (name)
>
> ?

For example, with the test patch where loop body also defines i which
is giving error:

../../trunk/gcc/tree-ssa-ter.c: In function ‘temp_expr_table*
new_temp_expr_table(var_map)’:
../../trunk/gcc/tree-ssa-ter.c:206:11: error: redeclaration of ‘int i’
   int i;
   ^
In file included from ../../trunk/gcc/ssa.h:29:0,
 from ../../trunk/gcc/tree-ssa-ter.c:28:
../../trunk/gcc/tree-ssanames.h:66:17: note: ‘unsigned int i’
previously declared here
   for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &VAR); ++i)
 ^
../../trunk/gcc/tree-ssa-ter.c:204:3: note: in expansion of macro
‘FOR_EACH_SSA_NAME’
   FOR_EACH_SSA_NAME (name)
   ^
Makefile:1097: recipe for target 'tree-ssa-ter.o' failed

Am I missing something here ?

Thanks,
Kugan

>
>> I also added
>> index variable to the macro so that there won't be any conflicts if the
>> index variable "i" (or whatever) is also defined in the loop.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>> regressions. Is this OK for trunk?
>>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2016-09-06  Kugan Vivekanandarajah  
>>
>> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
>> * cfgexpand.c (update_alias_info_with_stack_vars): Use
>> FOR_EACH_SSA_NAME to iterate over SSA variables.
>> (pass_expand::execute): Likewise.
>> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>> * tree-cfg.c (dump_function_to_file): Likewise.
>> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>> (update_ssa): Likewise.
>> * tree-ssa-alias.c (dump_alias_info): Likewise.
>> * tree-ssa-ccp.c (ccp_finalize): Likewise.
>> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>> (create_outofssa_var_map): Likewise.
>> (coalesce_ssa_name): Likewise.
>> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
>> * tree-ssa-pre.c (compute_avail): Likewise.
>> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
>> (scc_vn_restore_ssa_info): Likewise.
>> (free_scc_vn): Likwise.
>> (run_scc_vn): Likewise.
>> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>> * tree-ssa-ter.c (new_temp_expr_table): Likewise.
>> * tree-ssa-copy.c (fini_copy_prop): Likewise.
>> * tree-ssa.c (verify_ssa): Likewise.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
 Thanks,
 Kugan


 gcc/ChangeLog:

 2016-09-05  Kugan Vivekanandarajah  

 * tree-ssanames.h (ssa_iterator::ssa_iterator): New.
 (ssa_iterator::get): Likewise.
 (ssa_iterator::next): Likewise.
 (FOR_EACH_SSAVAR): Likewise.
 * cfgexpand.c (update_alias_info_with_stack_vars): Use
 FOR_EACH_SSAVAR to iterate over SSA variables.
 (pass_expand::execute): Likewise.
 * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
 * tree-cfg.c (dump_function_to_file): Likewise.
 * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
 (update_ssa): Lik

[Patch RFC] Modify excess precision logic to permit FLT_EVAL_METHOD=16

2016-09-06 Thread James Greenhalgh

Hi,

ISO/IEC TS 18661-3 redefines FLT_EVAL_METHOD. In this patch I'm interested
in updating the code in GCC to handle two of the changes: the redefinition
of the meanings of {0, 1, 2}, and the meaning of "N" for the specific case
of N=16.

In ISO/IEC TS 18661-3, these are defined as so:

  0

Evaluate all operations and constants, whose semantic type has at most
the range and precision of float, to the range and precision of
float; evaluate all other operations and constants to the range and
precision of the semantic type.

  1

Evaluate all operations and constants, whose semantic type has at most
the range and precision of double, to the range and precision of
double; evaluate all other operations and constants to the range and
precision of the semantic type.

  2

Evaluate all operations and constants, whose semantic type has at most
the range and precision of long double, to the range and precision of
long double; evaluate all other operations and constants to the range
and precision of the semantic type.

  N, where _FloatN is a supported interchange floating type

Evaluate operations and constants whose semantic type has at most
the range and precision of the _FloatN type, to the range and precision
of the _FloatN type; evaluate all other operations and constants to the
range and precision of the semantic type;

When we enable support for _Float16 in AArch64 we would like, where
we have the ARMv8.2-A 16-bit floating point extensions available, to
evaluate operations to the range and precision of that type. Where we
do not have the extensions available, we'd like to evaluate operations
to the range and precision of float.

This will require adding support for FLT_EVAL_METHOD=16. In most cases
this is simply duplicating the exceptions that exist for FLT_EVAL_METHOD==0
and teaching some of the excess-precision logic that it may need to
handle _Float16 types if FLT_EVAL_METHOD=0.

In this patch, I first add a new enum for the five FLT_EVAL_METHOD values,
just to avoid repeating magic numbers now that we are up to five of them.
I've added this to flag-types.h, but there may be a better home for it.
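
For illustration, a minimal sketch of what such an enum could look like
(only FLT_EVAL_METHOD_PROMOTE_TO_FLOAT and FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
appear in the diff below; the other enumerator names here are my guesses,
not necessarily the ones in the patch):

  enum flt_eval_method
  {
    /* Values mirror the FLT_EVAL_METHOD numbers from TS 18661-3.  */
    FLT_EVAL_METHOD_UNPREDICTABLE = -1,
    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT = 0,
    FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE = 1,
    FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE = 2,
    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 = 16
  };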

In c-family/c-cppbuiltin.c I've updated cpp_iec_559_value so that it also
allows setting __GCC_IEC_559 when FLT_EVAL_METHOD=16, and I've updated
c_cpp_builtins to handle the new value and use the new enum names.

Then I've updated init_excess_precision in toplev.c with the new enum
names, and with logic to understand that for targets that provide _Float16,
we can't set flag_excess_precision to "fast" when FLT_EVAL_METHOD=0. This is
because FLT_EVAL_METHOD=0 requires predictably using the precision of float
for _Float16 types.

In tree.c, I've modified excess_precision_type with the logic discussed
above, promoting _Float16 to the appropriate type if we are required to.

I've also added a special case that allows promoting from a _Float16 type
to "float" when -fexcess-precision=fast. Without this, the "fast" case
would spend more time promoting and demoting between HFmode and SFmode,
and the consequence would be slower code.
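
To make the user-visible effect concrete, here is a small example of mine
(not from the patch):

  _Float16
  madd (_Float16 a, _Float16 b, _Float16 c)
  {
    /* With FLT_EVAL_METHOD == 0 the multiply and add are evaluated to the
       range and precision of float and only the final result is rounded
       back to _Float16; with FLT_EVAL_METHOD == 16 each operation is
       rounded to _Float16.  With -fexcess-precision=fast and no FP16
       instructions, keeping the intermediate in float avoids a demote
       and re-promote between the two operations.  */
    return a * b + c;
  }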

Bootstrapped on AArch64 and x86_64.

OK?

Thanks,
James

---
gcc/

2016-09-06  James Greenhalgh  

* flag-types.h (flt_eval_method): New.
* toplev.c (init_excess_precision): Handle
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
* tree.c (excess_precision_type): Likewise.

gcc/c-family/

2016-09-06  James Greenhalgh  

* c-cppbuiltin.c (cpp_iec_559_value): Support
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
(c_cpp_builtins): Likewise.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index ee4d233..f278aff 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -735,10 +735,13 @@ cpp_iec_559_value (void)
  excess precision to mean lack of IEEE 754 support.  The same
  applies to unpredictable contraction.  For C++, and outside
  strict conformance mode, do not consider these options to mean
- lack of IEEE 754 support.  */
+ lack of IEEE 754 support.  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT and
+ FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 both give predictable excess
+ precision.  */
   if (flag_iso
   && !c_dialect_cxx ()
-  && TARGET_FLT_EVAL_METHOD != 0
+  && TARGET_FLT_EVAL_METHOD != FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
+  && TARGET_FLT_EVAL_METHOD != FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
   && flag_excess_precision_cmdline != EXCESS_PRECISION_STANDARD)
 ret = 0;
   if (flag_iso
@@ -1118,24 +1121,28 @@ c_cpp_builtins (cpp_reader *pfile)
 	}
 	  builtin_define_with_value (macro_name, suffix, 0);
 	  bool excess_precision = false;
-	  if (TARGET_FLT_EVAL_METHOD != 0
-	  && mode != TYPE_MODE (long_double_type_node)
-	  && (mode == TYPE_MODE (float_type_node)
-		  || mode == TYPE_MODE (double_type_node)))
-	switch (TARGET_FLT_EVAL_METHOD)
-	  {
-	  case -1:
-	  case 2:
-		excess_precision =

Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Richard Biener
On Tue, Sep 6, 2016 at 11:11 AM, Florian Weimer  wrote:
> On 09/05/2016 07:06 PM, Joseph Myers wrote:
>
>> Such an increase is of course an ABI change, but a reasonably safe
>> one, in that max_align_t doesn't tend to appear in library interfaces
>> (rather, it's something to use when writing allocators and similar
>> code; most matches found on codesearch.debian.net look like copies of
>> the gnulib stddef.h module rather than anything actually using
>> max_align_t at all).
>
>
> I've thought some more about this.  I'll try to rephrase my reservations in
> a less diplomatic way, hopefully making my thoughts clearer.
>
> I think this change is only safe because nothing relies on it. max_align_t
> is a committee invention with no actual users.  As such, you can change it
> in any way you see fit and it won't make a difference in any way.
>
> Why aren't there any users?  The standard isn't very clear what the value of
> _Alignof (max_align_t) actually means.  Does the phrase “all contexts”
> include pointers returned by malloc?  Even if the requested size is smaller
> than the fundamental alignment?  Did those who wrote the standard expect
> there to be *any* relationship between malloc and max_align_t?
>
> The existing situation is that most mallocs do not provide _Alignof
> (max_align_t) alignment unconditionally.  But they can provide arbitrarily
> large alignment with aligned_alloc/memalign-style interfaces.
>
> For object alignment declarations, I think GCC can satisfy most requests,
> perhaps subject to binutils/dynamic linker limitations for global variables
> on some targets.
>
> The question then is, how can we make max_align_t more useful?  If it is
> supposed to reflect a malloc property, it cannot be a compile-time constant
> because we don't know at compile time which malloc is going to be used
> (malloc has to remain interposable).  If there is no relationship to malloc,
> GCC essentially supports unbounded alignment on many targets.  How is it
> possible to have a meaningful definition of max_align_t under such
> circumstances?

I thought max_align_t was to replace uses like

  union {
long long x;
double y;
...
char buf[N];
  };

to get statically allocated "max" aligned memory.  That is, you now know
_which_ type you need to put into the union.
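
A minimal sketch of that idiom written with max_align_t (my example, not
from the thread):

  #include <stddef.h>

  /* Statically allocated storage with at least the maximum fundamental
     alignment; max_align_t stands in for the hand-written union of
     long long, double, ...  */
  static union
  {
    max_align_t align;
    char buf[256];
  } storage;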

Richard.

> Florian


[Patch AArch64] Add floatdihf2 and floatunsdihf2 patterns

2016-09-06 Thread James Greenhalgh

Hi,

This patch adds patterns for converting 64-bit integers to 16-bit
floating-point values on AArch64 targets which don't have support for
the ARMv8.2-A 16-bit floating-point extensions.

We implement these by first saturating to SImode (we know that any
values >= 65504 will round to infinity after conversion to HFmode), then
converting to DFmode (unsigned conversions could go to SFmode, but there
is no performance benefit to this), and finally converting to HFmode.

Having added these patterns, the expansion path in "expand_float" will
now try to use them for conversions from SImode to HFmode as there is no
floatsihf2 pattern. expand_float first tries widening the integer size and
looking for a match, so it will try SImode -> DImode. But our DImode
pattern would then saturate us straight back to SImode, which is wasteful.

Better would be for us to provide float(uns)sihf2 patterns directly.
So that's what this patch does.
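
In C terms, the emulated DImode path described above behaves roughly like
the sketch below (my illustration for the signed case, not code from the
patch; the saturating-narrow step is what the Advanced SIMD pattern in the
patch provides):

  #include <limits.h>

  __fp16
  di_to_hf (long long x)
  {
    /* Saturate to the SImode range.  Any value that gets clamped is far
       outside the finite __fp16 range, so the final result (+/-Inf) is
       the same as for the original 64-bit value.  */
    int sat = x > INT_MAX ? INT_MAX : (x < INT_MIN ? INT_MIN : (int) x);
    /* Widen to double (exact for any 32-bit integer), then round once
       to half precision.  */
    return (__fp16) (double) sat;
  }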

The testcase added in this patch would fail on trunk for AArch64: there is
no libgcc routine to make the conversion, and we don't provide appropriate
patterns in the backend, so we get a link-time error.

Bootstrapped and tested on aarch64-none-linux-gnu

OK for trunk?

James

---
2016-09-06  James Greenhalgh  

* config/aarch64/aarch64.md (sihf2): Convert to expand.
(dihf2): Likewise.
(aarch64_fp16_hf2): New.

2016-09-06  James Greenhalgh  

* gcc.target/aarch64/floatdihf2_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6afaf90..1882a72 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4630,7 +4630,14 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
-(define_insn "hf2"
+;; If we do not have ARMv8.2-A 16-bit floating point extensions, the
+;; midend will arrange for an SImode conversion to HFmode to first go
+;; through DFmode, then to HFmode.  But first it will try converting
+;; to DImode then down, which would match our DImode pattern below and
+;; give very poor code-generation.  So, we must provide our own emulation
+;; of the mid-end logic.
+
+(define_insn "aarch64_fp16_hf2"
   [(set (match_operand:HF 0 "register_operand" "=w")
 	(FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
   "TARGET_FP_F16INST"
@@ -4638,6 +4645,53 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
+(define_expand "sihf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(FLOATUORS:HF (match_operand:SI 1 "register_operand")))]
+  "TARGET_FLOAT"
+{
+  if (TARGET_FP_F16INST)
+emit_insn (gen_aarch64_fp16_sihf2 (operands[0], operands[1]));
+  else
+{
+  rtx convert_target = gen_reg_rtx (DFmode);
+  emit_insn (gen_sidf2 (convert_target, operands[1]));
+  emit_insn (gen_truncdfhf2 (operands[0], convert_target));
+}
+  DONE;
+}
+)
+
+;; For DImode there is no wide enough floating-point mode that we
+;; can convert through natively (TFmode would work, but requires a library
+;; call).  However, we know that any value >= 65504 will be rounded
+;; to infinity on conversion.  This is well within the range of SImode, so
+;; we can:
+;;   Saturate to SImode.
+;;   Convert from that to DFmode
+;;   Convert from that to HFmode (phew!).
+;; Note that the saturation to SImode requires the SIMD extensions.  If
+;; we ever need to provide this pattern where the SIMD extensions are not
+;; available, we would need a different approach.
+
+(define_expand "dihf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(FLOATUORS:HF (match_operand:DI 1 "register_operand")))]
+  "TARGET_FLOAT && (TARGET_FP_F16INST || TARGET_SIMD)"
+{
+  if (TARGET_FP_F16INST)
+emit_insn (gen_aarch64_fp16_dihf2 (operands[0], operands[1]));
+  else
+{
+  rtx sat_target = gen_reg_rtx (SImode);
+  emit_insn (gen_aarch64_qmovndi (sat_target, operands[1]));
+  emit_insn (gen_sihf2 (operands[0], sat_target));
+}
+
+  DONE;
+}
+)
+
 ;; Convert between fixed-point and floating-point (scalar modes)
 
 (define_insn "3"
diff --git a/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c b/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c
new file mode 100644
index 000..9eaa4ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Test that conversion from 32-bit and 64-bit integers can be done
+   without a call to the support library.  */
+
+#pragma GCC target ("arch=armv8.2-a+nofp16")
+
+__fp16
+foo (int x)
+{
+  return x;
+}
+
+__fp16
+bar (unsigned int x)
+{
+  return x;
+}
+
+__fp16
+fool (long long x)
+{
+  return x;
+}
+
+__fp16
+barl (unsigned long long x)
+{
+  return x;
+}
+
+
+/* { dg-final { scan-assembler-not "__float\\\[ds\\\]ihf2" } } */
+/* { dg-final { scan-assembler-not "__floatun\\\[ds\\\]ihf2" } } */


[DOC] Boolean -> boolean changes

2016-09-06 Thread Marek Polacek
As discussed,
this patch makes the Boolean -> boolean change in our doc/ directory.

Ok for trunk?

2016-09-06  Marek Polacek  

* doc/extend.texi: Use lowercase "boolean".
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* doc/tm.texi: Likewise.

diff --git gcc/doc/extend.texi gcc/doc/extend.texi
index aba3b90..8df9d62 100644
--- gcc/doc/extend.texi
+++ gcc/doc/extend.texi
@@ -9486,7 +9486,7 @@ the pointer points.
 @end smallexample
 
 The object pointed to by the first argument must be of integer or pointer
-type.  It must not be a Boolean type.
+type.  It must not be a boolean type.
 
 @emph{Note:} GCC 4.4 and later implement @code{__sync_fetch_and_nand}
 as @code{*ptr = ~(tmp & value)} instead of @code{*ptr = ~tmp & value}.
@@ -9772,7 +9772,7 @@ they are not scaled by the size of the type to which the 
pointer points.
 @end smallexample
 
 The object pointed to by the first argument must be of integer or pointer
-type.  It must not be a Boolean type.  All memory orders are valid.
+type.  It must not be a boolean type.  All memory orders are valid.
 
 @end deftypefn
 
@@ -9896,7 +9896,7 @@ behavior for all argument values.
 
 The first built-in function allows arbitrary integral types for operands and
 the result type must be pointer to some integral type other than enumerated or
-Boolean type, the rest of the built-in functions have explicit integer types.
+boolean type, the rest of the built-in functions have explicit integer types.
 
 The compiler will attempt to use hardware instructions to implement
 these built-in functions where possible, like conditional jump on overflow
@@ -9942,7 +9942,7 @@ These built-in functions are similar to 
@code{__builtin_add_overflow},
 @code{__builtin_sub_overflow}, or @code{__builtin_mul_overflow}, except that
 they don't store the result of the arithmetic operation anywhere and the
 last argument is not a pointer, but some expression with integral type other
-than enumerated or Boolean type.
+than enumerated or boolean type.
 
 The built-in functions promote the first two operands into infinite precision 
signed type
 and perform addition on those promoted operands. The result is then
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 20be9b7..2342ce0 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -5476,7 +5476,7 @@ if (a < 0 && a < 0) @{ @dots{} @}
 @opindex Wlogical-not-parentheses
 @opindex Wno-logical-not-parentheses
 Warn about logical not used on the left hand side operand of a comparison.
-This option does not warn if the right operand is considered to be a Boolean
+This option does not warn if the right operand is considered to be a boolean
 expression.  Its purpose is to detect suspicious code like the following:
 @smallexample
 int a;
diff --git gcc/doc/md.texi gcc/doc/md.texi
index 996b164..c1015f0 100644
--- gcc/doc/md.texi
+++ gcc/doc/md.texi
@@ -700,7 +700,7 @@ predicates used with @code{match_operand} have names that 
end in
 @samp{_operand}, and those used with @code{match_operator} have names
 that end in @samp{_operator}.
 
-All predicates are Boolean functions (in the mathematical sense) of
+All predicates are boolean functions (in the mathematical sense) of
 two arguments: the RTL expression that is being considered at that
 position in the instruction pattern, and the machine mode that the
 @code{match_operand} or @code{match_operator} specifies.  In this
@@ -4274,7 +4274,7 @@ constraint.  Docstrings are explained further below.
 @end deffn
 
 Non-register constraints are more like predicates: the constraint
-definition gives a Boolean expression which indicates whether the
+definition gives a boolean expression which indicates whether the
 constraint matches.
 
 @deffn {MD Expression} define_constraint name docstring exp
diff --git gcc/doc/tm.texi gcc/doc/tm.texi
index 5866260..dc92a79 100644
--- gcc/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -1891,7 +1891,7 @@ This hook may conditionally modify five variables
 @code{fixed_regs}, @code{call_used_regs}, @code{global_regs},
 @code{reg_names}, and @code{reg_class_contents}, to take into account
 any dependence of these register sets on target flags.  The first three
-of these are of type @code{char []} (interpreted as Boolean vectors).
+of these are of type @code{char []} (interpreted as boolean vectors).
 @code{global_regs} is a @code{const char *[]}, and
 @code{reg_class_contents} is a @code{HARD_REG_SET}.  Before the macro is
 called, @code{fixed_regs}, @code{call_used_regs},

Marek


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Florian Weimer

On 09/05/2016 07:06 PM, Joseph Myers wrote:


Such an increase is of course an ABI change, but a reasonably safe
one, in that max_align_t doesn't tend to appear in library interfaces
(rather, it's something to use when writing allocators and similar
code; most matches found on codesearch.debian.net look like copies of
the gnulib stddef.h module rather than anything actually using
max_align_t at all).


I've thought some more about this.  I'll try to rephrase my reservations 
in a less diplomatic way, hopefully making my thoughts clearer.


I think this change is only safe because nothing relies on it. 
max_align_t is a committee invention with no actual users.  As such, you 
can change it in any way you see fit and it won't make a difference in 
any way.


Why aren't there any users?  The standard isn't very clear what the 
value of _Alignof (max_align_t) actually means.  Does the phrase “all 
contexts” include pointers returned by malloc?  Even if the requested size 
is smaller than the fundamental alignment?  Did those who wrote the 
standard expect there to be *any* relationship between malloc and 
max_align_t?


The existing situation is that most mallocs do not provide _Alignof 
(max_align_t) alignment unconditionally.  But they can provide 
arbitrarily large alignment with aligned_alloc/memalign-style interfaces.
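
For example, with the C11 interface (my sketch, not Florian's):

  #include <stdlib.h>

  void *
  alloc_aligned_block (size_t n)
  {
    /* C11 aligned_alloc; note that N must be a multiple of the requested
       alignment for the call to be well defined.  */
    return aligned_alloc (64, n);
  }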


For object alignment declarations, I think GCC can satisfy most 
requests, perhaps subject to binutils/dynamic linker limitations for 
global variables on some targets.


The question then is, how can we make max_align_t more useful?  If it is 
supposed to reflect a malloc property, it cannot be a compile-time 
constant because we don't know at compile time which malloc is going to 
be used (malloc has to remain interposable).  If there is no 
relationship to malloc, GCC essentially supports unbounded alignment on 
many targets.  How is it possible to have a meaningful definition of 
max_align_t under such circumstances?


Florian


Re: [PATCH 3/4][PR 71931] Fix libitm tests

2016-09-06 Thread Torvald Riegel
On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
> Pass build time CC make var down to dejagnu so the sysroot
> is set correctly when gcc is built with --with-build-sysroot.
> 
> libitm/
> 2016-08-24  Szabolcs Nagy  
> 
>   PR testsuite/71931
>   * configure.ac: Add AC_CONFIG_FILES.
>   * configure: Regenerated.
>   * testsuite/Makefile.am: Add rule for libitm-test-support.exp.
>   * testsuite/Makefile.in: Regenerated.
>   * testsuite/libitm-test-support.exp.in: New.
>   * testsuite/lib/libitm.exp (libitm_init): Use BUILD_CC.
> 

I don't know enough about the build system to really review this.  If a
similar patch has been ACKed and applied for libatomic (71931 states
that both are affected), then I guess this is OK?



Re: [PATCH] Delete GCJ

2016-09-06 Thread Jakub Jelinek
On Tue, Sep 06, 2016 at 11:06:36AM +0200, Richard Biener wrote:
> On Mon, Sep 5, 2016 at 6:17 PM, Andrew Haley  wrote:
> > On 05/09/16 17:15, Richard Biener wrote:
> >> On September 5, 2016 5:13:06 PM GMT+02:00, Andrew Haley  
> >> wrote:
> >>> As discussed.  I think I should ask a Global reviewer to approve this
> >>> one.  For obvious reasons I haven't included the diffs to the deleted
> >>> gcc/java and libjava directories.  The whole tree, post GCJ-deletion,
> >>> is at svn+ssh://gcc.gnu.org/svn/gcc/branches/gcj/gcj-deletion-branch
> >>> if anyone would like to try it.
> >>
> >> Isn't there also java specific C++ frontend parts?
> >
> > There certainly are, but deleting them without breaking anything else
> > is going to be rather delicate.  I'm trying to do this one step at a
> > time, rather cautiously.
> 
> Ok, that sounds reasonable.
> 
> You have my approval for this first part then.  Please wait until after the
> GNU Cauldron to allow other global reviewers to object.

No objection from me.

Jakub


Re: [RFC][SSA] Iterator to visit SSA

2016-09-06 Thread Richard Biener
On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 5 September 2016 at 17:57, Richard Biener  
> wrote:
>> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi All,
>>>
>>> While looking at gcc source, I noticed that we are iterating over SSA
>>> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
>>> in others. It seems that variable 0 is always NULL TREE.
>>
>> Yeah, that's artificial because we don't assign SSA name version 0 (for some
>> unknown reason).
>>
>>> But it can
>>> confuse people who are looking for the first time. Therefore It might
>>> be good to follow some consistent usage here.
>>>
>>> It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
>>> other case. Here is attempt to do this based on what is done in other
>>> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
>>> new regressions. Is this OK?
>>
>> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
>> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>>
>> Then, if you add an iterator why leave the name == NULL handling to 
>> consumers?
>> That looks odd.
>>
>> Then, SSA names are just in a vector thus why not re-use the vector iterator?
>>
>> That is,
>>
>> #define FOR_EACH_SSA_NAME (name) \
>>   for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
>>
>> would be equivalent to your patch?
>>
>> Please also don't add new iterators that implicitely use 'cfun' but always 
>> use
>> one that passes it as explicit arg.
>
> I think defining FOR_EACH_SSA_NAME with the vector iterator is better. But
> we will not be able to skip NULL ssa_names with that.

Why?  Can't you simply do

  #define FOR_EACH_SSA_NAME (name) \
for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
   if (name)

?
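
A usage sketch, assuming the FOR_EACH_SSA_NAME form above (the loop body is
only an illustration):

  tree name;

  FOR_EACH_SSA_NAME (name)
    {
      /* NULL slots in the SSA name vector are skipped by the macro, so
         NAME is always a valid SSA_NAME here.  */
      print_generic_expr (dump_file, name, 0);
      fprintf (dump_file, "\n");
    }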

> I also added an
> index variable to the macro so that there won't be any conflicts if the
> index variable "i" (or whatever) is also defined in the loop.
>
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions. Is this OK for trunk?
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-09-06  Kugan Vivekanandarajah  
>
> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
> * cfgexpand.c (update_alias_info_with_stack_vars): Use
> FOR_EACH_SSA_NAME to iterate over SSA variables.
> (pass_expand::execute): Likewise.
> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
> * tree-cfg.c (dump_function_to_file): Likewise.
> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
> (update_ssa): Likewise.
> * tree-ssa-alias.c (dump_alias_info): Likewise.
> * tree-ssa-ccp.c (ccp_finalize): Likewise.
> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
> (create_outofssa_var_map): Likewise.
> (coalesce_ssa_name): Likewise.
> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
> * tree-ssa-pre.c (compute_avail): Likewise.
> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
> (scc_vn_restore_ssa_info): Likewise.
> (free_scc_vn): Likewise.
> (run_scc_vn): Likewise.
> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
> * tree-ssa-ter.c (new_temp_expr_table): Likewise.
> * tree-ssa-copy.c (fini_copy_prop): Likewise.
> * tree-ssa.c (verify_ssa): Likewise.
>
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-09-05  Kugan Vivekanandarajah  
>>>
>>> * tree-ssanames.h (ssa_iterator::ssa_iterator): New.
>>> (ssa_iterator::get): Likewise.
>>> (ssa_iterator::next): Likewise.
>>> (FOR_EACH_SSAVAR): Likewise.
>>> * cfgexpand.c (update_alias_info_with_stack_vars): Use
>>> FOR_EACH_SSAVAR to iterate over SSA variables.
>>> (pass_expand::execute): Likewise.
>>> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>>> * tree-cfg.c (dump_function_to_file): Likewise.
>>> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>>> (update_ssa): Likewise.
>>> * tree-ssa-alias.c (dump_alias_info): Likewise.
>>> * tree-ssa-ccp.c (ccp_finalize): Likewise.
>>> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>>> (create_outofssa_var_map): Likewise.
>>> (coalesce_ssa_name): Likewise.
>>> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
>>> * tree-ssa-pre.c (compute_avail): Likewise.
>>> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
>>> (scc_vn_restore_ssa_info): Likewise.
>>> (free_scc_vn): Likewise.
>>> (run_scc_vn): Likewise.
>>> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>>> * tree-ssa-ter.c (new_temp_expr_table): Likewise.


Re: C/C++ PATCH for c/77423 (bogus warning with -Wlogical-not-parentheses)

2016-09-06 Thread Marek Polacek
On Mon, Sep 05, 2016 at 10:35:26AM -0600, Sandra Loosemore wrote:
> On 09/05/2016 09:55 AM, Bernd Schmidt wrote:
> > 
> > 
> > On 09/05/2016 12:52 PM, Marek Polacek wrote:
> > > On Mon, Sep 05, 2016 at 12:35:13PM +0200, Bernd Schmidt wrote:
> > > > On 09/02/2016 05:13 PM, Marek Polacek wrote:
> > > > > diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> > > > > index 87da1f1..38d55d4 100644
> > > > > --- gcc/doc/invoke.texi
> > > > > +++ gcc/doc/invoke.texi
> > > > > @@ -5437,8 +5437,8 @@ if (a < 0 && a < 0) @{ @dots{} @}
> > > > >  @opindex Wlogical-not-parentheses
> > > > >  @opindex Wno-logical-not-parentheses
> > > > >  Warn about logical not used on the left hand side operand of a
> > > > > comparison.
> > > > > -This option does not warn if the RHS operand is of a boolean type.
> > > > > Its
> > > > > -purpose is to detect suspicious code like the following:
> > > > > +This option does not warn if the right operand is considered to be
> > > > > a Boolean
> > > > > +expression.  Its purpose is to detect suspicious code like the
> > > > > following:
> > > > 
> > > > I think "Boolean" shouldn't be capitalized. The patch looks ok to me
> > > > otherwise.
> > > 
> > > No strong opinions, but looking at
> > > https://en.wikipedia.org/wiki/Boolean_expression
> > > I see that it's capitalized there, so I think let's keep "Boolean".
> > 
> > I looked at other occurrences in this file, and they are lower-case.
> > Across the other texi files upper-case is also the exception. Let's ask
> > Sandra for a ruling?
> 
> U.  There seems to be precedent for both capitalizing it and not.
> 
> The C standard doesn't capitalize it (but the only place I could find where
> "boolean" used in a context where it wouldn't be capitalized anyway, such as
> the first word in a section title, is the index).  The C++ standard isn't
> consistent, using both "Boolean" and "boolean" (and sometimes even on the
> same page).  The documents I have handy are older drafts, though; maybe that
> has been cleaned up in current/final versions.
 
Thanks for the research!

> Looking at some other languages, the Python documentation capitalizes:
> 
> https://docs.python.org/3/library/stdtypes.html
> 
> But the Scheme specification editors deliberately chose not to do so:
> 
> http://www.r6rs.org/r6rs-editors/2007-April/002143.html
> 
> The Haskell specification also uses lowercase:
> 
> https://www.haskell.org/onlinereport/haskell2010/haskellch6.html#x13-1170006.1
> 
> Mark-Jason Dominus gave the matter some serious thought:
> 
> http://blog.plover.com/lang/Boolean.html
 
Ok, I buy that.

> Anyway, I will be happy either way as long as the documentation is
> consistent.

Let's go with lowercase then.  I'll post a cleanup patch.

Marek


Re: [PATCH] Delete GCJ

2016-09-06 Thread Richard Biener
On Mon, Sep 5, 2016 at 6:17 PM, Andrew Haley  wrote:
> On 05/09/16 17:15, Richard Biener wrote:
>> On September 5, 2016 5:13:06 PM GMT+02:00, Andrew Haley  
>> wrote:
>>> As discussed.  I think I should ask a Global reviewer to approve this
>>> one.  For obvious reasons I haven't included the diffs to the deleted
>>> gcc/java and libjava directories.  The whole tree, post GCJ-deletion,
>>> is at svn+ssh://gcc.gnu.org/svn/gcc/branches/gcj/gcj-deletion-branch
>>> if anyone would like to try it.
>>
>> Isn't there also java specific C++ frontend parts?
>
> There certainly are, but deleting them without breaking anything else
> is going to be rather delicate.  I'm trying to do this one step at a
> time, rather cautiously.

Ok, that sounds reasonable.

You have my approval for this first part then.  Please wait until after the
GNU Cauldron to allow other global reviewers to object.

Thanks,
Richard.

> Andrew.


Re: Make max_align_t respect _Float128 [version 2]

2016-09-06 Thread Richard Biener
On Mon, Sep 5, 2016 at 7:06 PM, Joseph Myers  wrote:
> [Patch version 2 adds an update to cxx_fundamental_alignment_p.]
>
> The _FloatN, _FloatNx, _DecimalN and _DecimalNx types are specified in
> such a way that they are basic types, meaning that max_align_t must be
> at least as aligned as those types.
>
> On 32-bit x86, max_align_t is currently 8-byte aligned, but
> _Decimal128 and _Float128 are 16-byte aligned, so the alignment of
> max_align_t needs to increase to meet the standard requirements for
> these types.

As doubles on 32-bit x86 are 4-byte aligned, I suppose the choice could have
been to make those types 8-byte aligned to avoid changing max_align_t?
(with the obvious eventual performance penalty of needing to do unaligned
loads/stores to xmm registers).

Richard.

> This patch implements such an increase.  Because max_align_t needs to
> be usable for C++ as well as for C,  can't actually refer to
> _Float128, but needs to use __float128 (or some other notation for the
> type) instead.  And since __float128 is architecture-specific, there
> isn't a preprocessor conditional that means "__float128 is available"
> (whereas one could test __FLT128_MANT_DIG__ to see if _Float128 is
> available; __SIZEOF_FLOAT128__ is available on x86 only).  But I
> believe the only case that actually has an alignment problem here is
> 32-bit x86, and <stddef.h> already has lots of conditionals specific to
> particular architectures or OSes, so this patch uses a conditional on
> __i386__; that also is the minimal change that ensures neither size
> nor alignment of max_align_t is changed in any case other than where
> it is necessary.  If any other architectures turn out to have such an
> issue, it will show up as failures of one of the testcases added by
> this patch.
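
A minimal sketch of the kind of check such a test performs (my illustration;
the actual tests in the patch share a floatn-align.h header):

  #include <stddef.h>

  /* Only meaningful where _Float128 exists (__FLT128_MANT_DIG__ defined).  */
  #ifdef __FLT128_MANT_DIG__
  _Static_assert (_Alignof (max_align_t) >= _Alignof (_Float128),
                  "max_align_t must be at least as aligned as _Float128");
  #endif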
>
> Such an increase is of course an ABI change, but a reasonably safe
> one, in that max_align_t doesn't tend to appear in library interfaces
> (rather, it's something to use when writing allocators and similar
> code; most matches found on codesearch.debian.net look like copies of
> the gnulib stddef.h module rather than anything actually using
> max_align_t at all).
>
> cxx_fundamental_alignment_p has a corresponding change (adding
> _Float128 to the types considered).
>
> (I think glibc malloc alignment should also increase to 16-byte on
> 32-bit x86 so it works for allocating objects of these types, which is
> now straightforward given the fix made for 32-bit powerpc.)
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu, and
> spot-tested with -m32 that the new float128-align.c test now compiles
> where previously it didn't.  OK to commit?
>
> gcc:
> 2016-09-05  Joseph Myers  
>
> * ginclude/stddef.h (max_align_t) [__i386__]: Add __float128
> element.
>
> gcc/c-family:
> 2016-09-05  Joseph Myers  
>
> * c-common.c (cxx_fundamental_alignment_p): Also consider
> alignment of float128_type_node.
>
> gcc/testsuite:
> 2016-09-05  Joseph Myers  
>
> * gcc.dg/float128-align.c, gcc.dg/float128x-align.c,
> gcc.dg/float16-align.c, gcc.dg/float32-align.c,
> gcc.dg/float32x-align.c, gcc.dg/float64-align.c,
> gcc.dg/float64x-align.c, gcc.dg/floatn-align.h: New tests.
>
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c (revision 239987)
> +++ gcc/c-family/c-common.c (working copy)
> @@ -12855,8 +12855,11 @@ scalar_to_vector (location_t loc, enum tree_code c
>  bool
>  cxx_fundamental_alignment_p  (unsigned align)
>  {
> -  return (align <=  MAX (TYPE_ALIGN (long_long_integer_type_node),
> -TYPE_ALIGN (long_double_type_node)));
> +  unsigned int max_align = MAX (TYPE_ALIGN (long_long_integer_type_node),
> +   TYPE_ALIGN (long_double_type_node));
> +  if (float128_type_node != NULL_TREE)
> +max_align = MAX (max_align, TYPE_ALIGN (float128_type_node));
> +  return align <= max_align;
>  }
>
>  /* Return true if T is a pointer to a zero-sized aggregate.  */
> Index: gcc/ginclude/stddef.h
> ===
> --- gcc/ginclude/stddef.h   (revision 239987)
> +++ gcc/ginclude/stddef.h   (working copy)
> @@ -426,6 +426,14 @@ typedef __WINT_TYPE__ wint_t;
>  typedef struct {
>    long long __max_align_ll __attribute__((__aligned__(__alignof__(long long))));
>    long double __max_align_ld __attribute__((__aligned__(__alignof__(long double))));
> +  /* _Float128 is defined as a basic type, so max_align_t must be
> + sufficiently aligned for it.  This code must work in C++, so we
> + use __float128 here; that is only available on some
> + architectures, but only on i386 is extra alignment needed for
> + __float128.  */
> +#ifdef __i386__
> +  __float128 __max_align_f128 __attribute__((__aligned__(__alignof(__float128))));
> +#endif
>  } max_align_t;
>  #endif
>  #endif /* C11 or C++11

Re: Implement C _FloatN, _FloatNx types [version 6]

2016-09-06 Thread Richard Biener
On Mon, Sep 5, 2016 at 1:40 PM, Joseph Myers  wrote:
> On Sat, 3 Sep 2016, Andreas Schwab wrote:
>
>> On Aug 17 2016, Joseph Myers  wrote:
>>
>> > Index: gcc/testsuite/gcc.dg/torture/float32-basic.c
>> > ===
>> > --- gcc/testsuite/gcc.dg/torture/float32-basic.c(nonexistent)
>> > +++ gcc/testsuite/gcc.dg/torture/float32-basic.c(working copy)
>> > @@ -0,0 +1,9 @@
>> > +/* Test _Float32.  */
>> > +/* { dg-do run } */
>> > +/* { dg-options "" } */
>> > +/* { dg-add-options float32 } */
>> > +/* { dg-require-effective-target float32_runtime } */
>> > +
>> > +#define WIDTH 32
>> > +#define EXT 0
>> > +#include "floatn-basic.h"
>>
>> This fails on powerpc32, in vafn.
>
> That seems like the alpha issue - an ABI needs to be defined and
> implemented.

Note that this makes it maybe a good idea to only enable _FloatXX support
for targets that have explicitly done so once we near the GCC 7 release.
Otherwise we risk a later ABI change if the eventually defined ABI differs
from GCC's implementation from the time when there was no ABI defined for
those types.

Richard.

>  Unfortunately the powerabi mailing list, which would have
> been the right place for ABI coordination between implementors, died
> several years ago.  (The 32-bit ABI source code is available at
> .)
>
> If we suppose that _Float32 values should be passed in FPRs in the same
> circumstances in which float values are passed in FPRs, then
> rs6000_gimplify_va_arg would, in the case of an SFmode value coming from
> saved FPRs, need to extract a double value from the saved FPRs and then
> convert to float.  (There would also be the question of later _Float32
> arguments passed on the stack in variable arguments; if those are handled
> like prototyped arguments, they would go in 4-byte stack slots, and
> probably the compiler already does that.)  Of course other ABI choices are
> possible.
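
For concreteness, the variadic case in question has roughly this shape (a
hedged reduction of mine, not the actual floatn-basic.h code):

  #include <stdarg.h>

  _Float32
  vafn (int n, ...)
  {
    va_list ap;
    va_start (ap, n);
    /* Unlike float, the _FloatN types are not subject to the default
       argument promotions, so the callee asks for _Float32 directly; on
       powerpc32 the caller may still have placed the value in an FPR,
       which is the case rs6000_gimplify_va_arg would need to handle.  */
    _Float32 r = va_arg (ap, _Float32);
    va_end (ap);
    return r;
  }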
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: Ping for a gfortran testsuite patch

2016-09-06 Thread Paul Richard Thomas
Dear Uros,

Yes, that's fine. OK for trunk, 6- and 5-branches.

Thanks

Paul

On 5 September 2016 at 22:42, Uros Bizjak  wrote:
> Hello Paul!
>
> I would like to point out my testsuite patch [1] to one of gfortran
> testsuite file, authored by you. The patch corrects fortran call to a
> prototyped c function.
>
> I would be very grateful, if you can spare a few moments to review and
> perhaps approve the mentioned patch, as it fixes a testsuite failure
> for my target.
>
> [1] https://gcc.gnu.org/ml/fortran/2016-09/msg5.html
>
> Best regards,
> Uros.



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein

