date:20170905

[PATCH] Fix PR82102

2017-09-05 Thread Richard Biener


The following fixes PR82102: GIMPLE_NOPs don't have a lhs, so we
have to handle that case.

Tested on x86_64-unknown-linux-gnu, applied as obvious.

Richard.

2017-09-05  Richard Biener  

PR tree-optimization/82102
* tree-ssa-pre.c (fini_eliminate): Check if lhs is NULL.

* gcc.dg/torture/pr82102.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 251689)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4860,6 +4860,7 @@ fini_eliminate (void)
lhs = gimple_get_lhs (stmt);
 
   if (inserted_exprs
+ && lhs
  && TREE_CODE (lhs) == SSA_NAME
  && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (lhs)))
continue;
Index: gcc/testsuite/gcc.dg/torture/pr82102.c
===
--- gcc/testsuite/gcc.dg/torture/pr82102.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr82102.c  (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+void *a, *b;
+struct pt3_i2cbuf {
+int num_cmds;
+} c;
+void *memcpy(void *, void *, __SIZE_TYPE__);
+void put_stop();
+void translate(struct pt3_i2cbuf *p1, int p2)
+{
+  p1->num_cmds = 0;
+  if (p2)
+put_stop();
+}
+void pt3_i2c_master_xfer(int p1)
+{
+  translate(&c, p1);
+  memcpy(a, b, c.num_cmds);
+  for (; p1;)
+;
+}
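
As a minimal sketch of why the added `&& lhs` test matters: a statement without a left-hand side yields a null pointer, and the short-circuit evaluation of `&&` keeps the later dereferences from running. The types and the helper `defines_inserted_name` below are simplified, hypothetical stand-ins, not GCC's real gimple/tree API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's tree and gimple types.  */
enum code { SSA_NAME, VAR_DECL };
struct node { enum code code; unsigned version; };
struct stmt { struct node *lhs; /* NULL for a nop-like statement */ };

/* Return true if STMT defines an SSA name whose version bit is set in
   INSERTED (a plain bitmask here, standing in for a bitmap).  The
   `lhs != NULL` test must come first: && evaluates left to right and
   stops at the first false operand, so lhs->code is never read for a
   statement without a lhs.  */
static bool
defines_inserted_name (const struct stmt *stmt, unsigned long inserted)
{
  assert (stmt != NULL);
  const struct node *lhs = stmt->lhs;
  return lhs != NULL
         && lhs->code == SSA_NAME
         && lhs->version < CHAR_BIT * sizeof inserted
         && ((inserted >> lhs->version) & 1UL) != 0;
}
```

Without the first operand, the nop-like statement would dereference a null pointer in the second one, which is the crash the patch avoids.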

[Ada] Fix ICE on multi-dimensional array

2017-09-05 Thread Eric Botcazou

This is a regression present on the mainline, 7 and 6 branches, in the form of 
an ICE during tree-sra, which is confused by an unconstrained array type.

Tested on x86_64-suse-linux, applied on mainline, 7 and 6 branches.


2017-09-05  Eric Botcazou  

* gcc-interface/trans.c (pos_to_constructor): Skip conversions to an
unconstrained array type.


2017-09-05  Eric Botcazou  

* testsuite/gnat.dg/array29.ad[sb]: New test.

-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 251553)
+++ gcc-interface/trans.c	(working copy)
@@ -9826,7 +9826,14 @@ pos_to_constructor (Node_Id gnat_expr, t
    gnat_component_type);
   else
 	{
-	  gnu_expr = gnat_to_gnu (gnat_expr);
+	  /* If the expression is a conversion to an unconstrained array type,
+	 skip it to avoid spilling to memory.  */
+	  if (Nkind (gnat_expr) == N_Type_Conversion
+	  && Is_Array_Type (Etype (gnat_expr))
+	  && !Is_Constrained (Etype (gnat_expr)))
+	gnu_expr = gnat_to_gnu (Expression (gnat_expr));
+	  else
+	gnu_expr = gnat_to_gnu (gnat_expr);
 
 	  /* Before assigning the element to the array, make sure it is
 	 in range.  */
-- { dg-do compile }
-- { dg-options "-O" }

package body Array29 is

  procedure Copy (Src : in Matrix; Dst : out Matrix) is
  begin
for I in Src'Range (1) loop
  for J in Src'Range (2) loop
Dst (I, J) := Src (I, J);
  end loop;
end loop;
  end;

  procedure Proc is
N : constant := 2;
FM1 : constant Matrix (1 .. N, 1 .. N) := ((1.0, 2.0), (3.0, 4.0));
FM2 : constant Matrix (1 .. N, 1 .. N) := ((1.0, 2.0), (3.0, 4.0));
A : constant array (1 .. 2) of Matrix (1 .. N, 1 .. N)
  := (Matrix (FM1), Matrix (FM2));
Final : Matrix (1 .. N, 1 .. N);
  begin
Copy (Src => A (1), Dst => Final);
  end;

end Array29;
package Array29 is

  type Matrix is array (Integer range <>, Integer range <>) of Long_Float;

  procedure Proc;

end Array29;

Re: [PATCH] Handle wide-chars in native_encode_string

2017-09-05 Thread Richard Biener

On Mon, 4 Sep 2017, Joseph Myers wrote:

> On Mon, 4 Sep 2017, Richard Biener wrote:
> 
> > always have a consistent "character" size and how the individual
> > "characters" are encoded.  The patch assumes that the array element
> > type of the STRING_CST can be used to get access to individual
> > characters by means of the element type size and those elements
> > are stored in host byteorder.  Which means the patch simply handles
> 
> It's actually target byte order, i.e. the STRING_CST stores the same 
> sequence of target bytes as would appear on the target system (modulo 
> certain strings such as in asm statements and attributes, for which 
> translation to the execution character set is disabled because those 
> strings are only processed in the compiler on the host, not on the target 
> - but you should never encounter such strings in the optimizers etc.).  
> This is documented in generic.texi (complete with a warning about how it's 
> not well-defined what the encoding is if target bytes are not the same as 
> host bytes).

Ah thanks.

> I suspect that, generically in the compiler, the use of C++ might make it 
> easier than it would have been some time ago to build some abstractions 
> around target strings that work for all of narrow strings, wide strings, 
> char16_t strings etc. (for extracting individual elements - or individual 
> characters which might be multibyte characters in the narrow string case, 
> etc.) - as would be useful for e.g. wide string format checking and more 
> generally for making e.g. optimizations for narrow strings also work for 
> wide strings.  (Such abstractions wouldn't solve the question of what the 
> format is if host and target bytes differ, but their use would reduce the 
> number of places needing changing to establish a definition of the format 
> in that case if someone were to do a port to a system with bytes bigger 
> than 8 bits.)
> 
> However, as I understand the place you're patching, it doesn't have any 
> use for such an abstraction; it just needs to copy a sequence of bytes 
> from one place to another.  (And even with host bytes different from 
> target bytes, clearly it would make sense to define the internal 
> interfaces to make the encodings consistent so this function still only 
> needs to copy bytes from one place to another and still doesn't need such 
> abstractions.)

Right.  Given they are in target representation, the patch becomes much
simpler and we can handle all STRING_CSTs except for the case where
BITS_PER_UNIT != CHAR_BIT (as you say).  I suppose we can easily
declare we'll never support a CHAR_BIT != 8 host and we currently
don't have any BITS_PER_UNIT != 8 port (we had c4x).  I'm not
sure what constraints we have on CHAR_TYPE_SIZE vs. BITS_PER_UNIT,
or for what port it would make sense to have differing values.
Or what it means for native encoding (should the BITS_PER_UNIT != CHAR_BIT
test be CHAR_TYPE_SIZE != CHAR_BIT instead?).  BITS_PER_UNIT is
also only documented in rtl.texi rather than in tm.texi.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-09-05  Richard Biener  

PR tree-optimization/82084
* fold-const.c (can_native_encode_string_p): Handle wide characters.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 251661)
+++ gcc/fold-const.c	(working copy)
@@ -7489,10 +7489,11 @@ can_native_encode_string_p (const_tree e
 {
   tree type = TREE_TYPE (expr);
 
-  if (TREE_CODE (type) != ARRAY_TYPE
+  /* Wide-char strings are encoded in target byte-order so native
+ encoding them is trivial.  */
+  if (BITS_PER_UNIT != CHAR_BIT
+  || TREE_CODE (type) != ARRAY_TYPE
   || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
-  || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
- != BITS_PER_UNIT)
   || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
 return false;
   return true;
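
Because the bytes of the constant are already in target order, encoding a window of it reduces to a bounded copy, whatever the element width. A minimal sketch of that idea follows; `encode_string_bytes` and its parameters are hypothetical names and do not reproduce the signature of GCC's `native_encode_string`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy up to LEN bytes of a string constant, starting at byte offset
   OFF, into PTR.  BYTES and TOTAL describe the constant's storage,
   assumed to be laid out in target byte order already, so no
   per-element swapping is needed.  Return the number of bytes written,
   or 0 if OFF is out of range.  */
static size_t
encode_string_bytes (const unsigned char *bytes, size_t total,
                     unsigned char *ptr, size_t len, size_t off)
{
  assert (bytes != NULL && ptr != NULL);
  if (off >= total)
    return 0;
  size_t n = total - off < len ? total - off : len;
  memcpy (ptr, bytes + off, n);
  return n;
}
```

The sketch assumes 8-bit bytes on both host and target, which is the same restriction the `BITS_PER_UNIT != CHAR_BIT` test expresses.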

[Ada] Do not generate useless temporary for allocator

2017-09-05 Thread Eric Botcazou

This is a regression present on the mainline, 7 and 6 branches: the compiler 
generates a useless temporary for an allocator.

Tested on x86_64-suse-linux, applied on mainline, 7 and 6 branches.


2017-09-05  Eric Botcazou  

* gcc-interface/trans.c (Call_to_gnu): If this is a function call and
there is no target, do not create a temporary for the return value for
an allocator either.

-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 251691)
+++ gcc-interface/trans.c	(working copy)
@@ -4338,11 +4338,11 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
 	  parameters.
 
2. There is no target and the call is made for neither an object nor a
-	  renaming declaration, nor a return statement, and the return type has
-	  variable size, because in this case the gimplifier cannot create the
-	  temporary, or more generally is simply an aggregate type, because the
-	  gimplifier would create the temporary in the outermost scope instead
-	  of locally.
+	  renaming declaration, nor a return statement, nor an allocator, and
+	  the return type has variable size because in this case the gimplifier
+	  cannot create the temporary, or more generally is simply an aggregate
+	  type, because the gimplifier would then create the temporary in the
+	  outermost scope instead of locally.
 
3. There is a target and it is a slice or an array with fixed size,
 	  and the return type has variable size, because the gimplifier
@@ -4361,6 +4361,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
 	  && Nkind (Parent (gnat_node)) != N_Object_Declaration
 	  && Nkind (Parent (gnat_node)) != N_Object_Renaming_Declaration
 	  && Nkind (Parent (gnat_node)) != N_Simple_Return_Statement
+	  && !(Nkind (Parent (gnat_node)) == N_Qualified_Expression
+		   && Nkind (Parent (Parent (gnat_node))) == N_Allocator)
 	  && AGGREGATE_TYPE_P (gnu_result_type)
 	  && !TYPE_IS_FAT_POINTER_P (gnu_result_type))
 	  || (gnu_target

[Ada] Enhance -gnatR3 output for simple dynamic record types

2017-09-05 Thread Eric Botcazou

This changes the terse -gnatR3 output for simple dynamic record types like:

package Q is

  type My_Index is range 1 .. 1024 * 1024;
  type Arr is array (My_Index range <>) of Short_Integer;

  function N return My_Index;

end Q;

with Q; use Q;

package P is

  type R is record
I1 : Integer;
S1 : Short_Integer;
A1 : Arr (1 .. Q.N);
  end record;

end P;

from:

Representation information for unit P (spec)


for R'Size use ??;
for R'Alignment use 4;
for R use record
   I1 at 0 range  0 .. 31;
   S1 at 4 range  0 .. 15;
   A1 at 6 range  0 .. ??;
end record;

to:

Representation information for unit P (spec)


for R'Object_Size use  ((((Var1 + 3) * 16) + 31) & -32) ;
for R'Value_Size use  ((Var1 + 3) * 16) ;
for R'Alignment use 4;
for R use record
   I1 at 0 range  0 .. 31;
   S1 at 4 range  0 .. 15;
   A1 at 6 range  0 ..  ((Var1 * 16))  - 1;
end record;

where Var1 is a symbolic representation of the dynamic value.
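
The two size expressions in the new output can be checked by hand. The sketch below evaluates them in C for a given value of Var1; the function names are illustrative only and are not part of the compiler.

```c
#include <assert.h>

/* Value size of record R in bits, as a function of the dynamic bound
   VAR1 (the number of 16-bit elements in A1).  I1 (32 bits) and S1
   (16 bits) account for 3 * 16 bits; A1 adds VAR1 * 16 bits.  */
static long
value_size_bits (long var1)
{
  assert (var1 >= 0);
  return (var1 + 3) * 16;
}

/* Object size: the value size rounded up to a multiple of 32 bits,
   matching the 4-byte alignment of R.  */
static long
object_size_bits (long var1)
{
  return (value_size_bits (var1) + 31) & -32;
}
```

For Var1 = 2 this gives a Value_Size of 80 bits and an Object_Size of 96 bits; for Var1 = 5 both are 128 bits.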


2017-09-05  Eric Botcazou  

* repinfo.ads: Document new treatment of dynamic values.
(TCode): Bump upper bound to 29.
(Dynamic_Val): New constant set to 29.
* repinfo.adb (Print_Expr) : New case.
(Rep_Value)  : Likewise.
* repinfo.h (Dynamic_Val): New macro.
* gcc-interface/decl.c (annotate_value): Tidy up and cache result for
DECL_P nodes too.
: Set TCODE instead of recursing.
: Set TCODE instead of calling Create_Node manually.
: New case.
: Fold conversions into inner operations.
: Adjust.
: Do not fall through.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 251553)
+++ gcc-interface/decl.c	(working copy)
@@ -8047,13 +8047,13 @@ components_to_record (Node_Id gnat_compo
 static Uint
 annotate_value (tree gnu_size)
 {
+  static int var_count = 0;
   TCode tcode;
-  Node_Ref_Or_Val ops[3], ret, pre_op1 = No_Uint;
+  Node_Ref_Or_Val ops[3] = { No_Uint, No_Uint, No_Uint };
   struct tree_int_map in;
-  int i;
 
   /* See if we've already saved the value for this node.  */
-  if (EXPR_P (gnu_size))
+  if (EXPR_P (gnu_size) || DECL_P (gnu_size))
 {
   struct tree_int_map *e;
 
@@ -8067,9 +8067,7 @@ annotate_value (tree gnu_size)
 in.base.from = NULL_TREE;
 
   /* If we do not return inside this switch, TCODE will be set to the
- code to use for a Create_Node operand and LEN (set above) will be
- the number of recursive calls for us to make.  */
-
+ code to be used in a call to Create_Node.  */
   switch (TREE_CODE (gnu_size))
 {
 case INTEGER_CST:
@@ -8078,38 +8076,51 @@ annotate_value (tree gnu_size)
   if (tree_int_cst_sgn (gnu_size) < 0)
 	{
 	  tree t = wide_int_to_tree (sizetype, wi::neg (gnu_size));
-	  return annotate_value (build1 (NEGATE_EXPR, sizetype, t));
+	  tcode = Negate_Expr;
+	  ops[0] = UI_From_gnu (t);
 	}
-
-  return TREE_OVERFLOW (gnu_size) ? No_Uint : UI_From_gnu (gnu_size);
+  else
+	return TREE_OVERFLOW (gnu_size) ? No_Uint : UI_From_gnu (gnu_size);
+  break;
 
 case COMPONENT_REF:
   /* The only case we handle here is a simple discriminant reference.  */
   if (DECL_DISCRIMINANT_NUMBER (TREE_OPERAND (gnu_size, 1)))
 	{
-	  tree n = DECL_DISCRIMINANT_NUMBER (TREE_OPERAND (gnu_size, 1));
+	  tree ref = gnu_size;
+	  gnu_size = TREE_OPERAND (ref, 1);
 
 	  /* Climb up the chain of successive extensions, if any.  */
-	  while (TREE_CODE (TREE_OPERAND (gnu_size, 0)) == COMPONENT_REF
-		 && DECL_NAME (TREE_OPERAND (TREE_OPERAND (gnu_size, 0), 1))
+	  while (TREE_CODE (TREE_OPERAND (ref, 0)) == COMPONENT_REF
+		 && DECL_NAME (TREE_OPERAND (TREE_OPERAND (ref, 0), 1))
 		== parent_name_id)
-	gnu_size = TREE_OPERAND (gnu_size, 0);
+	ref = TREE_OPERAND (ref, 0);
 
-	  if (TREE_CODE (TREE_OPERAND (gnu_size, 0)) == PLACEHOLDER_EXPR)
-	return
-	  Create_Node (Discrim_Val, annotate_value (n), No_Uint, No_Uint);
+	  if (TREE_CODE (TREE_OPERAND (ref, 0)) == PLACEHOLDER_EXPR)
+	{
+	  /* Fall through to common processing as a FIELD_DECL.  */
+	  tcode = Discrim_Val;
+	  ops[0] = UI_From_gnu (DECL_DISCRIMINANT_NUMBER (gnu_size));
+	}
+	  else
+	return No_Uint;
 	}
+  else
+	return No_Uint;
+  break;
 
-  return No_Uint;
+case VAR_DECL:
+  tcode = Dynamic_Val;
+  ops[0] = UI_From_Int (++var_count);
+  break;
 
-CASE_CONVERT:   case NON_LVALUE_EXPR:
+CASE_CONVERT:
+case NON_LVALUE_EXPR:
   return annotate_value (TREE_OPERAND (gnu_size, 0));
 
   /* Now just list the operations we handle.  */
 case COND_EXPR:		tcode = Cond_Expr; break;
-case PLUS_EXPR:		tcode = Plus_Expr; break;
 case MINUS_EXPR:		tcode = Minus_Expr; break;
-case MULT_EXPR:		tcode = Mult_Expr; break;
 case TRUNC_DIV_EXPR:	tcode = Trunc_Div_Expr; bre

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Christophe Lyon

Hi Bernd,


On 4 September 2017 at 16:52, Kyrill  Tkachov
 wrote:
>
> On 29/04/17 18:45, Bernd Edlinger wrote:
>>
>> Ping...
>>
>> I attached a rebased version since there was a merge conflict in
>> the xordi3 pattern, otherwise the patch is still identical.
>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>> early when the target has no neon or iwmmxt.
>>
>>
>> Thanks
>> Bernd.
>>
>>
>>
>> On 11/28/16 20:42, Bernd Edlinger wrote:
>>>
>>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:

 On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
  wrote:
>
> Hi!
>
> This improves the stack usage on the sha512 test case for the case
> without hardware fpu and without iwmmxt by splitting all di-mode
> patterns right while expanding which is similar to what the
> shift-pattern
> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well
> as
> thumb1.
>
 I would go further and do this in the absence of Neon, the VFP unit
 being there doesn't help with DImode operations i.e. we do not have 64
 bit integer arithmetic instructions without Neon. The main reason why
 we have the DImode patterns split so late is to give a chance for
 folks who want to do 64 bit arithmetic in Neon a chance to make this
 work as well as support some of the 64 bit Neon intrinsics which IIRC
 map down to these instructions. Doing this just for soft-float doesn't
 improve the default case only. I don't usually test iwmmxt and I'm not
 sure who has the ability to do so, thus keeping this restriction for
 iwMMX is fine.


>>> Yes I understand, thanks for pointing that out.
>>>
>>> I was not aware that iwmmxt exists at all, but I noticed that most
>>> 64bit expansions work completely differently, and would break if we split
>>> the pattern early.
>>>
>>> I can however only look at the assembler output for iwmmxt, and make
>>> sure that the stack usage does not get worse.
>>>
>>> Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
>>> it is: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt) bytes stack
>>> for the test cases, and vfp and soft-float at around 270 bytes stack
>>> usage.
>>>
> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>
> Note this also splits many ldrd/strd instructions and therefore I will
> post a followup-patch that mitigates this effect by enabling the
> ldrd/strd
> peephole optimization after the necessary reg-testing.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.

 What do you mean by arm-linux-gnueabihf - when folks say that I
 interpret it as --with-arch=armv7-a --with-float=hard
 --with-fpu=vfpv3-d16 or (--with-fpu=neon).

 If you've really bootstrapped and regtested it on armhf, doesn't this
 patch as it stands have no effect there i.e. no change ?
 arm-linux-gnueabihf usually means to me someone has configured with
 --with-float=hard, so there are no regressions in the hard float ABI
 case,

>>> I know it proves little.  When I say arm-linux-gnueabihf
>>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>>> --with-float=hard.
>>>
>>> My main interest in the stack usage is of course not because of linux,
>>> but because of eCos where we have very small task stacks and in fact
>>> no fpu support by the O/S at all, so that patch is exactly what we need.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf
>>> Is it OK for trunk?
>
>
> The code is ok.
> AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code as
> the splits
> will happen for !TARGET_NEON (it is of course !TARGET_IWMMXT and
> TARGET_IWMMXT2
> is irrelevant here).
>
> So this is ok for trunk.
> Thanks, and sorry again for the delay.
> Kyrill
>

This patch (r251663) causes a regression on armeb-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu vfpv3-d16-fp16
FAIL:gcc.dg/vect/vect-singleton_1.c (internal compiler error)
FAIL:gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects
(internal compiler error)

the test passes if gcc is configured --with-fpu neon-fp16


Christophe

>>>
>>> Thanks
>>> Bernd.
>
>
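
For readers unfamiliar with what splitting a DImode pattern means, here is a rough C model of it: a 64-bit operation on a 32-bit core decomposes into operations on the two 32-bit halves, with a carry linking the halves for addition and no link at all for bitwise operations. The type and function names are made up for illustration and do not correspond to anything in the ARM backend.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, the way a DImode value
   occupies a register pair on a 32-bit core.  */
struct u64_pair { uint32_t lo, hi; };

/* Addition: low-word add, then high-word add plus the carry out.  */
static struct u64_pair
add_di (struct u64_pair a, struct u64_pair b)
{
  struct u64_pair r;
  r.lo = a.lo + b.lo;
  uint32_t carry = r.lo < a.lo;   /* 1 if the low-word add wrapped */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* Bitwise operations need no carry, so the halves are independent.  */
static struct u64_pair
xor_di (struct u64_pair a, struct u64_pair b)
{
  struct u64_pair r = { a.lo ^ b.lo, a.hi ^ b.hi };
  return r;
}

/* Cross-check the split addition against native 64-bit arithmetic.  */
static void
check_add_di (uint64_t x, uint64_t y)
{
  struct u64_pair a = { (uint32_t) x, (uint32_t) (x >> 32) };
  struct u64_pair b = { (uint32_t) y, (uint32_t) (y >> 32) };
  struct u64_pair r = add_di (a, b);
  assert ((((uint64_t) r.hi << 32) | r.lo) == x + y);
}
```

Once the halves are visible as separate 32-bit operations early on, the register allocator no longer has to keep whole register pairs live, which is one plausible reading of why the stack usage drops.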

Re: [61/77] Use scalar_int_mode in the AArch64 port

2017-09-05 Thread James Greenhalgh

On Thu, Jul 13, 2017 at 10:00:03AM +0100, Richard Sandiford wrote:
> This patch makes the AArch64 port use scalar_int_mode in various places.
> Other ports won't need this kind of change; we only need it for AArch64
> because of the variable-sized SVE modes.
> 
> The only change in functionality is in the rtx_costs handling
> of CONST_INT.  If the caller doesn't supply a mode, we now pass
> word_mode rather than VOIDmode to aarch64_internal_mov_immediate.
> aarch64_movw_imm will therefore not now truncate large constants
> in this situation.

OK.

Thanks,
James


> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_is_extend_from_extract):
>   Take a scalar_int_mode instead of a machine_mode.
>   (aarch64_mask_and_shift_for_ubfiz_p): Likewise.
>   (aarch64_move_imm): Likewise.
>   (aarch64_output_scalar_simd_mov_immediate): Likewise.
>   (aarch64_simd_scalar_immediate_valid_for_move): Likewise.
>   (aarch64_simd_attr_length_rglist): Delete.
>   * config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Take
>   a scalar_int_mode instead of a machine_mode.
>   (aarch64_add_offset): Likewise.
>   (aarch64_internal_mov_immediate): Likewise
>   (aarch64_add_constant_internal): Likewise.
>   (aarch64_add_constant): Likewise.
>   (aarch64_movw_imm): Likewise.
>   (aarch64_move_imm): Likewise.
>   (aarch64_rtx_arith_op_extract_p): Likewise.
>   (aarch64_mask_and_shift_for_ubfiz_p): Likewise.
>   (aarch64_simd_scalar_immediate_valid_for_move): Likewise.
>   Remove assert that the mode isn't a vector.
>   (aarch64_output_scalar_simd_mov_immediate): Likewise.
>   (aarch64_expand_mov_immediate): Update calls after above changes.
>   (aarch64_output_casesi): Use as_a .
>   (aarch64_and_bitmask_imm): Check for scalar integer modes.
>   (aarch64_strip_extend): Likewise.
>   (aarch64_extr_rtx_p): Likewise.
>   (aarch64_rtx_costs): Likewise, using word_mode as the mode of
>   a CONST_INT when the mode parameter is VOIDmode.
>

Re: [75/77] Use scalar_mode in the AArch64 port

2017-09-05 Thread James Greenhalgh

On Thu, Jul 13, 2017 at 10:04:58AM +0100, Richard Sandiford wrote:
> Similar to the previous scalar_int_mode patch.

OK.

Thanks,
James

> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>   Take a scalar_mode rather than a machine_mode.
>   (aarch64_operands_adjust_ok_for_ldpstp): Likewise.
>   * config/aarch64/aarch64.c (aarch64_simd_container_mode): Likewise.
>   (aarch64_operands_adjust_ok_for_ldpstp): Likewise.
>   (aarch64_gen_adjusted_ldpstp): Likewise.
>   (aarch64_expand_vector_init): Use scalar_mode instead of machine_mode.
>

[Ada] Fix ICE on Taft-Amendment types

2017-09-05 Thread Eric Botcazou

This is a regression recently introduced on the mainline for Taft-Amendment 
types, when the restriction on inter-unit inlining was lifted.

Tested on x86_64-suse-linux, applied on mainline.


2017-09-05  Eric Botcazou  

* gcc-interface/trans.c (adjust_for_implicit_deref): New function.
(gnat_to_gnu) : Translate result type first.
(N_Indexed_Component): Invoke adjust_for_implicit_deref on the prefix.
(N_Slice): Likewise.
(N_Selected_Component): Likewise.  Do not try again to translate it.
(N_Free_Statement): Invoke adjust_for_implicit_deref on the expression.


2017-09-05  Eric Botcazou  

* gnat.dg/taft_type4.adb: New test.
* gnat.dg/taft_type4_pkg.ad[sb]: New helper.

-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 251695)
+++ gcc-interface/trans.c	(working copy)
@@ -242,6 +242,7 @@ static bool addressable_p (tree, tree);
 static tree assoc_to_constructor (Entity_Id, Node_Id, tree);
 static tree pos_to_constructor (Node_Id, tree, Entity_Id);
 static void validate_unchecked_conversion (Node_Id);
+static Node_Id adjust_for_implicit_deref (Node_Id);
 static tree maybe_implicit_deref (tree);
 static void set_expr_location_from_node (tree, Node_Id, bool = false);
 static void set_gnu_expr_location_from_node (tree, Node_Id);
@@ -6274,8 +6275,9 @@ gnat_to_gnu (Node_Id gnat_node)
 /*/
 
 case N_Explicit_Dereference:
-  gnu_result = gnat_to_gnu (Prefix (gnat_node));
+  /* Make sure the designated type is complete before dereferencing.  */
   gnu_result_type = get_unpadded_type (Etype (gnat_node));
+  gnu_result = gnat_to_gnu (Prefix (gnat_node));
   gnu_result = build_unary_op (INDIRECT_REF, NULL_TREE, gnu_result);
 
   /* If atomic access is required on the RHS, build the atomic load.  */
@@ -6286,7 +6288,8 @@ gnat_to_gnu (Node_Id gnat_node)
 
 case N_Indexed_Component:
   {
-	tree gnu_array_object = gnat_to_gnu (Prefix (gnat_node));
+	tree gnu_array_object
+	  = gnat_to_gnu (adjust_for_implicit_deref (Prefix (gnat_node)));
 	tree gnu_type;
 	int ndim;
 	int i;
@@ -6399,7 +6402,8 @@ gnat_to_gnu (Node_Id gnat_node)
 
 case N_Slice:
   {
-	tree gnu_array_object = gnat_to_gnu (Prefix (gnat_node));
+	tree gnu_array_object
+	  = gnat_to_gnu (adjust_for_implicit_deref (Prefix (gnat_node)));
 
 	gnu_result_type = get_unpadded_type (Etype (gnat_node));
 
@@ -6423,7 +6427,8 @@ gnat_to_gnu (Node_Id gnat_node)
 
 case N_Selected_Component:
   {
-	Entity_Id gnat_prefix = Prefix (gnat_node);
+	Entity_Id gnat_prefix
+	  = adjust_for_implicit_deref (Prefix (gnat_node));
 	Entity_Id gnat_field = Entity (Selector_Name (gnat_node));
 	tree gnu_prefix = gnat_to_gnu (gnat_prefix);
 
@@ -6456,17 +6461,6 @@ gnat_to_gnu (Node_Id gnat_node)
 	  {
 	tree gnu_field = gnat_to_gnu_field_decl (gnat_field);
 
-	/* If the prefix has incomplete type, try again to translate it.
-	   The idea is that the translation of the field just above may
-	   have completed it through gnat_to_gnu_entity, in case it is
-	   the dereference of an access to Taft Amendment type used in
-	   the instantiation of a generic body from an external unit.  */
-	if (!COMPLETE_TYPE_P (TREE_TYPE (gnu_prefix)))
-	  {
-		gnu_prefix = gnat_to_gnu (gnat_prefix);
-		gnu_prefix = maybe_implicit_deref (gnu_prefix);
-	  }
-
 	gnu_result
 	  = build_component_ref (gnu_prefix, gnu_field,
  (Nkind (Parent (gnat_node))
@@ -7725,7 +7719,8 @@ gnat_to_gnu (Node_Id gnat_node)
 case N_Free_Statement:
   if (!type_annotate_only)
 	{
-	  tree gnu_ptr = gnat_to_gnu (Expression (gnat_node));
+	  tree gnu_ptr
+	= gnat_to_gnu (adjust_for_implicit_deref (Expression (gnat_node)));
 	  tree gnu_ptr_type = TREE_TYPE (gnu_ptr);
 	  tree gnu_obj_type, gnu_actual_obj_type;
 
@@ -9913,6 +9908,21 @@ validate_unchecked_conversion (Node_Id g
 }
 }
 
+/* EXP is to be used in a context where access objects are implicitly
+   dereferenced.  Handle the cases when it is an access object.  */
+
+static Node_Id
+adjust_for_implicit_deref (Node_Id exp)
+{
+  Entity_Id type = Underlying_Type (Etype (exp));
+
+  /* Make sure the designated type is complete before dereferencing.  */
+  if (Is_Access_Type (type))
+gnat_to_gnu_entity (Designated_Type (type), NULL_TREE, false);
+
+  return exp;
+}
+
 /* EXP is to be treated as an array or record.  Handle the cases when it is
an access object and perform the required dereferences.  */

Re: [Libgomp, Fortran] Fix canadian cross build

2017-09-05 Thread Yvan Roux

On 18 August 2017 at 10:27, Yvan Roux  wrote:
> On 4 August 2017 at 15:52, Yvan Roux  wrote:
>> On 11 July 2017 at 12:25, Yvan Roux  wrote:
>>> On 3 July 2017 at 11:21, Yvan Roux  wrote:
 On 23 June 2017 at 15:44, Yvan Roux  wrote:
> Hello,
>
> Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
> missing in a canadian cross build, at least when target gfortran
> compiler comes from PATH and not from GFORTRAN_FOR_TARGET.
>
> Back in 2010, executability test of GFORTRAN was added to fix libgomp
> build on cygwin, but when the executable doesn't contain the path,
> "test -x" fails and parts of the library are not built.
>
> This patch fixes the issue by using M4 macro AC_PATH_PROG (which
> returns the absolute name) instead of AC_CHECK_PROG in the function
> defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed it
> into NCN_STRICT_PATH_TARGET_TOOLS to keep the semantic used in M4.
>
> Tested by building cross and Canadian cross toolchain (host:
> i686-w64-mingw32) for arm-linux-gnueabihf with issue and with a
> complete libgomp.
>
> ok for trunk ?

 ping?
>>>
>>> ping?
>>
>> ping
>
> ping
>
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html

ping

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html


> Thanks
> Yvan
>
> config/ChangeLog
> 2017-06-23  Yvan Roux  
>
> * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
> (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the 
> replacement
> of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
> program.
> (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
>
> ChangeLog
> 2017-06-23  Yvan Roux  
>
> * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
> NCN_STRICT_CHECK_TARGET_TOOLS.
> * configure: Regenerate.
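
The failure mode described in this thread can be modelled in a few lines of C: an executability check on a bare program name only looks relative to the current directory, whereas a PATH search yields an absolute name that the check can succeed on. The helper `find_in_path` is a hypothetical illustration of what a PATH lookup does, not the implementation of AC_PATH_PROG, and for simplicity it skips empty PATH components.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated directory list PATH for an executable
   called NAME and write its full name to BUF.  Return 1 on success,
   0 if nothing executable was found or BUF is too small.  */
static int
find_in_path (const char *name, const char *path, char *buf, size_t size)
{
  assert (name != NULL && path != NULL && buf != NULL);
  while (*path != '\0')
    {
      size_t len = strcspn (path, ":");
      int n = snprintf (buf, size, "%.*s/%s", (int) len, path, name);
      /* access with X_OK is the C counterpart of "test -x".  */
      if (len > 0 && n > 0 && (size_t) n < size && access (buf, X_OK) == 0)
        return 1;
      path += len;
      if (*path == ':')
        path++;
    }
  return 0;
}
```

Calling `access ("gfortran", X_OK)` directly would only succeed if a file of that name happened to sit in the current directory, which mirrors the "test -x" failure reported above.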

[Ada] Small housekeeping work

2017-09-05 Thread Eric Botcazou

Tested on x86_64-suse-linux, applied on mainline.


2017-09-05  Eric Botcazou  

* gcc-interface/gigi.h (renaming_from_generic_instantiation_p): Rename to...
(renaming_from_instantiation_p): ...this.
* gcc-interface/decl.c (gnat_to_gnu_entity): Use inline predicate
instead of explicit tests on kind of entities.  Adjust for renaming.
(gnat_to_gnu_profile_type): Likewise.
(gnat_to_gnu_subprog_type): Likewise.
* gcc-interface/trans.c (Identifier_to_gnu): Likewise.
(Case_Statement_to_gnu): Likewise.
(gnat_to_gnu): Likewise.
(process_freeze_entity): Likewise.
(process_type): Likewise.
(add_stmt_with_node): Adjust for renaming.
* gcc-interface/utils.c (gnat_pushdecl): Adjust for renaming.
(renaming_from_generic_instantiation_p): Rename to...
(renaming_from_instantiation_p): ...this.  Use inline predicate.
(pad_type_hasher::keep_cache_entry): Fold.
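
To illustrate the kind of substitution listed above, the sketch below contrasts an explicit range test on an entity kind with an inline predicate that wraps the same test. The enumeration, the `IN` macro definition and the predicate's signature are simplified assumptions for the example; the real GNAT predicates take an entity, not a kind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical entity kinds; the subprogram kinds form a contiguous
   range so that membership is a pair of comparisons.  */
enum entity_kind
{
  E_Variable,
  E_Function,     /* first subprogram kind */
  E_Operator,
  E_Procedure,    /* last subprogram kind */
  E_Package
};

#define Subprogram_Kind__First E_Function
#define Subprogram_Kind__Last  E_Procedure

/* Explicit range test on the kind, in the style of the old code.  */
#define IN(A, B) ((A) >= B##__First && (A) <= B##__Last)

/* Inline predicate wrapping the same test, in the style of the new
   code.  */
static inline bool
Is_Subprogram (enum entity_kind kind)
{
  assert (kind >= E_Variable && kind <= E_Package);
  return IN (kind, Subprogram_Kind);
}
```

Both forms compute the same result; the predicate simply names the intent at the call site.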

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 251698)
+++ gcc-interface/decl.c	(working copy)
@@ -341,14 +341,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	gnat_temp
 	  = Corresponding_Spec (Parent (Declaration_Node (gnat_temp)));
 
-	  if (IN (Ekind (gnat_temp), Subprogram_Kind)
+	  if (Is_Subprogram (gnat_temp)
 	  && Present (Protected_Body_Subprogram (gnat_temp)))
 	gnat_temp = Protected_Body_Subprogram (gnat_temp);
 
 	  if (Ekind (gnat_temp) == E_Entry
 	  || Ekind (gnat_temp) == E_Entry_Family
 	  || Ekind (gnat_temp) == E_Task_Type
-	  || (IN (Ekind (gnat_temp), Subprogram_Kind)
+	  || (Is_Subprogram (gnat_temp)
 		  && present_gnu_tree (gnat_temp)
 		  && (current_function_decl
 		  == gnat_to_gnu_entity (gnat_temp, NULL_TREE, false
@@ -426,7 +426,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
  inherit another source location.  */
   gnu_entity_name = get_entity_name (gnat_entity);
   if (Sloc (gnat_entity) != No_Location
-  && !renaming_from_generic_instantiation_p (gnat_entity))
+  && !renaming_from_instantiation_p (gnat_entity))
 Sloc_to_locus (Sloc (gnat_entity), &input_location);
 
   /* For cases when we are not defining (i.e., we are referencing from
@@ -2922,7 +2922,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
   /* Create the type for a string literal.  */
   {
 	Entity_Id gnat_full_type
-	  = (IN (Ekind (Etype (gnat_entity)), Private_Kind)
+	  = (Is_Private_Type (Etype (gnat_entity))
 	 && Present (Full_View (Etype (gnat_entity)))
 	 ? Full_View (Etype (gnat_entity)) : Etype (gnat_entity));
 	tree gnu_string_type = get_unpadded_type (gnat_full_type);
@@ -3198,7 +3198,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	if (has_discr)
 	  {
 		/* The actual parent subtype is the full view.  */
-		if (IN (Ekind (gnat_parent), Private_Kind))
+		if (Is_Private_Type (gnat_parent))
 		  {
 		if (Present (Full_View (gnat_parent)))
 		  gnat_parent = Full_View (gnat_parent);
@@ -3583,14 +3583,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	Entity_Id gnat_desig_equiv = Gigi_Equivalent_Type (gnat_desig_type);
 	/* Whether it comes from a limited with.  */
 	const bool is_from_limited_with
-	  = (IN (Ekind (gnat_desig_equiv), Incomplete_Kind)
+	  = (Is_Incomplete_Type (gnat_desig_equiv)
 	 && From_Limited_With (gnat_desig_equiv));
 	/* Whether it is a completed Taft Amendment type.  Such a type is to
 	   be treated as coming from a limited with clause if it is not in
 	   the main unit, i.e. we break potential circularities here in case
 	   the body of an external unit is loaded for inter-unit inlining.  */
 const bool is_completed_taft_type
-	  = (IN (Ekind (gnat_desig_equiv), Incomplete_Kind)
+	  = (Is_Incomplete_Type (gnat_desig_equiv)
 	 && Has_Completion_In_Body (gnat_desig_equiv)
 	 && Present (Full_View (gnat_desig_equiv)));
 	/* The "full view" of the designated type.  If this is an incomplete
@@ -3603,12 +3603,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	Entity_Id gnat_desig_full_direct_first
 	  = (is_from_limited_with
 	 ? Non_Limited_View (gnat_desig_equiv)
-	 : (IN (Ekind (gnat_desig_equiv), Incomplete_Or_Private_Kind)
+	 : (Is_Incomplete_Or_Private_Type (gnat_desig_equiv)
 		? Full_View (gnat_desig_equiv) : Empty));
 	Entity_Id gnat_desig_full_direct
 	  = ((is_from_limited_with
 	  && Present (gnat_desig_full_direct_first)
-	  && IN (Ekind (gnat_desig_full_direct_first), Private_Kind))
+	  && Is_Private_Type (gnat_desig_full_direct_first))
 	 ? Full_View (gnat_desig_full_direct_first)
 	 : gnat_desig_full_direct_first);
 	Entity_Id gnat_desig_full
@@ -3856,9 +3856,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  p->next = defer_incomplete_list;
 	  defer_incomplete_list = p;
 	}
-	  else if (!IN (Ekind (Base_Type
-			   (Directly_Designated_Type (gnat_entity))),
-

Re: [PATCH, GCC/testsuite/ARM, ping3] Fix coprocessor intrinsic test failures on ARMv8-A

2017-09-05 Thread Thomas Preudhomme


Ping?

Best regards,

Thomas

On 23/08/17 11:59, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 17/07/17 09:51, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 12/07/17 14:31, Thomas Preudhomme wrote:

Coprocessor intrinsic tests in gcc.target/arm/acle test whether
__ARM_FEATURE_COPROC has the right bit defined before calling the
intrinsic. This makes it possible to test both the correct setting of that macro
and the availability and correct working of the intrinsic. However the
__ARM_FEATURE_COPROC macro is no longer defined for ARMv8-A since
r249399.

This patch changes the testcases to skip that test for ARMv8-A and
ARMv8-R targets.  It also fixes some irregularity in the coprocessor
effective targets:
- add ldcl and stcl to the list of instructions listed as guarded by
   arm_coproc1_ok
- enable tests guarded by arm_coproc2_ok, arm_coproc3_ok and
   arm_coproc4_ok for Thumb-2 capable targets but disable for Thumb-1
   targets.
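
As a rough illustration of the new guard (a sketch only, not part of the
patch), the function below models the preprocessor condition as ordinary C
so that the individual cases can be checked at run time. The function name
and its parameters are hypothetical stand-ins for __ARM_ARCH,
defined (__ARM_ARCH_ISA_ARM) and __ARM_FEATURE_COPROC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the guard used in the updated tests:
     #if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
         && (__ARM_FEATURE_COPROC & BIT) == 0
   Returns true when the #error would fire.  */
bool
coproc_error_fires (int arm_arch, bool has_arm_isa,
                    unsigned feature_coproc, unsigned bit)
{
  assert (bit != 0);
  /* The feature bit is only checked when the architecture is older
     than ARMv8 or the ARM instruction set macro is not defined.  */
  bool check_applies = arm_arch < 8 || !has_arm_isa;
  return check_applies && (feature_coproc & bit) == 0;
}
```

With this model, an architecture 8 target that defines __ARM_ARCH_ISA_ARM
never triggers the error, while a target without the ARM instruction set
still has its feature bits checked.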

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-07-04  Thomas Preud'homme  

 * gcc.target/arm/acle/cdp.c: Skip __ARM_FEATURE_COPROC check for
 ARMv8-A and ARMv8-R.
 * gcc.target/arm/acle/cdp2.c: Likewise.
 * gcc.target/arm/acle/ldc.c: Likewise.
 * gcc.target/arm/acle/ldc2.c: Likewise.
 * gcc.target/arm/acle/ldc2l.c: Likewise.
 * gcc.target/arm/acle/ldcl.c: Likewise.
 * gcc.target/arm/acle/mcr.c: Likewise.
 * gcc.target/arm/acle/mcr2.c: Likewise.
 * gcc.target/arm/acle/mcrr.c: Likewise.
 * gcc.target/arm/acle/mcrr2.c: Likewise.
 * gcc.target/arm/acle/mrc.c: Likewise.
 * gcc.target/arm/acle/mrc2.c: Likewise.
 * gcc.target/arm/acle/mrrc.c: Likewise.
 * gcc.target/arm/acle/mrrc2.c: Likewise.
 * gcc.target/arm/acle/stc.c: Likewise.
 * gcc.target/arm/acle/stc2.c: Likewise.
 * gcc.target/arm/acle/stc2l.c: Likewise.
 * gcc.target/arm/acle/stcl.c: Likewise.
 * lib/target-supports.exp:
 (check_effective_target_arm_coproc1_ok_nocache): Mention ldcl
 and stcl in the comment.
 (check_effective_target_arm_coproc2_ok_nocache): Allow Thumb-2 targets
 and disable Thumb-1 targets.
 (check_effective_target_arm_coproc3_ok_nocache): Likewise.
 (check_effective_target_arm_coproc4_ok_nocache): Likewise.

Tested by running all tests in gcc.target/arm/acle before and after this
patch for ARMv6-M, ARMv7-M, ARMv7E-M, ARMv3, ARMv4 (ARM state), ARMv4T
(Thumb state), ARMv5 (ARM state), ARMv5TE (ARM state), ARMv6 (ARM
state), ARMv6T2 (Thumb state) and ARMv8-A (both states). The only
changes are for ARMv8-A where tests FAILing are now PASSing again.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp.c b/gcc/testsuite/gcc.target/arm/acle/cdp.c
index cebd8c4024ea1930f490f63e5267a33bac59a3a8..cfa922a797cddbf4a99f27ec156fd2d2fc9a460d 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cdp.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc1_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x1) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x1) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp2.c b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
index 945d435d2fb99962ff47d921d9cb3633cb75bb79..b18076c26274043be8ac71e6516b9b6eac3b4137 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cdp2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc2_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x2) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x2) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/ldc.c b/gcc/testsuite/gcc.target/arm/acle/ldc.c
index cd57343208fc5b17e5391d11d126d20e224d6566..10c879f4a15e7c293541c61dc974d972798ecedf 100644
--- a/gcc/testsuite/gcc.target/arm/acle/ldc.c
+++ b/gcc/testsuite/gcc.target/arm/acle/ldc.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc1_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x1) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x1) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/ldc2.c b/gcc/testsuite/gcc.target/arm/acle/ldc2.c
index d7691e30d763d1e921817fd586b47888e1b5c78f..d561adacccf358a1dbfa9db253c9bc08847c7e33 100644
--- a/gcc/testsuite/gcc.target/arm/acle/ldc2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/ldc2.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc2_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x2) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x2) == 0

Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31

2017-09-05 Thread Thomas Preudhomme


Ping?

Best regards,

Thomas

On 25/08/17 12:18, Thomas Preudhomme wrote:

Hi,

I've now also added a couple more changes:

* size to_clear_bitmap according to maxregno to be consistent with its use
* use directly TARGET_HARD_FLOAT instead of clear_vfpregs


Original message below (ChangeLog unchanged):

Function cmse_nonsecure_entry_clear_before_return has code to deal with
high VFP registers (D16-D31) while ARMv8-M Baseline and Mainline both do
not support more than 16 double VFP registers (D0-D15). This makes this
security-sensitive code harder to read for not much benefit since
libcalls for cmse_nonsecure_call functions do not deal with those high
VFP registers anyway.

This commit gets rid of this code for simplicity and fixes 2 issues in
the same function:

- stop the first loop when reaching maxregno to avoid dealing with VFP
   registers if targeting Thumb-1 or using -mfloat-abi=soft
- include maxregno in that loop
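
The two loop fixes and the bitmap sizing can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the real
code uses GCC's sbitmap/auto_sbitmap, whereas the helper names and the
32-bit word array here are hypothetical. The point is that a bitmap
indexed by 0..maxregno needs maxregno + 1 bits, and that the loop bound
is inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32

/* Number of 32-bit words needed to hold bits 0..maxregno inclusive,
   i.e. maxregno + 1 bits.  */
size_t
bitmap_words (int maxregno)
{
  assert (maxregno >= 0);
  return ((size_t) maxregno + 1 + WORD_BITS - 1) / WORD_BITS;
}

/* Set bits 0..maxregno (inclusive) in BITS and return how many were set.
   BITS must provide at least bitmap_words (maxregno) words.  */
int
mark_regs_to_clear (uint32_t *bits, int maxregno)
{
  int count = 0;
  /* "<=" rather than "<": maxregno itself is part of the range.  */
  for (int regno = 0; regno <= maxregno; regno++)
    {
      bits[regno / WORD_BITS] |= UINT32_C (1) << (regno % WORD_BITS);
      count++;
    }
  return count;
}
```

Sizing the array for only maxregno bits would be one bit short whenever
maxregno + 1 crosses a word boundary, which is the kind of off-by-one
the inclusive bound has to account for.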

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-06-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
 Extensions with more than 16 double VFP registers.
 (cmse_nonsecure_entry_clear_before_return): Remove second entry of
 to_clear_mask and all code related to it.  Replace the remaining
 entry by a sbitmap and adapt code accordingly.

Testing: Testsuite shows no regression when run for ARMv8-M Baseline and
ARMv8-M Mainline.

Is this ok for trunk?

Best regards,

Thomas

On 23/08/17 11:56, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 17/07/17 17:25, Thomas Preudhomme wrote:
My bad, found an off-by-one error in the sizing of bitmaps. Please find fixed 
patch in attachment.


ChangeLog entry is unchanged:

*** gcc/ChangeLog ***

2017-06-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
 Extensions with more than 16 double VFP registers.
 (cmse_nonsecure_entry_clear_before_return): Remove second entry of
 to_clear_mask and all code related to it.  Replace the remaining
 entry by a sbitmap and adapt code accordingly.

Best regards,

Thomas

On 17/07/17 09:52, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 12/07/17 09:59, Thomas Preudhomme wrote:

Hi Richard,

On 07/07/17 15:19, Richard Earnshaw (lists) wrote:


Hmm, I think that's because really this is a partial conversion.  It
looks like doing this properly would involve moving that existing code
to use sbitmaps as well.  I think doing that would be better for
long-term maintenance perspectives, but I'm not going to insist that you
do it now.


There's also the assert later but I've found a way to improve it slightly. 
While switching to auto_sbitmap I also changed the code slightly to 
allocate the bitmaps directly at the right size. Since the change is probably 
bigger than what you had in mind I'd appreciate it if you could give me an OK 
again. See updated patch in attachment. ChangeLog entry is unchanged:


2017-06-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
 Extensions with more than 16 double VFP registers.
 (cmse_nonsecure_entry_clear_before_return): Remove second entry of
 to_clear_mask and all code related to it.  Replace the remaining
 entry by a sbitmap and adapt code accordingly.



As a result I'll let you take the call as to whether you keep this
version or go back to your earlier patch.  If you do decide to keep this
version, then see the comment below.


Given the changes I'm happier with how the patch looks now, and getting it 
in can be a nice incentive to change other ARMv8-M Security Extension 
related code later on.


Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3d15a8185a74164743961d7d666cef4d60b8b11e..680a3c564bdad4ae7cacd57b61f099bdf42d3e73 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3658,6 +3658,11 @@ arm_option_override (void)
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");
 
+  /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
+ and ARMv8-M Baseline and Mainline do not allow such configuration.  */
+  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+error ("ARMv8-M Security Extensions incompatible with selected FPU");
+
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
@@ -25038,42 +25043,37 @@ thumb1_expand_prologue (void)
 void
 cmse_nonsecure_entry_clear_before_return (void)
 {
-  uint64_t to_clear_mask[2];
+  int regno, maxregno = TARGET_HARD_FLOAT ? LAST_VFP_REGNUM : IP_REGNUM;
   uint32_t padding_bits_to_clear = 0;
   uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
-  int regno, maxregno = IP_REGNUM;
+  auto_sbitmap to_clear_bitmap (maxregno + 1);
   tree result_type;
   rtx result_rtl;
 
-  to_clear_mask[0] = (1ULL << (NU

[Ada] Fix ICE on type with zero precision

2017-09-05 Thread Eric Botcazou

This is a regression present on the mainline, 7 and 6 branches, in the form of 
an ICE during tree-ccp, which is confused by a type with zero precision.

Tested on x86_64-suse-linux, applied on mainline, 7 and 6 branches.


2017-09-05  Eric Botcazou  

* gcc-interface/utils.c (unchecked_convert): When the result type is a
non-biased integral type with size 0, set the result to 0 directly.


2017-09-05  Eric Botcazou  

* gnat.dg/specs/uc2.ads: New test.

-- 
Eric Botcazou
Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 251700)
+++ gcc-interface/utils.c	(working copy)
@@ -5257,20 +5257,26 @@ unchecked_convert (tree type, tree expr,
 	? TYPE_RM_SIZE (etype)
 	: TYPE_SIZE (etype)) == 0)))
 {
-  tree base_type
-	= gnat_type_for_size (TREE_INT_CST_LOW (TYPE_SIZE (type)),
-			  type_unsigned_for_rm (type));
-  tree shift_expr
-	= convert (base_type,
-		   size_binop (MINUS_EXPR,
-			   TYPE_SIZE (type), TYPE_RM_SIZE (type)));
-  expr
-	= convert (type,
-		   build_binary_op (RSHIFT_EXPR, base_type,
-build_binary_op (LSHIFT_EXPR, base_type,
-		 convert (base_type, expr),
-		 shift_expr),
-shift_expr));
+  if (integer_zerop (TYPE_RM_SIZE (type)))
+	expr = build_int_cst (type, 0);
+  else
+	{
+	  tree base_type
+	= gnat_type_for_size (TREE_INT_CST_LOW (TYPE_SIZE (type)),
+  type_unsigned_for_rm (type));
+	  tree shift_expr
+	= convert (base_type,
+		   size_binop (MINUS_EXPR,
+   TYPE_SIZE (type), TYPE_RM_SIZE (type)));
+	  expr
+	= convert (type,
+		   build_binary_op (RSHIFT_EXPR, base_type,
+build_binary_op (LSHIFT_EXPR, base_type,
+			 convert (base_type,
+  expr),
+			 shift_expr),
+shift_expr));
+	}
 }
 
   /* An unchecked conversion should never raise Constraint_Error.  The code
-- { dg-do compile }
-- { dg-options "-O" }

with Ada.Unchecked_Conversion;

package UC2 is

  subtype Word_Type is Integer range 0 .. 0;
  type Arr is array (1 .. Word_Type'Size) of Boolean;
  pragma Pack(Arr);

  function Conv is
 new Ada.Unchecked_Conversion (Source => Arr, Target => Word_Type);

  A : Arr;
  W : Word_Type := Conv(A);

end UC2;
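
The shape of the fix can be illustrated with a loose analogy in plain C;
this is not the gigi code. The patch keeps the existing shift-left then
shift-right sequence by TYPE_SIZE - TYPE_RM_SIZE bits for the general
case and returns 0 directly when the RM size is 0. The sketch below
assumes an unsigned 32-bit container and a hypothetical helper name; in C
the special case is needed because a shift by the full width of the
operand is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low RM_SIZE bits of VALUE, held in a 32-bit container,
   using a left shift followed by a logical right shift.  */
uint32_t
truncate_to_rm_size (uint32_t value, unsigned rm_size)
{
  assert (rm_size <= 32);
  /* With rm_size == 0 the shift count would be 32, which is undefined
     for a 32-bit operand in C, so return 0 directly.  */
  if (rm_size == 0)
    return 0;
  unsigned shift = 32 - rm_size;
  return (value << shift) >> shift;
}
```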

[Ada] Fix bogus constraint error on value conversion with -gnatVa

2017-09-05 Thread Eric Botcazou

Tested on x86_64-suse-linux, applied on mainline.


2017-09-05  Eric Botcazou  

* gcc-interface/trans.c (Attribute_to_gnu) : Do not strip
conversions around prefixes that are not references.

-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 251700)
+++ gcc-interface/trans.c	(working copy)
@@ -1629,10 +1629,14 @@ Attribute_to_gnu (Node_Id gnat_node, tre
 
 case Attr_Address:
 case Attr_Unrestricted_Access:
-  /* Conversions don't change addresses but can cause us to miss the
-	 COMPONENT_REF case below, so strip them off.  */
-  gnu_prefix = remove_conversions (gnu_prefix,
-   !Must_Be_Byte_Aligned (gnat_node));
+  /* Conversions don't change the address of references but can cause
+	 build_unary_op to miss the references below, so strip them off.
+	 On the contrary, if the address-of operation causes a temporary
+	 to be created, then it must be created with the proper type.  */
+  gnu_expr = remove_conversions (gnu_prefix,
+ !Must_Be_Byte_Aligned (gnat_node));
+  if (REFERENCE_CLASS_P (gnu_expr))
+	gnu_prefix = gnu_expr;
 
   /* If we are taking 'Address of an unconstrained object, this is the
 	 pointer to the underlying array.  */

[Ada] Get rid of call to int->fp conversion routine

2017-09-05 Thread Eric Botcazou

Tested on x86_64-suse-linux, applied on mainline.


2017-09-05  Eric Botcazou  

* gcc-interface/trans.c (convert_with_check): Use a custom base type
if the base type of the expression has a different machine mode.
Rename a couple of parameters and local variable.

-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 251704)
+++ gcc-interface/trans.c	(working copy)
@@ -9252,63 +9252,71 @@ emit_check (tree gnu_cond, tree gnu_expr
 
 /* Return an expression that converts GNU_EXPR to GNAT_TYPE, doing overflow
checks if OVERFLOW_P is true and range checks if RANGE_P is true.
-   GNAT_TYPE is known to be an integral type.  If TRUNCATE_P true, do a
-   float to integer conversion with truncation; otherwise round.
-   GNAT_NODE is the GNAT node conveying the source location for which the
-   error should be signaled.  */
+   If TRUNCATE_P true, do a float-to-integer conversion with truncation,
+   otherwise round.  GNAT_NODE is the GNAT node conveying the source location
+   for which the error should be signaled.  */
 
 static tree
-convert_with_check (Entity_Id gnat_type, tree gnu_expr, bool overflowp,
-		bool rangep, bool truncatep, Node_Id gnat_node)
+convert_with_check (Entity_Id gnat_type, tree gnu_expr, bool overflow_p,
+		bool range_p, bool truncate_p, Node_Id gnat_node)
 {
   tree gnu_type = get_unpadded_type (gnat_type);
-  tree gnu_in_type = TREE_TYPE (gnu_expr);
-  tree gnu_in_basetype = get_base_type (gnu_in_type);
   tree gnu_base_type = get_base_type (gnu_type);
+  tree gnu_in_type = TREE_TYPE (gnu_expr);
+  tree gnu_in_base_type = get_base_type (gnu_in_type);
   tree gnu_result = gnu_expr;
 
   /* If we are not doing any checks, the output is an integral type and the
  input is not a floating-point type, just do the conversion.  This is
  required for packed array types and is simpler in all cases anyway.   */
-  if (!rangep
-  && !overflowp
+  if (!range_p
+  && !overflow_p
   && INTEGRAL_TYPE_P (gnu_base_type)
-  && !FLOAT_TYPE_P (gnu_in_type))
+  && !FLOAT_TYPE_P (gnu_in_base_type))
 return convert (gnu_type, gnu_expr);
 
-  /* First convert the expression to its base type.  This
- will never generate code, but makes the tests below much simpler.
- But don't do this if converting from an integer type to an unconstrained
- array type since then we need to get the bounds from the original
- (unpacked) type.  */
+  /* If the mode of the input base type is larger, then converting to it below
+ may pessimize the final conversion step, for example generate a libcall
+ instead of a simple instruction, so use a narrower type in this case.  */
+  if (TYPE_MODE (gnu_in_base_type) != TYPE_MODE (gnu_in_type)
+  && !(TREE_CODE (gnu_in_type) == INTEGER_TYPE
+	   && TYPE_BIASED_REPRESENTATION_P (gnu_in_type)))
+gnu_in_base_type = gnat_type_for_mode (TYPE_MODE (gnu_in_type),
+	   TYPE_UNSIGNED (gnu_in_type));
+
+  /* First convert the expression to the base type.  This will never generate
+ code, but makes the tests below simpler.  But don't do this if converting
+ from an integer type to an unconstrained array type since then we need to
+ get the bounds from the original (unpacked) type.  */
   if (TREE_CODE (gnu_type) != UNCONSTRAINED_ARRAY_TYPE)
-gnu_result = convert (gnu_in_basetype, gnu_result);
+gnu_result = convert (gnu_in_base_type, gnu_result);
 
-  /* If overflow checks are requested,  we need to be sure the result will
- fit in the output base type.  But don't do this if the input
- is integer and the output floating-point.  */
-  if (overflowp
-  && !(FLOAT_TYPE_P (gnu_base_type) && INTEGRAL_TYPE_P (gnu_in_basetype)))
+  /* If overflow checks are requested,  we need to be sure the result will fit
+ in the output base type.  But don't do this if the input is integer and
+ the output floating-point.  */
+  if (overflow_p
+  && !(FLOAT_TYPE_P (gnu_base_type) && INTEGRAL_TYPE_P (gnu_in_base_type)))
 {
   /* Ensure GNU_EXPR only gets evaluated once.  */
   tree gnu_input = gnat_protect_expr (gnu_result);
   tree gnu_cond = boolean_false_node;
-  tree gnu_in_lb = TYPE_MIN_VALUE (gnu_in_basetype);
-  tree gnu_in_ub = TYPE_MAX_VALUE (gnu_in_basetype);
+  tree gnu_in_lb = TYPE_MIN_VALUE (gnu_in_base_type);
+  tree gnu_in_ub = TYPE_MAX_VALUE (gnu_in_base_type);
   tree gnu_out_lb = TYPE_MIN_VALUE (gnu_base_type);
   tree gnu_out_ub = TYPE_MAX_VALUE (gnu_base_type);
 
   /* Convert the lower bounds to signed types, so we're sure we're
 	 comparing them properly.  Likewise, convert the upper bounds
 	 to unsigned types.  */
-  if (INTEGRAL_TYPE_P (gnu_in_basetype) && TYPE_UNSIGNED (gnu_in_basetype))
+  if (INTEGRAL_TYPE_P (gnu_in_base_type)
+	  && TYPE_UNSIGNED (gnu_in_base_type))
 	gnu_in_lb
-	  = convert (gnat_signe

Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Richard Biener

On Mon, 4 Sep 2017, Andrew Pinski wrote:

> On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina  
> wrote:
> >> >   vect__5.25_58 = VIEW_CONVERT_EXPR >> intD.11>(vect__4.21_65);
> >> >   vect__5.25_57 = VIEW_CONVERT_EXPR >> intD.11>(vect__4.22_63);
> >> >   vect__5.25_56 = VIEW_CONVERT_EXPR >> intD.11>(vect__4.23_61);
> >> >   vect__5.25_55 = VIEW_CONVERT_EXPR >> > intD.11>(vect__4.24_59);
> >> >
> >> > I suspect this patch will be quite bad for us performance wise as it
> >> > thinks it's as cheap to do all our integer operations on the vector side 
> >> > with
> >> vectors of 1 element. But I'm still waiting for the perf numbers to 
> >> confirm.
> >>
> >> Looks like the backend advertises that it can do POPCOUNT on V1DI.  So SLP
> >> vectorization decides it can vectorize this without unrolling.
> >
> > We don't, POPCOUNT is only defined for vector modes V8QI and V16QI, we also 
> > don't define support
> > For V1DI anywhere in the backend, we do however say we support V1DF, but 
> > removing
> > That doesn't cause the ICE to go away.
> >
> >> Vectorization with V2DI is rejected:
> >>
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not
> >> vectorizable.
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized:
> >> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: * Re-trying
> >> analysis with vector size 8
> >>
> >> and that now succeeds (it probably didn't succeed before the patch).
> >
> > In the .optimized file, I see it vectorised it to
> >
> >   vect__5.25_58 = VIEW_CONVERT_EXPR > intD.11>(vect__4.21_65);
> >   vect__5.25_57 = VIEW_CONVERT_EXPR > intD.11>(vect__4.22_63);
> >   vect__5.25_56 = VIEW_CONVERT_EXPR > intD.11>(vect__4.23_61);
> >   vect__5.25_55 = VIEW_CONVERT_EXPR > intD.11>(vect__4.24_59);
> >   _54 = POPCOUNT (vect__5.25_58);
> >   _53 = POPCOUNT (vect__5.25_57);
> >
> > Which is something we just don't have a pattern for. Before this patch, it 
> > was rejecting "long unsigned int"
> > With this patch it somehow thinks we support an integer vector of 1 
> > element, even though 1) we don't have an optab
> > Defined for this operation for POPCOUNT (or at all in aarch64 as far as I 
> > can tell), and 2) we don't have it in our supported list of vector modes.
> 
> Here are the two popcount optabs aarch64 has:
> (define_mode_iterator GPI [SI DI])
> (define_expand "popcount2"
>   [(match_operand:GPI 0 "register_operand")
>(match_operand:GPI 1 "register_operand")]
> 
> 
> (define_insn "popcount2"
> (define_mode_iterator VB [V8QI V16QI])
>   [(set (match_operand:VB 0 "register_operand" "=w")
> (popcount:VB (match_operand:VB 1 "register_operand" "w")))]
> 
> As you can see we only define popcount optab for SI, DI, V8QI and
> V16QI.  (note SI and DI uses the V8QI and V16QI during the expansion
> but that is a different story).
> 
> Maybe somehow the vectorizer is thinking V1DI and DI are
> interchangeable in some places.

We ask

Breakpoint 5, vectorizable_internal_function (cfn=CFN_BUILT_IN_POPCOUNTL, 
fndecl=, 
vectype_out=, 
vectype_in=)
at /tmp/trunk/gcc/tree-vect-stmts.c:1666
1666  if (internal_fn_p (cfn))
(gdb) p debug_generic_expr (vectype_out)
vector(2) int
$10 = void
(gdb) p debug_generic_expr (vectype_in)
vector(1) long unsigned int
$11 = void
(gdb) fin
Run till exit from #0  vectorizable_internal_function (
cfn=CFN_BUILT_IN_POPCOUNTL, 
fndecl=, 
vectype_out=, 
vectype_in=)
at /tmp/trunk/gcc/tree-vect-stmts.c:1666
0x01206afc in vectorizable_call (gs=, 
gsi=0x0, vec_stmt=0x0, slp_node=0x2451290)
at /tmp/trunk/gcc/tree-vect-stmts.c:2762
2762ifn = vectorizable_internal_function (cfn, callee, 
vectype_out,
Value returned is $12 = IFN_POPCOUNT

so somehow direct_internal_fn_supported_p says true for a POPCOUNTL
V1DI -> V2SI.  I'd argue the question is odd already but the
ultimate answer is clearly wrong ;)

We have

(gdb) p info
$22 = (const direct_internal_fn_info &) @0x19b8e0c: {type0 = 0, type1 = 0, 
  vectorizable = 1}

so require vectype_in as vectype_out which means we ask for V1DI -> V1DI
(and the vectorizer compensates for the demotion).  But the
mode of V1DI is actually DI (because the target doesn't support V1DI).
Thus we end up with asking for popcount with DImode which is
available.
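
The reasoning in the previous paragraph can be modelled with a toy sketch
in C. Everything here is hypothetical (the enum, the helper names, the
boolean target flag); it is not GCC code. It only shows why a support
query keyed on the machine mode answers "yes" for a one-element vector
type whose mode has fallen back to scalar DI.

```c
#include <stdbool.h>

/* Toy machine modes; the names mirror GCC's but this is not GCC code.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V1DI, TOY_V8QI, TOY_V16QI };

/* Mode given to a one-element vector of DI: the vector mode if the
   (hypothetical) target supports it, otherwise the scalar DI mode.  */
enum toy_mode
mode_of_v1di_type (bool target_has_v1di)
{
  return target_has_v1di ? TOY_V1DI : TOY_DI;
}

/* Popcount support keyed on the mode alone, using the modes listed
   earlier in the thread: SI, DI, V8QI and V16QI.  */
bool
popcount_supported_p (enum toy_mode mode)
{
  return mode == TOY_SI || mode == TOY_DI
         || mode == TOY_V8QI || mode == TOY_V16QI;
}
```

In this model, popcount_supported_p (mode_of_v1di_type (false)) is true
because the query only ever sees DI, never the vector type itself.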

Note that this V1DI vector type having DImode is desirable for Jon's
case as far as I understand.  DImode gets used because aarch64
advertises generic vectorization support with word_mode.

Things may get tricky later when we look for VEC_PACK_TRUNC
of V1DI, V1DI -> V2SI but the vec_pack_trunc_optab advertises
DImode support via the VDN iterator (maybe expecting this only
in case no V2SI support is available).

So in the end the vectorizer works as expected ;)

What we don't seem to handle is a single-element vector typed, DImode
constructor with a single DImode element.

We call store_bit_fie

Fix PR ada/62235

2017-09-05 Thread Eric Botcazou

Tested on x86_64-suse-linux, applied on mainline and 7 branch.


2017-09-05  Eric Botcazou  

PR ada/62235
* gcc-interface/decl.c (gnat_to_gnu_entity): Skip regular processing
for Itypes that are E_Record_Subtype with a cloned subtype.
: Use the DECL of the cloned type directly, if any.


2017-09-05  Eric Botcazou  

* gnat.dg/incomplete5.ad[sb]: New test.
* gnat.dg/incomplete5_pkg.ad[sb]: New helper.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 251700)
+++ gcc-interface/decl.c	(working copy)
@@ -312,11 +312,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 
   /* Since a use of an Itype is a definition, process it as such if it is in
  the main unit, except for E_Access_Subtype because it's actually a use
- of its base type, see below.  */
+ of its base type, and for E_Record_Subtype with cloned subtype because
+ it's actually a use of the cloned subtype, see below.  */
   if (!definition
   && is_type
   && Is_Itype (gnat_entity)
-  && Ekind (gnat_entity) != E_Access_Subtype
+  && !(kind == E_Access_Subtype
+	   || (kind == E_Record_Subtype
+	   && Present (Cloned_Subtype (gnat_entity))))
   && !present_gnu_tree (gnat_entity)
   && In_Extended_Main_Code_Unit (gnat_entity))
 {
@@ -3411,7 +3414,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	{
 	  gnu_decl = gnat_to_gnu_entity (Cloned_Subtype (gnat_entity),
 	 NULL_TREE, false);
-	  maybe_present = true;
+	  saved = true;
 	  break;
 	}
 
-- { dg-do compile }

package body Incomplete5 is

   function Get (O: Base_Object) return Integer is
   begin
  return Get_Handle(O).I;
   end;

end Incomplete5;
with Incomplete5_Pkg;

package Incomplete5 is

   type Rec1 is private;

   type Rec2 is private;

   package My_G is new Incomplete5_Pkg (Rec1);

   use My_G;

   function Get (O: Base_Object) return Integer;

private

   type Rec1 is record
  I : Integer;
   end record;

   type Rec2 is record
  A : Access_Type;
   end record;

end Incomplete5;
package body Incomplete5_Pkg is

   function Get_Handle (Object: Base_Object) return Access_Type is
   begin
  return Object.Handle;
   end;

   function From_Handle (Handle: Access_Type) return Base_Object is
   begin
  return (Handle=>Handle);
   end;

end Incomplete5_Pkg;
generic
   type Record_Type;
package Incomplete5_Pkg is

   type Access_Type is access Record_Type;

   type Base_Object is tagged record
  Handle: Access_Type;
   end record;

   function Get_Handle(Object: Base_Object) return Access_Type;

   function From_Handle(Handle: Access_Type) return Base_Object;

end Incomplete5_Pkg;

Re: [00/77] Add wrapper classes for machine_modes

2017-09-05 Thread Jakub Jelinek

On Sun, Sep 03, 2017 at 09:18:33PM +0100, Richard Sandiford wrote:
> Gerald Pfeifer  writes:
> > Hi Richard,
> >
> > I'm afraid your patchset has broken bootstrap on i686-unknown-freebsd10.3,
> > in fact, it appears on FreeBSD in general (amd64-unknown-freebsd11 as well):
> 
> This sounds like the same as PR82045.  Could you try the patch I posted
> here: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00062.html ?

Richard, could you please next time commit such a huge series of patches
that has not been tested separately as one commit (or test the patches also
individually)?  It is sometimes fine for patch review if the granularity is
smaller, but e.g. for bisection having long ranges of commits that don't
build/bootstrap at all is highly undesirable.  That seems to be at least on
x86_64-linux the case of r251470 to r251503 inclusive.

Thanks

Jakub

Re: [Libgomp, Fortran] Fix canadian cross build

2017-09-05 Thread Jakub Jelinek

On Tue, Sep 05, 2017 at 10:58:22AM +0200, Yvan Roux wrote:
> ping
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html

This really needs to be reviewed by a build machinery maintainer.

> > config/ChangeLog
> > 2017-06-23  Yvan Roux  
> >
> > * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
> > (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the 
> > replacement
> > of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
> > program.
> > (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
> >
> > ChangeLog
> > 2017-06-23  Yvan Roux  
> >
> > * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
> > NCN_STRICT_CHECK_TARGET_TOOLS.
> > * configure: Regenerate.

Jakub

Re: [00/77] Add wrapper classes for machine_modes

2017-09-05 Thread Richard Sandiford

Jakub Jelinek  writes:
> On Sun, Sep 03, 2017 at 09:18:33PM +0100, Richard Sandiford wrote:
>> Gerald Pfeifer  writes:
>> > Hi Richard,
>> >
>> > I'm afraid your patchset has broken bootstrap on i686-unknown-freebsd10.3,
>> > in fact, it appears on FreeBSD in general (amd64-unknown-freebsd11 as 
>> > well):
>> 
>> This sounds like the same as PR82045.  Could you try the patch I posted
>> here: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00062.html ?
>
> Richard, could you please next time commit such a huge series of patches
> that has not been tested separately as one commit (or test the patches also
> individually)?  It is sometimes fine for patch review if the granularity is
> smaller, but e.g. for bisection having long ranges of commits that don't
> build/bootstrap at all is highly undesirable.  That seems to be at least on
> x86_64-linux the case of r251470 to r251503 inclusive.
>
> Thanks

Yeah, sorry about that.  I had tested the original series as separate
patches, but messed up when making a final tweak in the switch to using
"require ()".

I'd wanted to commit them separately precisely so that a bisect would
identify a particular patch (this did help for PR82045).  But obviously
the mistake in expmed.c meant that quite a few revisions are duds.

Thanks,
Richard

[C++/ARM Patch] PR 81942 ("ICE on empty constexpr constructor with C++14")

2017-09-05 Thread Paolo Carlini


Hi,

in this ICE on valid code, a gcc_assert fires when 
cxx_eval_constant_expression handles a GOTO_EXPR that is the translation 
of a "return;" on a targetm.cxx.cdtor_returns_this target (like ARM):


;; Function constexpr A::A() (null)
;; enabled by -tree-original


{
  // predicted unlikely by goto predictor.;
  goto ;
}
:;
return this;

I think the right way to handle this is marking such special labels with 
a LABEL_DECL_CDTOR flag and using it in the returns helper function (we 
already use a similar strategy with LABEL_DECL_BREAK and 
LABEL_DECL_CONTINUE and the breaks and continues helpers). Then 
adjusting the ICEing gcc_assert is trivial. Tested x86_64-linux and 
aarch64-linux.
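
The flag-based classification described above can be sketched in
standalone C. This is a toy model under stated assumptions, not the front
end code: the struct, the enum and the toy_ helper names are hypothetical
stand-ins for tree nodes, LABEL_DECL_BREAK, LABEL_DECL_CONTINUE and the
proposed LABEL_DECL_CDTOR.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_RETURN_EXPR, TOY_LABEL_DECL };

/* Toy stand-in for a jump target: a code plus per-label flags.  */
struct toy_node
{
  enum toy_code code;
  bool label_break;
  bool label_continue;
  bool label_cdtor;
};

bool
toy_breaks (const struct toy_node *t)
{
  return t && t->code == TOY_LABEL_DECL && t->label_break;
}

bool
toy_continues (const struct toy_node *t)
{
  return t && t->code == TOY_LABEL_DECL && t->label_continue;
}

/* A jump "returns" if it is a return expression or a goto to the
   label flagged as the constructor/destructor return label.  */
bool
toy_returns (const struct toy_node *t)
{
  return t && (t->code == TOY_RETURN_EXPR
               || (t->code == TOY_LABEL_DECL && t->label_cdtor));
}

/* Condition a goto target has to satisfy after the change.  */
bool
toy_goto_target_ok (const struct toy_node *t)
{
  return toy_breaks (t) || toy_continues (t) || toy_returns (t);
}
```

An unflagged label is still rejected in this model, so the relaxed
assertion only admits the one extra kind of jump.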


Thanks,
Paolo.

//

/cp
2017-09-05  Paolo Carlini  

PR c++/81942
* cp-tree.h (LABEL_DECL_CDTOR): Add and document.
* decl.c (start_preparsed_function): Set LABEL_DECL_CDTOR when
creating cdtor_label.
* constexpr.c (returns): Add the case of a constructor/destructor
returning via a LABEL_DECL_CDTOR label.
(cxx_eval_constant_expression, case [GOTO_EXPR]): Likewise.

/testsuite
2017-09-05  Paolo Carlini  

PR c++/81942
* g++.dg/cpp1y/constexpr-return3.C: New.
Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 251700)
+++ cp/constexpr.c  (working copy)
@@ -3671,7 +3671,9 @@ static bool
 returns (tree *jump_target)
 {
   return *jump_target
-&& TREE_CODE (*jump_target) == RETURN_EXPR;
+&& (TREE_CODE (*jump_target) == RETURN_EXPR
+   || (TREE_CODE (*jump_target) == LABEL_DECL
+   && LABEL_DECL_CDTOR (*jump_target)));
 }
 
 static bool
@@ -4554,7 +4556,9 @@ cxx_eval_constant_expression (const constexpr_ctx
 
 case GOTO_EXPR:
   *jump_target = TREE_OPERAND (t, 0);
-  gcc_assert (breaks (jump_target) || continues (jump_target));
+  gcc_assert (breaks (jump_target) || continues (jump_target)
+ /* Allow for jumping to a cdtor_label.  */
+ || returns (jump_target));
   break;
 
 case LOOP_EXPR:
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 251700)
+++ cp/cp-tree.h(working copy)
@@ -456,6 +456,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   DECL_CONSTRAINT_VAR_P (in a PARM_DECL)
   TEMPLATE_DECL_COMPLEX_ALIAS_P (in TEMPLATE_DECL)
   DECL_INSTANTIATING_NSDMI_P (in a FIELD_DECL)
+  LABEL_DECL_CDTOR (in LABEL_DECL)
3: DECL_IN_AGGR_P.
4: DECL_C_BIT_FIELD (in a FIELD_DECL)
   DECL_ANON_UNION_VAR_P (in a VAR_DECL)
@@ -3833,6 +3834,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
 #define LABEL_DECL_CONTINUE(NODE) \
   DECL_LANG_FLAG_1 (LABEL_DECL_CHECK (NODE))
 
+/* Nonzero if NODE is the target for genericization of 'return' stmts
+   in constructors/destructors of targetm.cxx.cdtor_returns_this targets.  */
+#define LABEL_DECL_CDTOR(NODE) \
+  DECL_LANG_FLAG_2 (LABEL_DECL_CHECK (NODE))
+
 /* True if NODE was declared with auto in its return type, but it has
started compilation and so the return type might have been changed by
return type deduction; its declared return type should be found in
Index: cp/decl.c
===
--- cp/decl.c   (revision 251700)
+++ cp/decl.c   (working copy)
@@ -15072,7 +15073,10 @@ start_preparsed_function (tree decl1, tree attrs,
   if (DECL_DESTRUCTOR_P (decl1)
   || (DECL_CONSTRUCTOR_P (decl1)
  && targetm.cxx.cdtor_returns_this ()))
-cdtor_label = create_artificial_label (input_location);
+{
+  cdtor_label = create_artificial_label (input_location);
+  LABEL_DECL_CDTOR (cdtor_label) = true;
+}
 
   start_fname_decls ();
 
Index: testsuite/g++.dg/cpp1y/constexpr-return3.C
===
--- testsuite/g++.dg/cpp1y/constexpr-return3.C  (revision 0)
+++ testsuite/g++.dg/cpp1y/constexpr-return3.C  (working copy)
@@ -0,0 +1,11 @@
+// PR c++/81942
+// { dg-do compile { target c++14 } }
+
+class A {
+public:
+constexpr A() {
+  return;
+}
+};
+
+A mwi;

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Tamar Christina

Hi Richard,

That was a really interesting analysis, thanks for the details!

Would you be submitting the patch you proposed at the end as a fix?

Thanks,
Tamar

> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 05 September 2017 10:38
> To: Andrew Pinski
> Cc: Tamar Christina; Andreas Schwab; Jon Beniston; gcc-
> patc...@gcc.gnu.org; nd
> Subject: Re: [RFC, vectorizer] Allow single element vector types for vector
> reduction operations
> 
> On Mon, 4 Sep 2017, Andrew Pinski wrote:
> 
> > On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina
>  wrote:
> > >> >   vect__5.25_58 = VIEW_CONVERT_EXPR > >> intD.11>(vect__4.21_65);
> > >> >   vect__5.25_57 = VIEW_CONVERT_EXPR > >> intD.11>(vect__4.22_63);
> > >> >   vect__5.25_56 = VIEW_CONVERT_EXPR > >> intD.11>(vect__4.23_61);
> > >> >   vect__5.25_55 = VIEW_CONVERT_EXPR > >> > intD.11>(vect__4.24_59);
> > >> >
> > >> > I suspect this patch will be quite bad for us performance wise as
> > >> > it thinks it's as cheap to do all our integer operations on the
> > >> > vector side with vectors of 1 element. But I'm still waiting for
> > >> > the perf numbers to confirm.
> > >>
> > >> Looks like the backend advertises that it can do POPCOUNT on V1DI.
> > >> So SLP vectorization decides it can vectorize this without unrolling.
> > >
> > > We don't. POPCOUNT is only defined for vector modes V8QI and V16QI;
> > > we also don't define support for V1DI anywhere in the backend. We do
> > > however say we support V1DF, but removing that doesn't cause the ICE
> > > to go away.
> > >
> > >> Vectorization with V2DI is rejected:
> > >>
> > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function
> > >> is not vectorizable.
> > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized:
> > >> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
> > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: *
> > >> Re-trying analysis with vector size 8
> > >>
> > >> and that now succeeds (it probably didn't succeed before the patch).
> > >
> > > In the .optimized file, I see it vectorised it to
> > >
> > >   vect__5.25_58 = VIEW_CONVERT_EXPR intD.11>(vect__4.21_65);
> > >   vect__5.25_57 = VIEW_CONVERT_EXPR intD.11>(vect__4.22_63);
> > >   vect__5.25_56 = VIEW_CONVERT_EXPR intD.11>(vect__4.23_61);
> > >   vect__5.25_55 = VIEW_CONVERT_EXPR intD.11>(vect__4.24_59);
> > >   _54 = POPCOUNT (vect__5.25_58);
> > >   _53 = POPCOUNT (vect__5.25_57);
> > >
> > > Which is something we just don't have a pattern for. Before this patch, it
> > > was rejecting "long unsigned int".
> > > With this patch it somehow thinks we support an integer vector of 1
> > > element, even though 1) we don't have an optab defined for this
> > > operation for POPCOUNT (or at all in aarch64 as far as I can tell), and 2) we
> > > don't have it in our supported list of vector modes.
> >
> > Here are the two popcount optab aarch64 has:
> > (define_mode_iterator GPI [SI DI])
> > (define_expand "popcount<mode>2"
> >   [(match_operand:GPI 0 "register_operand")
> >(match_operand:GPI 1 "register_operand")]
> >
> >
> > (define_insn "popcount<mode>2"
> > (define_mode_iterator VB [V8QI V16QI])
> >   [(set (match_operand:VB 0 "register_operand" "=w")
> > (popcount:VB (match_operand:VB 1 "register_operand" "w")))]
> >
> > As you can see we only define popcount optab for SI, DI, V8QI and
> > V16QI.  (note SI and DI uses the V8QI and V16QI during the expansion
> > but that is a different story).
> >
> > Maybe somehow the vectorizer is thinking V1DI and DI are
> > interchangeable in some places.
> 
> We ask
> 
> Breakpoint 5, vectorizable_internal_function
> (cfn=CFN_BUILT_IN_POPCOUNTL,
> fndecl=,
> vectype_out=,
> vectype_in=)
> at /tmp/trunk/gcc/tree-vect-stmts.c:1666
> 1666  if (internal_fn_p (cfn))
> (gdb) p debug_generic_expr (vectype_out)
> vector(2) int
> $10 = void
> (gdb) p debug_generic_expr (vectype_in)
> vector(1) long unsigned int
> $11 = void
> (gdb) fin
> Run till exit from #0  vectorizable_internal_function (
> cfn=CFN_BUILT_IN_POPCOUNTL,
> fndecl=,
> vectype_out=,
> vectype_in=)
> at /tmp/trunk/gcc/tree-vect-stmts.c:1666
> 0x01206afc in vectorizable_call (gs=,
> gsi=0x0, vec_stmt=0x0, slp_node=0x2451290)
> at /tmp/trunk/gcc/tree-vect-stmts.c:2762
> 2762ifn = vectorizable_internal_function (cfn, callee,
> vectype_out,
> Value returned is $12 = IFN_POPCOUNT
> 
> so somehow direct_internal_fn_supported_p says true for a POPCOUNTL
> V1DI -> V2SI.  I'd argue the question is odd already but the ultimate answer
> is clearly wrong ;)
> 
> We have
> 
> (gdb) p info
> $22 = (const direct_internal_fn_info &) @0x19b8e0c: {type0 = 0, type1 = 0,
>   vectorizable = 1}
> 
> so require vectype_in as vectype_out which means we ask for V1DI -> V1DI
> (and the vectorizer compensates for the demotion).  But the mode of V1DI is
> actually DI (because the target doesn't support V1DI).
> Thus we end up

[PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Uros Bizjak

Hello!

This patch allows emitting a memory_blockage pattern instead of the default
asm volatile as a memory blockage. This is needed so that targets
(e.g. x86) can define and emit a more optimal memory blockage pseudo
insn. And let's call scheduler memory barriers a "memory blockage"
pseudo insn, not a "memory barrier", which should describe a real
instruction.
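
For readers less familiar with the generic fallback, here is a minimal sketch of the asm volatile ("" : : : "memory") construct, written as user-level C++ with the GNU inline-asm extension. The names compiler_blockage and publish are made up for illustration; this is not the code from optabs.c, only the user-visible analogue of what the fallback emits.

```cpp
// Sketch only: a compiler-level memory blockage in user code (GNU extension).
// It emits no machine instruction; it only tells the compiler that memory
// may have been read or written at this point.
static inline void compiler_blockage()
{
  asm volatile("" : : : "memory");
}

static int payload;
static int ready;

// Hypothetical helper: the compiler may not move the store to `payload`
// past the blockage, so it stays ahead of the store to `ready` in the
// generated code.  No CPU-level ordering is implied by this construct.
int publish(int value)
{
  payload = value;
  compiler_blockage();
  ready = 1;
  return payload + ready;
}
```

The blockage constrains only the compiler; a real memory barrier instruction would still be needed where hardware ordering matters.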

2017-09-05  Uros Bizjak  

* optabs.c (expand_memory_blockage): New function.
(expand_asm_memory_barrier): Rename ...
(expand_asm_memory_blockage): ... to this.
(expand_mem_thread_fence): Call expand_memory_blockage
instead of expand_asm_memory_barrier.
(expand_mem_signal_fence): Ditto.
(expand_atomic_load): Ditto.
(expand_atomic_store): Ditto.
* doc/md.texi (Standard Pattern Names For Generation):
Document memory_blockage instruction pattern.

Bootstrapped on x86_64-linux-gnu, regression test (with additional x86
patch that fixes recent optimization regression with FP atomic loads)
is in progress.

OK for mainline?

Uros.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 14aab9474bc2..df4dc8ccd0e1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6734,6 +6734,13 @@ scheduler and other passes from moving instructions and 
using register
 equivalences across the boundary defined by the blockage insn.
 This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
 
+@cindex @code{memory_blockage} instruction pattern
+@item @samp{memory_blockage}
+This pattern defines a pseudo insn that prevents the instruction
+scheduler and other passes from moving instructions accessing memory
+across the boundary defined by the blockage insn.  This instruction
+needs to read and write volatile BLKmode memory.
+
 @cindex @code{memory_barrier} instruction pattern
 @item @samp{memory_barrier}
 If the target memory model is not fully synchronous, then this pattern
diff --git a/gcc/optabs.c b/gcc/optabs.c
index b65707080eee..c3b1bc848bf7 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6276,10 +6276,10 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx 
*ptarget_oval,
   return true;
 }
 
-/* Generate asm volatile("" : : : "memory") as the memory barrier.  */
+/* Generate asm volatile("" : : : "memory") as the memory blockage.  */
 
 static void
-expand_asm_memory_barrier (void)
+expand_asm_memory_blockage (void)
 {
   rtx asm_op, clob;
 
@@ -6295,6 +6295,18 @@ expand_asm_memory_barrier (void)
   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
 }
 
+/* Do not schedule instructions accessing memory across this point.  */
+
+static void
+expand_memory_blockage (void)
+{
+#ifdef HAVE_memory_blockage
+  emit_insn (gen_memory_blockage ());
+#else
+  expand_asm_memory_blockage ();
+#endif
+}
+
 /* This routine will either emit the mem_thread_fence pattern or issue a 
sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
@@ -6306,14 +6318,14 @@ expand_mem_thread_fence (enum memmodel model)
   if (targetm.have_mem_thread_fence ())
 {
   emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
-  expand_asm_memory_barrier ();
+  expand_memory_blockage ();
 }
   else if (targetm.have_memory_barrier ())
 emit_insn (targetm.gen_memory_barrier ());
   else if (synchronize_libfunc != NULL_RTX)
 emit_library_call (synchronize_libfunc, LCT_NORMAL, VOIDmode);
   else
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* Emit a signal fence with given memory model.  */
@@ -6324,7 +6336,7 @@ expand_mem_signal_fence (enum memmodel model)
   /* No machine barrier is required to implement a signal fence, but
  a compiler memory barrier must be issued, except for relaxed MM.  */
   if (!is_mm_relaxed (model))
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* This function expands the atomic load operation:
@@ -6346,7 +6358,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel 
model)
   struct expand_operand ops[3];
   rtx_insn *last = get_last_insn ();
   if (is_mm_seq_cst (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
 
   create_output_operand (&ops[0], target, mode);
   create_fixed_operand (&ops[1], mem);
@@ -6354,7 +6366,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel 
model)
   if (maybe_expand_insn (icode, 3, ops))
{
  if (!is_mm_relaxed (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
  return ops[0].value;
}
   delete_insns_since (last);
@@ -6404,14 +6416,14 @@ expand_atomic_store (rtx mem, rtx val, enum memmodel 
model, bool use_release)
 {
   rtx_insn *last = get_last_insn ();
   if (!is_mm_relaxed (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], val, mode);
   create_integer_operand (&ops[2], model);
   if (maybe_expand_insn (

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-05 Thread Alexander Monakov

On Mon, 4 Sep 2017, Uros Bizjak wrote:
> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
> 
> New failures:
> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps

Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
I can do all of the above if you agree.

I think RTL peepholes are a poor way of fixing the original problem, which
actually exists on all targets with separate int/fp registers.  For instance,
trunk (without my patch) still gets a far simpler testcase wrong (-O2, 64-bit):

float f(_Atomic float *p)
{
  return *p;
}

f:
movl(%rdi), %eax
movl%eax, -4(%rsp)
movss   -4(%rsp), %xmm0
ret

I believe adding more peepholes to handle this, independently in each affected
backend, is hardly appropriate.

The issue is caused by translation of floating-point atomic loads/stores to a
sequence of integer atomic and a VIEW_CONVERT_EXPR on GIMPLE, which in turn
causes the load to be emitted (wrongly) in an integer mode on RTL.  I don't
know if GIMPLE atomic builtins can work with floating operands (which IMHO
would be a preferable way), but if not, then special-casing a sequence of
integer atomic mem op and a VCE at expand time might be a cleaner way of
achieving the desired optimization.
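
A hand-written sketch of the shape described above may help: an integer atomic access plus a bit reinterpretation that plays the role of the VIEW_CONVERT_EXPR. The helper names load_float_via_int and store_float_via_int are hypothetical, and the sketch assumes a 32-bit float; it illustrates the sequence, not what GCC does internally.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Assumption for this sketch: float and uint32_t have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Hypothetical helper: atomic load done in an integer type, followed by a
// reinterpretation of the bits as a float.
float load_float_via_int(const std::atomic<std::uint32_t>& cell)
{
  std::uint32_t bits = cell.load(std::memory_order_seq_cst);  // integer atomic load
  float result;
  std::memcpy(&result, &bits, sizeof result);  // stands in for VIEW_CONVERT_EXPR
  return result;
}

// Hypothetical helper: the mirror image for stores.
void store_float_via_int(std::atomic<std::uint32_t>& cell, float value)
{
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  cell.store(bits, std::memory_order_seq_cst);
}
```

Written this way, the value naturally passes through an integer register first, which is the round trip visible in the assembly shown earlier in this message.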

Thanks.
Alexander

Re: [PATCH] Add noexcept to shared_ptr owner comparisons (LWG 2873)

2017-09-05 Thread Jonathan Wakely


On 05/09/17 08:30 +0200, Christophe Lyon wrote:

Hi Jonathan

On 5 June 2017 at 11:34, Jonathan Wakely  wrote:

C++17 requires these to be noexcept, and there's no reason not to do
it for earlier standard modes too.

* include/bits/shared_ptr_base.h (__shared_ptr::owner_before)
(__weak_ptr::owner_before, _Sp_owner_less::operator()): Add noexcept
specifiers as per LWG 2873 and LWG 2942.
* testsuite/20_util/owner_less/noexcept.cc: New.
* testsuite/20_util/shared_ptr/observers/owner_before.cc: Test
noexcept guarantees.
* testsuite/20_util/weak_ptr/observers/owner_before.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.



I've noticed you have backported this patch to gcc-6-branch (r251673).
The new test testsuite/20_util/owner_less/noexcept.cc fails with:
/libstdc++-v3/testsuite/20_util/owner_less/noexcept.cc:34: error:
aggregate 'const std::owner_less ov' has incomplete type and
cannot be defined


Huh, I fixed that in the gcc-5-branch backport, but not gcc-6-branch.

Fixed by this patch.


commit 3c6ae9c30e86263cf764cef946b8783810583079
Author: Jonathan Wakely 
Date:   Tue Sep 5 10:31:05 2017 +0100

Remove owner_less test that fails on gcc-6-branch

* testsuite/20_util/owner_less/noexcept.cc: Remove owner_less
tests.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 8a87b7b155a..d88a08f2452 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2017-09-05  Jonathan Wakely  
+
+	* testsuite/20_util/owner_less/noexcept.cc: Remove owner_less
+	tests.
+
 2017-09-04  Jonathan Wakely  
 
 	Backport from mainline
diff --git a/libstdc++-v3/testsuite/20_util/owner_less/noexcept.cc b/libstdc++-v3/testsuite/20_util/owner_less/noexcept.cc
index 25c9afde8e1..fcf5d4f2679 100644
--- a/libstdc++-v3/testsuite/20_util/owner_less/noexcept.cc
+++ b/libstdc++-v3/testsuite/20_util/owner_less/noexcept.cc
@@ -29,12 +29,3 @@ const std::owner_less> owi;
 static_assert( noexcept(owi(wi, wi)), "" );
 static_assert( noexcept(owi(si, wi)), "" );
 static_assert( noexcept(owi(wi, si)), "" );
-const std::shared_ptr sl;
-const std::weak_ptr wc;
-const std::owner_less ov;
-static_assert( noexcept(ov(si, si)), "" );
-static_assert( noexcept(ov(si, sl)), "" );
-static_assert( noexcept(ov(sl, si)), "" );
-static_assert( noexcept(ov(si, wc)), "" );
-static_assert( noexcept(ov(wc, si)), "" );
-static_assert( noexcept(ov(wc, wi)), "" );

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-09-05 Thread Wilco Dijkstra

Kyrill Tkachov wrote:

> I like the simplifications in the selection logic here :)
> However, changing the value for ARM from 6 to 4 looks a bit arbitrary to me.
> There's probably a reason why default values for ARM and Thumb-2 are 
> different
> (maybe not a good one) and I'd rather not change it without some code 
> size data measurements.

To quote myself from the thread:

Long conditional sequences are slow on modern cores - the value 6 for
max_insns_skipped is a few decades out of date as it was meant for ARM2!
Even with -Os the performance loss for larger values is not worth the
small codesize gain (there are many better options to reduce codesize
that actually improve performance at the same time). So using the same
code generation heuristics for ARM and Thumb-2 is a good idea.

A simple codesize comparison on CSiBE shows using 4 rather than 6 for
max_insns_skipped is just 0.07% larger on ARM with -Os. So it's not
obvious that increasing max_insns_skipped in -Os is a useful codesize
optimization...

>So I'd rather not let that hold this cleanup patch though, so this is ok
>  (assuming a normal bootstrap and testing cycle) without changing the 6 
> to a 4
> and you can propose a change to 4 as a separate patch that can be 
> discussed on its own.

Based on the above is that really needed? What specific problem do you
expect to occur with the value 4? 

Wilco

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-05 Thread Uros Bizjak

On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov  wrote:
> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>
>> New failures:
>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>
> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
> I can do all of the above if you agree.

No, I have a solution for the regression. A prerequisite is patch at [1].

[1] https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00239.html

Uros.

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov

On Tue, 5 Sep 2017, Uros Bizjak wrote:
> This patch allows emitting a memory_blockage pattern instead of the default
> asm volatile as a memory blockage. This is needed so that targets
> (e.g. x86) can define and emit a more optimal memory blockage pseudo
> insn.

Optimal in what sense?  What pattern do you intend to use on x86, and
would any target be able to use the same?

> And let's call scheduler memory barriers a "memory blockage"
> pseudo insn, not a "memory barrier", which should describe a real
> instruction.

Note this is not about scheduling, but all RTL passes.  This (pseudo-)
instruction is meant to prevent all memory movement across it, including
RTL CSE, RTL DSE, etc.
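
To make the CSE/DSE point concrete, here is a small user-level sketch using std::atomic_signal_fence, which, as far as I know, GCC lowers through the signal-fence expander touched by this patch. The function names store_twice and load_twice are hypothetical, and the comments describe what an optimizer would be free to do without the fence, not a guarantee about any particular compiler version.

```cpp
#include <atomic>

static int shared_value;

// Hypothetical example: without the fence, dead-store elimination could
// drop the first store, since the second overwrites it immediately.
void store_twice(int first, int second)
{
  shared_value = first;
  std::atomic_signal_fence(std::memory_order_seq_cst);
  shared_value = second;
}

// Hypothetical example: without the fence, CSE could reuse the first
// loaded value for the second read instead of loading again.
int load_twice()
{
  int a = shared_value;
  std::atomic_signal_fence(std::memory_order_seq_cst);
  int b = shared_value;
  return a + b;
}
```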

Alexander

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Richard Biener

On Tue, 5 Sep 2017, Tamar Christina wrote:

> Hi Richard,
> 
> That was a really interesting analysis, thanks for the details!
> 
> Would you be submitting the patch you proposed at the end as a fix?

I'm testing it currently.

Richard.

> Thanks,
> Tamar
> 

Re: [00/77] Add wrapper classes for machine_modes

2017-09-05 Thread Eric Gallager

On 9/3/17, Gerald Pfeifer  wrote:
> Hi Richard,
>
> I'm afraid your patchset has broken bootstrap on i686-unknown-freebsd10.3,
> in fact, it appears on FreeBSD in general (amd64-unknown-freebsd11 as
> well):
>
>
>   /scratch/tmp/gerald/GCC-HEAD/gcc/builtins.c:4913:6: error: cannot pass
>   object of non-POD type 'scalar_int_mode' through variadic function; call
>   will abort at runtime [-Wnon-pod-varargs]
>
>  ptr_mode, bot, ptr_mode);
>  ^
>
>   /scratch/tmp/gerald/GCC-HEAD/gcc/builtins.c:4913:21: error: cannot pass
>   object of non-POD type 'scalar_int_mode' through variadic function; call
>   will abort at runtime [-Wnon-pod-varargs]
>
>  ptr_mode, bot, ptr_mode);
> ^
>
>
> Digging into this, I believe this is due to clang 3.4.1 serving as system
> compiler (which probably is why nobody else reported this so far):
>
>   c++ -std=gnu++98 -fno-PIE -c -g -DIN_GCC -fno-strict-aliasing
>   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
>   -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
>   -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long
>   -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H
>   :
>   -o calls.o -MT calls.o -MMD -MP -MF ./.deps/calls.TPo
>   /scratch/tmp/gerald/GCC-HEAD/gcc/calls.c
>
> Gerald
>

Note that bug 64867 exists to add this warning for gcc, too:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64867
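
A minimal sketch of the kind of call the quoted diagnostic complains about, and one way to avoid it: convert the class-type wrapper to a plain integer before it reaches the ellipsis. The types mode_id and int_mode_wrapper and the functions sum_sizes and as_vararg are hypothetical stand-ins, not GCC's machine_mode or scalar_int_mode.

```cpp
#include <cstdarg>

// Hypothetical stand-in for a mode enumeration; values double as byte sizes.
enum mode_id { MODE_QI = 1, MODE_SI = 4, MODE_DI = 8 };

// Hypothetical wrapper class with a user-provided constructor, which makes
// it non-POD under the C++98 rules the quoted build uses.
class int_mode_wrapper
{
public:
  explicit int_mode_wrapper(mode_id m) : m_mode(m) {}
  operator mode_id() const { return m_mode; }
private:
  mode_id m_mode;
};

// Variadic callee that expects `count` plain int arguments.
int sum_sizes(int count, ...)
{
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}

// Convert the wrapper to a plain int so no class object is passed
// through the ellipsis.
int as_vararg(const int_mode_wrapper& m)
{
  return static_cast<int>(static_cast<mode_id>(m));
}
```

Passing the wrapper object itself, as in sum_sizes (1, wrapper), is what a -Wnon-pod-varargs style check would flag.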

Re: [PATCH] [PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining

2017-09-05 Thread Pierre-Marie de Rodat


On 09/04/2017 11:26 AM, Richard Biener wrote:

No more pending issues and yes, I guess the fix is ok for the branch.


Ok, thanks! This is now committed on the 7 release branch.

--
Pierre-Marie de Rodat

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Uros Bizjak

On Tue, Sep 5, 2017 at 12:35 PM, Alexander Monakov  wrote:
> On Tue, 5 Sep 2017, Uros Bizjak wrote:
>> This patch allows emitting a memory_blockage pattern instead of the default
>> asm volatile as a memory blockage. This is needed so that targets
>> (e.g. x86) can define and emit a more optimal memory blockage pseudo
>> insn.
>
> Optimal in what sense?  What pattern do you intend to use on x86, and
> would any target be able to use the same?

You don't have to emit a generic asm-like pattern. This is the same
situation as with blockage insn, where targets can emit "blockage"
instead of generic asm insn.

x86 defines memory_blockage as:

(define_expand "memory_blockage"
  [(set (match_dup 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
  ""
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})

However, this definition can't be generic, since unspec is used.
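
The dispatch idea (use the target's named pattern when it exists, otherwise fall back to the generic asm) can be modelled outside the compiler roughly as follows. Everything here is a hypothetical model: target_hooks, the emitted log and the strings are invented for illustration and are not GCC's targetm or its RTL emitters.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a target description; not GCC's targetm.
struct target_hooks
{
  bool have_memory_blockage;  // does the target provide the named pattern?
};

// Records which "insn" was emitted so the choice is observable.
std::vector<std::string> emitted;

// Generic fallback, standing in for the asm volatile memory clobber.
void expand_asm_memory_blockage()
{
  emitted.push_back("asm volatile memory clobber");
}

// Prefer the target-specific pattern, otherwise use the generic fallback.
void expand_memory_blockage(const target_hooks& target)
{
  if (target.have_memory_blockage)
    emitted.push_back("target memory_blockage pattern");
  else
    expand_asm_memory_blockage();
}
```

Callers only ever ask for a memory blockage, so a target can opt in by defining the pattern without any change at the call sites.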

>> And let's call scheduler memory barriers a "memory blockage"
>> pseudo insn, not a "memory barrier", which should describe a real
>> instruction.
>
> Note this is not about scheduling, but all RTL passes.  This (pseudo-)
> instruction is meant to prevent all memory movement across it, including
> RTL CSE, RTL DSE, etc.

Yes, the above insn satisfies all mentioned use cases.

Oh, I noticed that I attached the wrong version of the patch. The correct
version is attached.

Uros.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 14aab9474bc2..df4dc8ccd0e1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6734,6 +6734,13 @@ scheduler and other passes from moving instructions and 
using register
 equivalences across the boundary defined by the blockage insn.
 This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
 
+@cindex @code{memory_blockage} instruction pattern
+@item @samp{memory_blockage}
+This pattern defines a pseudo insn that prevents the instruction
+scheduler and other passes from moving instructions accessing memory
+across the boundary defined by the blockage insn.  This instruction
+needs to read and write volatile BLKmode memory.
+
 @cindex @code{memory_barrier} instruction pattern
 @item @samp{memory_barrier}
 If the target memory model is not fully synchronous, then this pattern
diff --git a/gcc/optabs.c b/gcc/optabs.c
index b65707080eee..5821d2a9547c 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6276,10 +6276,10 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx 
*ptarget_oval,
   return true;
 }
 
-/* Generate asm volatile("" : : : "memory") as the memory barrier.  */
+/* Generate asm volatile("" : : : "memory") as the memory blockage.  */
 
 static void
-expand_asm_memory_barrier (void)
+expand_asm_memory_blockage (void)
 {
   rtx asm_op, clob;
 
@@ -6295,6 +6295,21 @@ expand_asm_memory_barrier (void)
   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
 }
 
+#ifndef HAVE_memory_blockage
+#define HAVE_memory_blockage 0
+#endif
+
+/* Do not schedule instructions accessing memory across this point.  */
+
+static void
+expand_memory_blockage (void)
+{
+  if (HAVE_memory_blockage)
+emit_insn (gen_memory_blockage ());
+  else
+expand_asm_memory_blockage ();
+}
+
 /* This routine will either emit the mem_thread_fence pattern or issue a 
sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
@@ -6306,14 +6321,14 @@ expand_mem_thread_fence (enum memmodel model)
   if (targetm.have_mem_thread_fence ())
 {
   emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
-  expand_asm_memory_barrier ();
+  expand_memory_blockage ();
 }
   else if (targetm.have_memory_barrier ())
 emit_insn (targetm.gen_memory_barrier ());
   else if (synchronize_libfunc != NULL_RTX)
 emit_library_call (synchronize_libfunc, LCT_NORMAL, VOIDmode);
   else
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* Emit a signal fence with given memory model.  */
@@ -6324,7 +6339,7 @@ expand_mem_signal_fence (enum memmodel model)
   /* No machine barrier is required to implement a signal fence, but
  a compiler memory barrier must be issued, except for relaxed MM.  */
   if (!is_mm_relaxed (model))
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* This function expands the atomic load operation:
@@ -6346,7 +6361,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel 
model)
   struct expand_operand ops[3];
   rtx_insn *last = get_last_insn ();
   if (is_mm_seq_cst (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
 
   create_output_operand (&ops[0], target, mode);
   create_fixed_operand (&ops[1], mem);
@@ -6354,7 +6369,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel 
model)
   if (maybe_expand_insn (icode, 3, ops))
{
  if (!is_mm_relaxed (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
  return ops[0].value;
}
   delete_insns_since

Re: [C++/ARM Patch] PR 81942 ("ICE on empty constexpr constructor with C++14")

2017-09-05 Thread Nathan Sidwell


On 09/05/2017 06:19 AM, Paolo Carlini wrote:

in this ICE on valid, a gcc_assert fires when a GOTO_EXPR is handled by 
cxx_eval_constant_expression which is the translation of a "return;" on 
a targetm.cxx.cdtor_returns_this target (like ARM):


I think the right way to handle this is marking such special labels with 
a LABEL_DECL_CDTOR flag and using it in the returns helper function (we 
already use a similar strategy with LABEL_DECL_BREAK and 
LABEL_DECL_CONTINUE and the breaks and continues helpers). Then 
adjusting the ICEing gcc_assert is trivial. Tested x86_64-linux and 
aarch64-linux.
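
The strategy described above can be sketched with a much-simplified model of the jump-target helpers. The enums and the jump_target struct are invented for illustration; the real helpers in cp/constexpr.c operate on trees and the LABEL_DECL_* flags, so treat this as a reading aid rather than the actual implementation.

```cpp
// Hypothetical, simplified model; not GCC's tree representation.
enum class target_code { none, return_expr, label_decl };
enum class label_kind { plain, break_label, continue_label, cdtor_label };

struct jump_target
{
  target_code code;
  label_kind kind;  // only meaningful when code == label_decl
};

constexpr bool is_label(jump_target t, label_kind k)
{
  return t.code == target_code::label_decl && t.kind == k;
}

constexpr bool breaks(jump_target t)
{
  return is_label(t, label_kind::break_label);
}

constexpr bool continues(jump_target t)
{
  return is_label(t, label_kind::continue_label);
}

// After the patch: a jump to the flagged cdtor label counts as a return.
constexpr bool returns(jump_target t)
{
  return t.code == target_code::return_expr
         || is_label(t, label_kind::cdtor_label);
}

// Mirrors the relaxed assertion in the GOTO_EXPR case.
constexpr bool goto_target_ok(jump_target t)
{
  return breaks(t) || continues(t) || returns(t);
}
```

An unflagged label still fails the check, so the assertion keeps catching unexpected gotos.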


OK.

(heh, I notice we call the hook 'cdtor_returns_this', but AFAICT it only 
applies to ctors.  Not your problem though.)


nathan

--
Nathan Sidwell

Re: [1/9] Make more use of int_mode_for_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:26 PM, Richard Sandiford
 wrote:
> This patch converts more places that could use int_mode_for_mode
> instead of mode_for_size.  This is in preparation for an upcoming
> patch that makes mode_for_size itself return an opt_mode.
>
> The reason for using required () in exp2_immediate_p is that
> we go on to do:
>
> trunc_int_for_mode (..., int_mode)
>
> which would be invalid for (and have failed for) BLKmode.
>
> The reason for using required () in spu_convert_move and
> resolve_simple_move is that we go on to use registers of
> the returned mode in non-call rtl instructions, which would
> be invalid for BLKmode.
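
As a reading aid for the require () calls discussed above, here is a small model of an optional-mode wrapper whose require () asserts that a mode exists. The names mode_id, opt_int_mode and int_mode_for_bytes are hypothetical simplifications; GCC's real opt_mode and int_mode_for_mode have different interfaces and semantics in detail.

```cpp
#include <cassert>

// Hypothetical stand-in for machine modes; BLK plays the role of BLKmode.
enum mode_id { BLK, QI, HI, SI, DI };

// Simplified model of an "optional mode" wrapper.
class opt_int_mode
{
public:
  opt_int_mode() : m_mode(BLK) {}
  explicit opt_int_mode(mode_id m) : m_mode(m) {}

  // True if a suitable integer mode was found.
  bool exists() const { return m_mode != BLK; }

  // For callers that cannot proceed without a mode: assert and return it.
  mode_id require() const
  {
    assert(exists());
    return m_mode;
  }

private:
  mode_id m_mode;
};

// Hypothetical lookup: integer mode with the given size in bytes, if any.
opt_int_mode int_mode_for_bytes(unsigned bytes)
{
  switch (bytes)
    {
    case 1: return opt_int_mode(QI);
    case 2: return opt_int_mode(HI);
    case 4: return opt_int_mode(SI);
    case 8: return opt_int_mode(DI);
    default: return opt_int_mode();
    }
}
```

The point of require () in this model is that a missing mode fails loudly at the lookup site instead of letting a BLK value flow into later code that cannot handle it.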

Ok.

Richard.

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * config/spu/spu.c (exp2_immediate_p): Use int_mode_for_mode.
> (spu_convert_move): Likewise.
> * lower-subreg.c (resolve_simple_move): Likewise.
>
> Index: gcc/config/spu/spu.c
> ===
> --- gcc/config/spu/spu.c2017-09-04 11:50:24.563372530 +0100
> +++ gcc/config/spu/spu.c2017-09-04 12:18:41.572976650 +0100
> @@ -3372,7 +3372,7 @@ arith_immediate_p (rtx op, machine_mode
>constant_to_array (mode, op, arr);
>
>bytes = GET_MODE_UNIT_SIZE (mode);
> -  mode = mode_for_size (GET_MODE_UNIT_BITSIZE (mode), MODE_INT, 0);
> +  mode = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
>
>/* Check that bytes are repeated. */
>for (i = bytes; i < 16; i += bytes)
> @@ -3415,7 +3415,7 @@ exp2_immediate_p (rtx op, machine_mode m
>mode = GET_MODE_INNER (mode);
>
>bytes = GET_MODE_SIZE (mode);
> -  int_mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
> +  int_mode = int_mode_for_mode (mode).require ();
>
>/* Check that bytes are repeated. */
>for (i = bytes; i < 16; i += bytes)
> @@ -4503,7 +4503,7 @@ spu_expand_mov (rtx * ops, machine_mode
>  spu_convert_move (rtx dst, rtx src)
>  {
>machine_mode mode = GET_MODE (dst);
> -  machine_mode int_mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
> +  machine_mode int_mode = int_mode_for_mode (mode).require ();
>rtx reg;
>gcc_assert (GET_MODE (src) == TImode);
>reg = int_mode != mode ? gen_reg_rtx (int_mode) : dst;
> Index: gcc/lower-subreg.c
> ===
> --- gcc/lower-subreg.c  2017-09-04 11:50:08.544543511 +0100
> +++ gcc/lower-subreg.c  2017-09-04 12:18:41.572976650 +0100
> @@ -956,11 +956,7 @@ resolve_simple_move (rtx set, rtx_insn *
>if (real_dest == NULL_RTX)
> real_dest = dest;
>if (!SCALAR_INT_MODE_P (dest_mode))
> -   {
> - dest_mode = mode_for_size (GET_MODE_SIZE (dest_mode) * BITS_PER_UNIT,
> -MODE_INT, 0);
> - gcc_assert (dest_mode != BLKmode);
> -   }
> +   dest_mode = int_mode_for_mode (dest_mode).require ();
>dest = gen_reg_rtx (dest_mode);
>if (REG_P (real_dest))
> REG_ATTRS (dest) = REG_ATTRS (real_dest);

Re: [2/9] Make more use of int_mode_for_size

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:31 PM, Richard Sandiford
 wrote:
> This patch converts more places that could use int_mode_for_size instead
> of mode_for_size.  This is in preparation for an upcoming patch that
> makes mode_for_size itself return an opt_mode.
>
> require () seems like the right choice in expand_builtin_powi
> because we have got past the point of backing out.  We go on to do:
>
>   op1 = expand_expr (arg1, NULL_RTX, mode2, EXPAND_NORMAL);
>   if (GET_MODE (op1) != mode2)
> op1 = convert_to_mode (mode2, op1, 0);
>
> which would be invalid for (and have failed for) BLKmode.
>
> In get_builtin_sync_mode and expand_ifn_atomic_compare_exchange,
> the possible bitsizes are {8, 16, 32, 64, 128}, all of which give
> target-independent integer modes (up to TImode).  The comment above
> the call in get_builtin_sync_mode makes clear that an integer mode
> must be found.
>
> We can use require () in expand_builtin_atomic_clear and
> expand_builtin_atomic_test_and_set because there's always an integer
> mode for the boolean type.  The same goes for the POINTER_SIZE request
> in layout_type.  Similarly we can use require () in combine_instructions
> and gen_lowpart_common because there's always an integer mode for
> HOST_BITS_PER_WIDE_INT (DImode when BITS_PER_UNIT == 8), and
> HOST_BITS_PER_DOUBLE_INT (TImode).
>
> The calls in aarch64_function_value, arm_function_value,
> aapcs_allocate_return_reg and mips_function_value_1 are handling
> cases in which a big-endian target passes or returns values at
> the most significant end of a register.  In each case the ABI
> constrains the size to a small amount and does not handle
> non-power-of-2 sizes wider than a word.
>
> The calls in c6x_expand_movmem, i386.c:emit_memset,
> lm32_block_move_inline, microblaze_block_move_straight and
> mips_block_move_straight are dealing with expansions of
> block memory operations using register-wise operations,
> and those registers must have non-BLK mode.
>
> The reason for using require () in ix86_expand_sse_cmp,
> mips_expand_ins_as_unaligned_store, spu.c:adjust_operand and
> spu_emit_branch_or_set is that we go on to emit non-call
> instructions that use registers of that mode, which wouldn't
> be valid for BLKmode.

Ok.

Richard.
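
The argument for get_builtin_sync_mode (the possible bitsizes are
{8, 16, 32, 64, 128}, so an integer mode must exist) can be illustrated
with a small sketch.  The names and the lookup table are hypothetical and
assume 8-bit units; this is not the code from builtins.c.

```cpp
#include <cassert>

// Illustrative integer modes; real targets define these in mode tables.
enum sketch_int_mode { SK_QImode, SK_HImode, SK_SImode, SK_DImode, SK_TImode };

// Hypothetical exact-size lookup, assuming 8-bit units.  Returns false
// when no integer mode has exactly BITS bits.
inline bool
sketch_int_mode_for_size (unsigned int bits, sketch_int_mode *out)
{
  switch (bits)
    {
    case 8:   *out = SK_QImode; return true;
    case 16:  *out = SK_HImode; return true;
    case 32:  *out = SK_SImode; return true;
    case 64:  *out = SK_DImode; return true;
    case 128: *out = SK_TImode; return true;
    default:  return false;
    }
}

// A caller whose sizes are restricted to 1, 2, 4, 8 or 16 bytes
// (LOG2_BYTES in 0..4).  The lookup cannot fail for those inputs, so
// asserting success is the natural choice.
inline sketch_int_mode
sketch_sync_mode (unsigned int log2_bytes)
{
  sketch_int_mode mode = SK_QImode;
  bool found = sketch_int_mode_for_size (8u << log2_bytes, &mode);
  assert (found);
  (void) found;
  return mode;
}
```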

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * builtins.c (expand_builtin_powi): Use int_mode_for_size.
> (get_builtin_sync_mode): Likewise.
> (expand_ifn_atomic_compare_exchange): Likewise.
> (expand_builtin_atomic_clear): Likewise.
> (expand_builtin_atomic_test_and_set): Likewise.
> (fold_builtin_atomic_always_lock_free): Likewise.
> * calls.c (compute_argument_addresses): Likewise.
> (emit_library_call_value_1): Likewise.
> (store_one_arg): Likewise.
> * combine.c (combine_instructions): Likewise.
> * config/aarch64/aarch64.c (aarch64_function_value): Likewise.
> * config/arm/arm.c (arm_function_value): Likewise.
> (aapcs_allocate_return_reg): Likewise.
> * config/c6x/c6x.c (c6x_expand_movmem): Likewise.
> * config/i386/i386.c (construct_container): Likewise.
> (ix86_gimplify_va_arg): Likewise.
> (ix86_expand_sse_cmp): Likewise.
> (emit_memmov): Likewise.
> (emit_memset): Likewise.
> (expand_small_movmem_or_setmem): Likewise.
> (ix86_expand_pextr): Likewise.
> (ix86_expand_pinsr): Likewise.
> * config/lm32/lm32.c (lm32_block_move_inline): Likewise.
> * config/microblaze/microblaze.c (microblaze_block_move_straight):
> Likewise.
> * config/mips/mips.c (mips_function_value_1): Likewise.
> (mips_block_move_straight): Likewise.
> (mips_expand_ins_as_unaligned_store): Likewise.
> * config/powerpcspe/powerpcspe.c
> (rs6000_darwin64_record_arg_advance_flush): Likewise.
> (rs6000_darwin64_record_arg_flush): Likewise.
> * config/rs6000/rs6000.c
> (rs6000_darwin64_record_arg_advance_flush): Likewise.
> (rs6000_darwin64_record_arg_flush): Likewise.
> * config/sparc/sparc.c (sparc_function_arg_1): Likewise.
> (sparc_function_value_1): Likewise.
> * config/spu/spu.c (adjust_operand): Likewise.
> (spu_emit_branch_or_set): Likewise.
> (arith_immediate_p): Likewise.
> * emit-rtl.c (gen_lowpart_common): Likewise.
> * expr.c (expand_expr_real_1): Likewise.
> * function.c (assign_parm_setup_block): Likewise.
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
> * reload1.c (alter_reg): Likewise.
> * stor-layout.c (mode_for_vector): Likewise.
> (layout_type): Likewise.
>
> gcc/ada/
> * gcc-interface/utils2.c (build_load_modify_store):
> Use int_mode_for_size.
>
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c  2017-09-04 08:30:09.328308115 +0100
> +++

Re: [3/9] (decimal_)float_mode_for_size in real.h

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:31 PM, Richard Sandiford
 wrote:
> This patch makes the binary float macros in real.h use
> float_mode_for_size and adds a corresponding decimal_float_mode_for_size
> for the decimal macros.

Ok.

Richard.
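
As a rough illustration of what the class-specific wrappers buy, the
sketch below looks a mode up by size and class and exposes one wrapper per
class.  The mode table and the sk_* names are made up for the example, and
a null pointer stands in for an empty opt_mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum sk_mode_class { SK_MODE_INT, SK_MODE_FLOAT, SK_MODE_DECIMAL_FLOAT };

struct sk_mode
{
  const char *name;
  sk_mode_class mclass;
  unsigned int bits;
};

// Illustrative mode table; a real target's table is much larger.
static const sk_mode sk_modes[] = {
  { "SI", SK_MODE_INT, 32 },
  { "SF", SK_MODE_FLOAT, 32 },
  { "DF", SK_MODE_FLOAT, 64 },
  { "SD", SK_MODE_DECIMAL_FLOAT, 32 },
  { "DD", SK_MODE_DECIMAL_FLOAT, 64 },
  { "TD", SK_MODE_DECIMAL_FLOAT, 128 },
};

// Generic lookup by size and class; returns NULL if nothing matches.
inline const sk_mode *
sk_mode_for_size (unsigned int bits, sk_mode_class mclass)
{
  for (std::size_t i = 0; i < sizeof sk_modes / sizeof sk_modes[0]; ++i)
    if (sk_modes[i].bits == bits && sk_modes[i].mclass == mclass)
      return &sk_modes[i];
  return NULL;
}

// Class-specific wrappers, so callers need not spell out the class.
inline const sk_mode *
sk_float_mode_for_size (unsigned int bits)
{
  return sk_mode_for_size (bits, SK_MODE_FLOAT);
}

inline const sk_mode *
sk_decimal_float_mode_for_size (unsigned int bits)
{
  return sk_mode_for_size (bits, SK_MODE_DECIMAL_FLOAT);
}
```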

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * machmode.h (decimal_float_mode_for_size): New function.
> * real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use float_mode_for_size.
> (REAL_VALUE_TO_TARGET_DOUBLE): Likewise.
> (REAL_VALUE_TO_TARGET_SINGLE): Likewise.
> (REAL_VALUE_TO_TARGET_DECIMAL128): Use decimal_float_mode_for_size.
> (REAL_VALUE_TO_TARGET_DECIMAL64): Likewise.
> (REAL_VALUE_TO_TARGET_DECIMAL32): Likewise.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-08-30 12:20:57.010045759 +0100
> +++ gcc/machmode.h  2017-09-04 12:18:47.820398622 +0100
> @@ -652,6 +652,15 @@ float_mode_for_size (unsigned int size)
>return dyn_cast <scalar_float_mode> (mode_for_size (size, MODE_FLOAT, 0));
>  }
>
> +/* Likewise for MODE_DECIMAL_FLOAT.  */
> +
> +inline opt_scalar_float_mode
> +decimal_float_mode_for_size (unsigned int size)
> +{
> +  return dyn_cast <scalar_float_mode>
> +(mode_for_size (size, MODE_DECIMAL_FLOAT, 0));
> +}
> +
>  /* Similar to mode_for_size, but find the smallest mode for a given width.  */
>
>  extern machine_mode smallest_mode_for_size (unsigned int, enum mode_class);
> Index: gcc/real.h
> ===
> --- gcc/real.h  2017-08-30 12:09:02.416468293 +0100
> +++ gcc/real.h  2017-09-04 12:18:47.820398622 +0100
> @@ -383,27 +383,28 @@ #define REAL_VALUE_MINUS_ZERO(x)  real_is
>  /* IN is a REAL_VALUE_TYPE.  OUT is an array of longs.  */
>  #define REAL_VALUE_TO_TARGET_LONG_DOUBLE(IN, OUT)  \
>real_to_target (OUT, &(IN),  \
> - mode_for_size (LONG_DOUBLE_TYPE_SIZE, MODE_FLOAT, 0))
> + float_mode_for_size (LONG_DOUBLE_TYPE_SIZE).require ())
>
>  #define REAL_VALUE_TO_TARGET_DOUBLE(IN, OUT) \
> -  real_to_target (OUT, &(IN), mode_for_size (64, MODE_FLOAT, 0))
> +  real_to_target (OUT, &(IN), float_mode_for_size (64).require ())
>
>  /* IN is a REAL_VALUE_TYPE.  OUT is a long.  */
>  #define REAL_VALUE_TO_TARGET_SINGLE(IN, OUT) \
> -  ((OUT) = real_to_target (NULL, &(IN), mode_for_size (32, MODE_FLOAT, 0)))
> +  ((OUT) = real_to_target (NULL, &(IN), float_mode_for_size (32).require ()))
>
>  /* Real values to IEEE 754 decimal floats.  */
>
>  /* IN is a REAL_VALUE_TYPE.  OUT is an array of longs.  */
>  #define REAL_VALUE_TO_TARGET_DECIMAL128(IN, OUT) \
> -  real_to_target (OUT, &(IN), mode_for_size (128, MODE_DECIMAL_FLOAT, 0))
> +  real_to_target (OUT, &(IN), decimal_float_mode_for_size (128).require ())
>
>  #define REAL_VALUE_TO_TARGET_DECIMAL64(IN, OUT) \
> -  real_to_target (OUT, &(IN), mode_for_size (64, MODE_DECIMAL_FLOAT, 0))
> +  real_to_target (OUT, &(IN), decimal_float_mode_for_size (64).require ())
>
>  /* IN is a REAL_VALUE_TYPE.  OUT is a long.  */
>  #define REAL_VALUE_TO_TARGET_DECIMAL32(IN, OUT) \
> -  ((OUT) = real_to_target (NULL, &(IN), mode_for_size (32, MODE_DECIMAL_FLOAT, 0)))
> +  ((OUT) = real_to_target (NULL, &(IN), \
> +  decimal_float_mode_for_size (32).require ()))
>
>  extern REAL_VALUE_TYPE real_value_truncate (format_helper, REAL_VALUE_TYPE);
>

Re: [4/9] Make mode_for_size return an opt_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:35 PM, Richard Sandiford
 wrote:
> ...to make it consistent with int_mode_for_size etc.
>
> require () seems like the right choice in replace_reg_with_saved_mem
> because we use the chosen mode for saving and restoring registers,
> which cannot be done in BLKmode.  Similarly require () seems like
> the right choice in calls related to secondary memory reloads (the ones
> in config/, and in get_secondary_mem) because the reload must always
> have a defined mode, which e.g. determines the size of the slot.
>
> We can use require () in simplify_subreg_concatn and assemble_integer
> because it isn't meaningful to create a subreg with BLKmode (for one
> thing, we couldn't tell then whether it was partial, paradoxical, etc.).
>
> make_fract_type and make_accum_type must find a mode because that's
> what distinguishes accumulator FIXED_POINT_TYPEs from fractional
> FIXED_POINT_TYPEs.

Ok.

Richard.
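
The "allow construction from anything that can be converted to a T" part
of the ChangeLog can be illustrated with a stripped-down optional wrapper.
This is only a sketch of the technique (a templated converting
constructor); sk_opt, sk_int_mode and sk_any_mode are hypothetical names
and do not match the real class layout in machmode.h.

```cpp
#include <cassert>

// Two hypothetical mode types: a specific one, and a general one that
// can be constructed from it.
struct sk_int_mode { int id; };

struct sk_any_mode
{
  int id;
  sk_any_mode () : id (0) {}
  sk_any_mode (sk_int_mode m) : id (m.id) {}
};

template<typename T>
class sk_opt
{
public:
  sk_opt () : m_present (false), m_value () {}
  sk_opt (const T &v) : m_present (true), m_value (v) {}

  // Converting constructor: accept anything from which a T can be built.
  template<typename U>
  sk_opt (const U &v) : m_present (true), m_value (T (v)) {}

  bool exists () const { return m_present; }
  T require () const { assert (m_present); return m_value; }

private:
  bool m_present;
  T m_value;
};

// Returns the specific type directly through the general optional.
// Without the converting constructor, "return m" would need two
// user-defined conversions in a row and would not compile.
inline sk_opt<sk_any_mode>
sk_lookup (bool found)
{
  sk_int_mode m = { 7 };
  if (found)
    return m;
  return sk_opt<sk_any_mode> ();
}
```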

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * machmode.h (opt_machine_mode): New type.
> (opt_mode): Allow construction from anything that can be
> converted to a T.
> (is_a, as_a, dyn_cast): Add overloads for opt_mode.
> (mode_for_size): Return an opt_machine_mode.
> * stor-layout.c (mode_for_size): Likewise.
> (mode_for_size_tree): Update call accordingly.
> (bitwise_mode_for_mode): Likewise.
> (make_fract_type): Likewise.
> (make_accum_type): Likewise.
> * caller-save.c (replace_reg_with_saved_mem): Update call
> accordingly.
> * config/alpha/alpha.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
> * config/i386/i386.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
> * config/s390/s390.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
> * config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
> * expmed.c (extract_bit_field_1): Likewise.
> * reload.c (get_secondary_mem): Likewise.
> * varasm.c (assemble_integer): Likewise.
> * lower-subreg.c (simplify_subreg_concatn): Likewise.  Move
> early-out.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-09-04 12:18:47.820398622 +0100
> +++ gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
> @@ -20,6 +20,8 @@ Software Foundation; either version 3, o
>  #ifndef HAVE_MACHINE_MODES
>  #define HAVE_MACHINE_MODES
>
> +typedef opt_mode<machine_mode> opt_machine_mode;
> +
>  extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
>  extern const unsigned short mode_precision[NUM_MACHINE_MODES];
>  extern const unsigned char mode_inner[NUM_MACHINE_MODES];
> @@ -237,6 +239,8 @@ #define POINTER_BOUNDS_MODE_P(MODE)
>
>ALWAYS_INLINE opt_mode () : m_mode (E_VOIDmode) {}
>ALWAYS_INLINE opt_mode (const T &m) : m_mode (m) {}
> +  template<typename U>
> +  ALWAYS_INLINE opt_mode (const U &m) : m_mode (T (m)) {}
>ALWAYS_INLINE opt_mode (from_int m) : m_mode (machine_mode (m)) {}
>
>machine_mode else_void () const;
> @@ -325,6 +329,13 @@ is_a (machine_mode m)
>return T::includes_p (m);
>  }
>
> +template
> +inline bool
> +is_a (const opt_mode &m)
> +{
> +  return T::includes_p (m.else_void ());
> +}
> +
>  /* Assert that mode M has type T, and return it in that form.  */
>
>  template
> @@ -335,6 +346,13 @@ as_a (machine_mode m)
>return typename mode_traits::from_int (m);
>  }
>
> +template
> +inline T
> +as_a (const opt_mode &m)
> +{
> +  return as_a  (m.else_void ());
> +}
> +
>  /* Convert M to an opt_mode.  */
>
>  template
> @@ -346,6 +364,13 @@ dyn_cast (machine_mode m)
>return opt_mode ();
>  }
>
> +template
> +inline opt_mode
> +dyn_cast (const opt_mode &m)
> +{
> +  return dyn_cast  (m.else_void ());
> +}
> +
>  /* Return true if mode M has type T, storing it as a T in *RESULT
> if so.  */
>
> @@ -627,11 +652,7 @@ GET_MODE_2XWIDER_MODE (const T &m)
>  extern const unsigned char mode_complex[NUM_MACHINE_MODES];
>  #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
>
> -/* Return the mode for data of a given size SIZE and mode class CLASS.
> -   If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
> -   The value is BLKmode if no other mode is found.  */
> -
> -extern machine_mode mode_for_size (unsigned int, enum mode_class, int);
> +extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
>
>  /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
> exists.  If LIMIT is nonzero, modes wider than MAX_FIXED_MODE_SIZE
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2017-09-04 12:18:44.944553324 +0100
> +++ gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
> @@ -291,19 +291,19 @@ finalize_size_functions (void)
>vec_free (size_functions);
>  }
>
> -/* Return the machine mode to use for a nonscalar of SIZE bits.  The
> -   mode must be in class

Re: [5/9] Add mode_for_int_vector helper functions

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:36 PM, Richard Sandiford
 wrote:
> There are at least a few places that want to create an integer vector
> with a specified element size and element count, or to create the
> integer equivalent of an existing mode.  This patch adds helpers
> for doing that.
>
> The require ()s are all used in functions that go on to emit
> instructions that use the result as a vector mode.
>
> 2017-09-04  Richard Sandiford  
>
> gcc/
> * machmode.h (mode_for_int_vector): New function.
> * stor-layout.c (mode_for_int_vector): Likewise.
> * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Use it.
> * config/powerpcspe/powerpcspe.c (rs6000_do_expand_vec_perm): Likewise.
> * config/rs6000/rs6000.c (rs6000_do_expand_vec_perm): Likewise.
> * config/s390/s390.c (s390_expand_vec_compare_cc): Likewise.
> (s390_expand_vcond): Likewise.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
> +++ gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
> @@ -706,6 +706,21 @@ extern machine_mode bitwise_mode_for_mod
>
>  extern machine_mode mode_for_vector (scalar_mode, unsigned);
>
> +extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
> +
> +/* Return the integer vector equivalent of MODE, if one exists.  In other
> +   words, return the mode for an integer vector that has the same number
> +   of bits as MODE and the same number of elements as MODE, with the
> +   latter being 1 if MODE is scalar.  The returned mode can be either
> +   an integer mode or a vector mode.  */
> +
> +inline opt_machine_mode
> +mode_for_int_vector (machine_mode mode)

So this is similar to int_mode_for_mode which means...

int_vector_mode_for_vector_mode?

> +{

Nothing prevents use with non-vector MODE here, can we place an assert here?

> +  return mode_for_int_vector (GET_MODE_UNIT_BITSIZE (mode),
> + GET_MODE_NUNITS (mode));
> +}
> +
>  /* A class for iterating through possible bitfield modes.  */
>  class bit_field_mode_iterator
>  {
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
> +++ gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
> @@ -517,6 +517,23 @@ mode_for_vector (scalar_mode innermode,
>return mode;
>  }
>
> +/* Return the mode for a vector that has NUNITS integer elements of
> +   INT_BITS bits each, if such a mode exists.  The mode can be either
> +   an integer mode or a vector mode.  */
> +
> +opt_machine_mode
> +mode_for_int_vector (unsigned int int_bits, unsigned int nunits)

That's more vector_int_mode_for_size (...), no?  Similar to int_mode_for_size
or mode_for_size.

Ok with those renamings.  I wonder if int_vector_mode_for_vector_mode
is necessary -- is calling vector_int_mode_for_size
(GET_MODE_UNIT_BITSIZE (mode),
GET_MODE_NUNITS (mode)) too cumbersome?
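
To make the question concrete, here is a small sketch of the two forms
under discussion: a two-argument lookup by element width and count, and a
one-argument convenience wrapper that forwards a mode's own element width
and count.  The table and the vsk_* names are invented for the example and
a null pointer stands in for an empty opt_machine_mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A mode is described only by element width, element count and whether
// its elements are integers.
struct vsk_mode
{
  const char *name;
  unsigned int unit_bits;
  unsigned int nunits;
  bool is_int;
};

// Made-up table of vector modes.
static const vsk_mode vsk_modes[] = {
  { "V4SI", 32, 4, true },
  { "V4SF", 32, 4, false },
  { "V2DI", 64, 2, true },
  { "V2DF", 64, 2, false },
};

// Two-argument form: NUNITS integer elements of INT_BITS bits each.
inline const vsk_mode *
vsk_mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
{
  for (std::size_t i = 0; i < sizeof vsk_modes / sizeof vsk_modes[0]; ++i)
    if (vsk_modes[i].is_int
        && vsk_modes[i].unit_bits == int_bits
        && vsk_modes[i].nunits == nunits)
      return &vsk_modes[i];
  return NULL;
}

// One-argument form: the integer equivalent of MODE.  It is nothing
// more than the explicit two-argument call.
inline const vsk_mode *
vsk_mode_for_int_vector (const vsk_mode &mode)
{
  return vsk_mode_for_int_vector (mode.unit_bits, mode.nunits);
}
```

Whether the wrapper is worth having is the judgment call raised above; the
sketch only shows that both spellings give the same answer.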

> +{
> +  scalar_int_mode int_mode;
> +  if (int_mode_for_size (int_bits, 0).exists (&int_mode))
> +{
> +  machine_mode vec_mode = mode_for_vector (int_mode, nunits);

Uh, so we _do_ have an existing badly named 'mode_for_vector' ...

> +  if (vec_mode != BLKmode)
> +   return vec_mode;
> +}
> +  return opt_machine_mode ();
> +}
> +
>  /* Return the alignment of MODE. This will be bounded by 1 and
> BIGGEST_ALIGNMENT.  */
>
> Index: gcc/config/aarch64/aarch64.c
> ===
> --- gcc/config/aarch64/aarch64.c2017-09-04 12:18:44.874165502 +0100
> +++ gcc/config/aarch64/aarch64.c2017-09-04 12:18:53.144272229 +0100
> @@ -8282,9 +8282,6 @@ aarch64_emit_approx_sqrt (rtx dst, rtx s
>return false;
>  }
>
> -  machine_mode mmsk
> -= mode_for_vector (int_mode_for_mode (GET_MODE_INNER (mode)).require (),
> -  GET_MODE_NUNITS (mode));
>if (!recp)
>  {
>if (!(flag_mlow_precision_sqrt
> @@ -8302,7 +8299,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx s
>  /* Caller assumes we cannot fail.  */
>  gcc_assert (use_rsqrt_p (mode));
>
> -
> +  machine_mode mmsk = mode_for_int_vector (mode).require ();
>rtx xmsk = gen_reg_rtx (mmsk);
>if (!recp)
>  /* When calculating the approximate square root, compare the
> Index: gcc/config/powerpcspe/powerpcspe.c
> ===
> --- gcc/config/powerpcspe/powerpcspe.c  2017-09-04 12:18:44.919414816 +0100
> +++ gcc/config/powerpcspe/powerpcspe.c  2017-09-04 12:18:53.148287319 +0100
> @@ -38739,8 +38739,7 @@ rs6000_do_expand_vec_perm (rtx target, r
>
>imode = vmode;
>if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
> -imode = mode_for_vector
> -  (int_mode_for_mode (GET_MODE_INNER (vmode)).require (), nelt);
> +imode = mode_for_int_vector (vmode).require ();
>
>x =

Re: [7/9] Make targetm.get_mask_mode return an opt_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:41 PM, Richard Sandiford
 wrote:
> ...for consistency with mode_for_vector.

Ok.

Richard.
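
The default behaviour described in the new hook documentation (a mask for
a vector that is LENGTH bytes long with NUNITS elements is an integer
vector of the same shape) comes down to simple arithmetic, sketched below.
The names are hypothetical and 8-bit bytes are assumed.  Unlike the real
default, which asserts that the size divides evenly and also asks the
target whether the resulting mode is supported, this sketch just reports
failure.

```cpp
#include <cassert>

// Shape of an integer vector: element width in bits and element count.
struct mask_shape
{
  bool valid;
  unsigned int elem_bits;
  unsigned int nunits;
};

// A vector that is VECTOR_SIZE bytes long with NUNITS elements gets a
// mask of NUNITS integers, each as wide as one vector element.
inline mask_shape
default_mask_shape_sketch (unsigned int nunits, unsigned int vector_size)
{
  mask_shape none = { false, 0, 0 };
  if (nunits == 0 || vector_size % nunits != 0)
    return none;

  unsigned int elem_size = vector_size / nunits;  /* in bytes */
  mask_shape shape = { true, elem_size * 8, nunits };
  return shape;
}
```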

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * target.def (get_mask_mode): Change return type to opt_mode.
> Expand commentary.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_get_mask_mode): Return an opt_mode.
> * targhooks.c (default_get_mask_mode): Likewise.
> * config/i386/i386.c (ix86_get_mask_mode): Likewise.
> * optabs-query.c (can_vec_mask_load_store_p): Update use of
> targetm.get_mask_mode.
> * tree.c (build_truth_vector_type): Likewise.
>
> Index: gcc/target.def
> ===
> --- gcc/target.def  2017-09-04 11:50:24.568774867 +0100
> +++ gcc/target.def  2017-09-04 12:18:58.594757220 +0100
> @@ -1877,10 +1877,16 @@ The default is zero which means to not i
>  /* Function to get a target mode for a vector mask.  */
>  DEFHOOK
>  (get_mask_mode,
> - "This hook returns mode to be used for a mask to be used for a vector\n\
> -of specified @var{length} with @var{nunits} elements.  By default an integer\n\
> -vector mode of a proper size is returned.",
> - machine_mode,
> + "A vector mask is a value that holds one boolean result for every element\n\
> +in a vector.  This hook returns the machine mode that should be used to\n\
> +represent such a mask when the vector in question is @var{length} bytes\n\
> +long and contains @var{nunits} elements.  The hook returns an empty\n\
> +@code{opt_machine_mode} if no such mode exists.\n\
> +\n\
> +The default implementation returns the mode of an integer vector that\n\
> +is @var{length} bytes long and that contains @var{nunits} elements,\n\
> +if such a mode exists.",
> + opt_machine_mode,
>   (unsigned nunits, unsigned length),
>   default_get_mask_mode)
>
> Index: gcc/doc/tm.texi
> ===
> --- gcc/doc/tm.texi 2017-09-04 11:50:24.566073698 +0100
> +++ gcc/doc/tm.texi 2017-09-04 12:18:58.593753447 +0100
> @@ -5820,10 +5820,16 @@ mode returned by @code{TARGET_VECTORIZE_
>  The default is zero which means to not iterate over other vector sizes.
>  @end deftypefn
>
> -@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
> -This hook returns mode to be used for a mask to be used for a vector
> -of specified @var{length} with @var{nunits} elements.  By default an integer
> -vector mode of a proper size is returned.
> +@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
> +A vector mask is a value that holds one boolean result for every element
> +in a vector.  This hook returns the machine mode that should be used to
> +represent such a mask when the vector in question is @var{length} bytes
> +long and contains @var{nunits} elements.  The hook returns an empty
> +@code{opt_machine_mode} if no such mode exists.
> +
> +The default implementation returns the mode of an integer vector that
> +is @var{length} bytes long and that contains @var{nunits} elements,
> +if such a mode exists.
>  @end deftypefn
>
>  @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
> Index: gcc/targhooks.h
> ===
> --- gcc/targhooks.h 2017-09-04 11:50:24.568774867 +0100
> +++ gcc/targhooks.h 2017-09-04 12:18:58.594757220 +0100
> @@ -102,7 +102,7 @@ default_builtin_support_vector_misalignm
>  int, bool);
>  extern machine_mode default_preferred_simd_mode (scalar_mode mode);
>  extern unsigned int default_autovectorize_vector_sizes (void);
> -extern machine_mode default_get_mask_mode (unsigned, unsigned);
> +extern opt_machine_mode default_get_mask_mode (unsigned, unsigned);
>  extern void *default_init_cost (struct loop *);
>  extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
>struct _stmt_vec_info *, int,
> Index: gcc/targhooks.c
> ===
> --- gcc/targhooks.c 2017-09-04 12:18:55.825348732 +0100
> +++ gcc/targhooks.c 2017-09-04 12:18:58.594757220 +0100
> @@ -1200,7 +1200,7 @@ default_autovectorize_vector_sizes (void
>
>  /* By defaults a vector of integers is used as a mask.  */
>
> -machine_mode
> +opt_machine_mode
>  default_get_mask_mode (unsigned nunits, unsigned vector_size)
>  {
>unsigned elem_size = vector_size / nunits;
> @@ -1210,12 +1210,12 @@ default_get_mask_mode (unsigned nunits,
>
>gcc_assert (elem_size * nunits == vector_size);
>
> -  if (!mode_for_vector (elem_mode, nunits).exists (&vector_mode)
> -  || !VECTOR_MODE_P (vector_mode)
> -  || !targetm.vector_mode_supported_p (vector_mode))
> -vector_mode = BLKmo

Re: [6/9] Make mode_for_vector return an opt_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:39 PM, Richard Sandiford
 wrote:
> ...following on from the mode_for_size change.  The patch also removes
> machmode.h versions of the stor-layout.c comments, since the comments
> in the .c file are more complete.

Ok.

Richard.
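
The reworked control flow in mode_for_vector (return as soon as a vector
mode matches, otherwise fall back to a same-sized scalar integer mode only
if the target has registers of that mode, otherwise return nothing) can be
sketched as follows.  The three-entry target description and the fb_*
names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct fb_mode
{
  const char *name;
  unsigned int unit_bits;
  unsigned int nunits;
  bool have_regs;
};

// Made-up target: one vector mode, a scalar DI with registers and a
// scalar TI without registers.
static const fb_mode fb_modes[] = {
  { "V4SI", 32, 4, true },
  { "DI", 64, 1, true },
  { "TI", 128, 1, false },
};

inline const fb_mode *
fb_mode_for_vector (unsigned int inner_bits, unsigned int nunits)
{
  const std::size_t n = sizeof fb_modes / sizeof fb_modes[0];

  // First choice: a mode with exactly the requested shape.
  for (std::size_t i = 0; i < n; ++i)
    if (fb_modes[i].nunits == nunits && fb_modes[i].unit_bits == inner_bits)
      return &fb_modes[i];

  // Fallback: a scalar integer mode of the same total size, but only
  // if the target has registers of that mode.
  unsigned int nbits = inner_bits * nunits;
  for (std::size_t i = 0; i < n; ++i)
    if (fb_modes[i].nunits == 1
        && fb_modes[i].unit_bits == nbits
        && fb_modes[i].have_regs)
      return &fb_modes[i];

  // No suitable mode: the empty result replaces the old BLKmode return.
  return NULL;
}
```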

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * machmode.h (mode_for_vector): Return an opt_mode.
> * stor-layout.c (mode_for_vector): Likewise.
> (mode_for_int_vector): Update accordingly.
> (layout_type): Likewise.
> * config/i386/i386.c (emit_memmov): Likewise.
> (ix86_expand_set_or_movmem): Likewise.
> (ix86_expand_vector_init): Likewise.
> (ix86_get_mask_mode): Likewise.
> * config/powerpcspe/powerpcspe.c (rs6000_expand_vec_perm_const_1):
> Likewise.
> * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Likewise.
> * expmed.c (extract_bit_field_1): Likewise.
> * expr.c (expand_expr_real_2): Likewise.
> * optabs-query.c (can_vec_perm_p): Likewise.
> (can_vec_mask_load_store_p): Likewise.
> * optabs.c (expand_vec_perm): Likewise.
> * targhooks.c (default_get_mask_mode): Likewise.
> * tree-vect-stmts.c (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (get_vectype_for_scalar_type_and_size): Likewise.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
> +++ gcc/machmode.h  2017-09-04 12:18:55.821333642 +0100
> @@ -682,8 +682,6 @@ decimal_float_mode_for_size (unsigned in
>  (mode_for_size (size, MODE_DECIMAL_FLOAT, 0));
>  }
>
> -/* Similar to mode_for_size, but find the smallest mode for a given width.  */
> -
>  extern machine_mode smallest_mode_for_size (unsigned int, enum mode_class);
>
>  /* Find the narrowest integer mode that contains at least SIZE bits.
> @@ -695,17 +693,9 @@ smallest_int_mode_for_size (unsigned int
>return as_a <scalar_int_mode> (smallest_mode_for_size (size, MODE_INT));
>  }
>
> -/* Return an integer mode of exactly the same size as the input mode.  */
> -
>  extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
> -
>  extern machine_mode bitwise_mode_for_mode (machine_mode);
> -
> -/* Return a mode that is suitable for representing a vector,
> -   or BLKmode on failure.  */
> -
> -extern machine_mode mode_for_vector (scalar_mode, unsigned);
> -
> +extern opt_machine_mode mode_for_vector (scalar_mode, unsigned);
>  extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
>
>  /* Return the integer vector equivalent of MODE, if one exists.  In other
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
> +++ gcc/stor-layout.c   2017-09-04 12:18:55.824344959 +0100
> @@ -471,11 +471,11 @@ bitwise_type_for_mode (machine_mode mode
>return inner_type;
>  }
>
> -/* Find a mode that is suitable for representing a vector with
> -   NUNITS elements of mode INNERMODE.  Returns BLKmode if there
> -   is no suitable mode.  */
> +/* Find a mode that is suitable for representing a vector with NUNITS
> +   elements of mode INNERMODE, if one exists.  The returned mode can be
> +   either an integer mode or a vector mode.  */
>
> -machine_mode
> +opt_machine_mode
>  mode_for_vector (scalar_mode innermode, unsigned nunits)
>  {
>machine_mode mode;
> @@ -499,22 +499,18 @@ mode_for_vector (scalar_mode innermode,
>FOR_EACH_MODE_FROM (mode, mode)
>  if (GET_MODE_NUNITS (mode) == nunits
> && GET_MODE_INNER (mode) == innermode)
> -  break;
> +  return mode;
>
>/* For integers, try mapping it to a same-sized scalar mode.  */
> -  if (mode == VOIDmode
> -  && GET_MODE_CLASS (innermode) == MODE_INT)
> +  if (GET_MODE_CLASS (innermode) == MODE_INT)
>  {
>unsigned int nbits = nunits * GET_MODE_BITSIZE (innermode);
> -  mode = int_mode_for_size (nbits, 0).else_blk ();
> +  if (int_mode_for_size (nbits, 0).exists (&mode)
> + && have_regs_of_mode[mode])
> +   return mode;
>  }
>
> -  if (mode == VOIDmode
> -  || (GET_MODE_CLASS (mode) == MODE_INT
> - && !have_regs_of_mode[mode]))
> -return BLKmode;
> -
> -  return mode;
> +  return opt_machine_mode ();
>  }
>
>  /* Return the mode for a vector that has NUNITS integer elements of
> @@ -525,12 +521,10 @@ mode_for_vector (scalar_mode innermode,
>  mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
>  {
>scalar_int_mode int_mode;
> -  if (int_mode_for_size (int_bits, 0).exists (&int_mode))
> -{
> -  machine_mode vec_mode = mode_for_vector (int_mode, nunits);
> -  if (vec_mode != BLKmode)
> -   return vec_mode;
> -}
> +  machine_mode vec_mode;
> +  if (int_mode_for_size (int_bits, 0).exists (&int_mode)
> +  && mode_for_vector (int_mode, nunits).exists (&vec_mode))
> +retur

Re: [8/9] Make mode_for_size_tree return an opt_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:42 PM, Richard Sandiford
 wrote:
> ...for consistency with mode_for_size

Ok.

Thanks,
Richard.
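
One detail of mode_for_size_tree worth spelling out is the guard that
rejects sizes which do not fit in an unsigned int before the size is
passed on.  A minimal sketch of that check, with a hypothetical function
name and a bool result standing in for an empty opt_machine_mode:

```cpp
#include <cassert>
#include <stdint.h>

// Narrow a 64-bit size to unsigned int, reporting failure instead of
// silently truncating.  This mirrors the "ui = uhwi; if (uhwi != ui)"
// test in the patch.
inline bool
narrow_size_sketch (uint64_t wide, unsigned int *out)
{
  unsigned int narrow = (unsigned int) wide;
  if (wide != narrow)
    return false;
  *out = narrow;
  return true;
}
```

On hosts where unsigned int is 32 bits wide, a size such as 1 << 40 is
rejected, and the caller sees "no mode" rather than a mode for a
truncated size.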

> 2017-09-04  Richard Sandiford  
>
> gcc/
> * stor-layout.h (mode_for_size_tree): Return an opt_mode.
> * stor-layout.c (mode_for_size_tree): Likewise.
> (mode_for_array): Update accordingly.
> (layout_decl): Likewise.
> (compute_record_mode): Likewise.  Only set the mode once.
>
> gcc/ada/
> * gcc-interface/utils.c (make_packable_type): Update call to
> mode_for_size_tree.
>
> Index: gcc/stor-layout.h
> ===
> --- gcc/stor-layout.h   2017-08-21 12:14:47.158835574 +0100
> +++ gcc/stor-layout.h   2017-09-04 12:19:01.144339518 +0100
> @@ -99,7 +99,7 @@ extern tree make_unsigned_type (int);
> If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
> The value is BLKmode if no other mode is found.  This is like
> mode_for_size, but is passed a tree.  */
> -extern machine_mode mode_for_size_tree (const_tree, enum mode_class, int);
> +extern opt_machine_mode mode_for_size_tree (const_tree, enum mode_class, int);
>
>  extern tree bitwise_type_for_mode (machine_mode);
>
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2017-09-04 12:18:55.824344959 +0100
> +++ gcc/stor-layout.c   2017-09-04 12:19:01.144339518 +0100
> @@ -321,19 +321,19 @@ mode_for_size (unsigned int size, enum m
>
>  /* Similar, except passed a tree node.  */
>
> -machine_mode
> +opt_machine_mode
>  mode_for_size_tree (const_tree size, enum mode_class mclass, int limit)
>  {
>unsigned HOST_WIDE_INT uhwi;
>unsigned int ui;
>
>if (!tree_fits_uhwi_p (size))
> -return BLKmode;
> +return opt_machine_mode ();
>uhwi = tree_to_uhwi (size);
>ui = uhwi;
>if (uhwi != ui)
> -return BLKmode;
> -  return mode_for_size (ui, mclass, limit).else_blk ();
> +return opt_machine_mode ();
> +  return mode_for_size (ui, mclass, limit);
>  }
>
>  /* Return the narrowest mode of class MCLASS that contains at least
> @@ -563,7 +563,7 @@ mode_for_array (tree elem_type, tree siz
>  int_size / int_elem_size))
> limit_p = false;
>  }
> -  return mode_for_size_tree (size, MODE_INT, limit_p);
> +  return mode_for_size_tree (size, MODE_INT, limit_p).else_blk ();
>  }
>
>  /* Subroutine of layout_decl: Force alignment required for the data type.
> @@ -683,17 +683,18 @@ layout_decl (tree decl, unsigned int kno
>   && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
>   && GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT)
> {
> - machine_mode xmode
> -   = mode_for_size_tree (DECL_SIZE (decl), MODE_INT, 1);
> - unsigned int xalign = GET_MODE_ALIGNMENT (xmode);
> -
> - if (xmode != BLKmode
> - && !(xalign > BITS_PER_UNIT && DECL_PACKED (decl))
> - && (known_align == 0 || known_align >= xalign))
> + machine_mode xmode;
> + if (mode_for_size_tree (DECL_SIZE (decl),
> + MODE_INT, 1).exists (&xmode))
> {
> - SET_DECL_ALIGN (decl, MAX (xalign, DECL_ALIGN (decl)));
> - SET_DECL_MODE (decl, xmode);
> - DECL_BIT_FIELD (decl) = 0;
> + unsigned int xalign = GET_MODE_ALIGNMENT (xmode);
> + if (!(xalign > BITS_PER_UNIT && DECL_PACKED (decl))
> + && (known_align == 0 || known_align >= xalign))
> +   {
> + SET_DECL_ALIGN (decl, MAX (xalign, DECL_ALIGN (decl)));
> + SET_DECL_MODE (decl, xmode);
> + DECL_BIT_FIELD (decl) = 0;
> +   }
> }
> }
>
> @@ -1756,22 +1757,24 @@ compute_record_mode (tree type)
>if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode
>&& tree_fits_uhwi_p (TYPE_SIZE (type))
>&& GET_MODE_BITSIZE (mode) == tree_to_uhwi (TYPE_SIZE (type)))
> -SET_TYPE_MODE (type, mode);
> +;
>else
> -SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1));
> +mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk ();
>
>/* If structure's known alignment is less than what the scalar
>   mode would need, and it matters, then stick with BLKmode.  */
> -  if (TYPE_MODE (type) != BLKmode
> +  if (mode != BLKmode
>&& STRICT_ALIGNMENT
>&& ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
> -   || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type
> +   || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
>  {
>/* If this is the only reason this type is BLKmode, then
>  don't force containing types to be BLKmode.  */
>TYPE_NO_FORCE_BLK (

Re: [9/9] Make bitwise_mode_for_mode return an opt_mode

2017-09-05 Thread Richard Biener

On Mon, Sep 4, 2017 at 1:43 PM, Richard Sandiford
 wrote:
> 2017-09-04  Richard Sandiford  

Ok.

Richard.

> gcc/
> * machmode.h (bitwise_mode_for_mode): Return opt_mode.
> * stor-layout.c (bitwise_mode_for_mode): Likewise.
> (bitwise_type_for_mode): Update accordingly.
>
> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h  2017-09-04 12:18:55.821333642 +0100
> +++ gcc/machmode.h  2017-09-04 12:19:42.856108173 +0100
> @@ -694,7 +694,7 @@ smallest_int_mode_for_size (unsigned int
>  }
>
>  extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
> -extern machine_mode bitwise_mode_for_mode (machine_mode);
> +extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
>  extern opt_machine_mode mode_for_vector (scalar_mode, unsigned);
>  extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
>
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2017-09-04 12:19:01.144339518 +0100
> +++ gcc/stor-layout.c   2017-09-04 12:19:42.856108173 +0100
> @@ -404,10 +404,10 @@ int_mode_for_mode (machine_mode mode)
>  }
>  }
>
> -/* Find a mode that can be used for efficient bitwise operations on MODE.
> -   Return BLKmode if no such mode exists.  */
> +/* Find a mode that can be used for efficient bitwise operations on MODE,
> +   if one exists.  */
>
> -machine_mode
> +opt_machine_mode
>  bitwise_mode_for_mode (machine_mode mode)
>  {
>/* Quick exit if we already have a suitable mode.  */
> @@ -445,7 +445,7 @@ bitwise_mode_for_mode (machine_mode mode
>  }
>
>/* Otherwise fall back on integers while honoring MAX_FIXED_MODE_SIZE.  */
> -  return mode_for_size (bitsize, MODE_INT, true).else_blk ();
> +  return mode_for_size (bitsize, MODE_INT, true);
>  }
>
>  /* Find a type that can be used for efficient bitwise operations on MODE.
> @@ -454,8 +454,7 @@ bitwise_mode_for_mode (machine_mode mode
>  tree
>  bitwise_type_for_mode (machine_mode mode)
>  {
> -  mode = bitwise_mode_for_mode (mode);
> -  if (mode == BLKmode)
> +  if (!bitwise_mode_for_mode (mode).exists (&mode))
>  return NULL_TREE;
>
>unsigned int inner_size = GET_MODE_UNIT_BITSIZE (mode);
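
Note on the opt_mode idiom used in this patch: the following is a minimal,
self-contained C++ sketch, not GCC's actual opt_mode class.  The names
opt_mode_sketch and int_mode_for_bits and the reduced machine_mode enum are
assumptions made for illustration; only exists () and else_blk () mirror
calls that appear in the diff above.

```cpp
#include <cassert>

// Toy stand-in for GCC's machine_mode; the real enum is target-generated.
enum machine_mode { VOIDmode, BLKmode, QImode, HImode, SImode, DImode };

// Hypothetical, simplified optional-mode wrapper (VOIDmode means "empty").
class opt_mode_sketch
{
public:
  opt_mode_sketch () : m_mode (VOIDmode) {}
  opt_mode_sketch (machine_mode m) : m_mode (m) {}

  // Return true if a mode is present, storing it in *OUT in that case.
  bool exists (machine_mode *out) const
  {
    if (m_mode == VOIDmode)
      return false;
    *out = m_mode;
    return true;
  }

  // Return the mode if present, otherwise BLKmode.
  machine_mode else_blk () const
  {
    return m_mode == VOIDmode ? BLKmode : m_mode;
  }

private:
  machine_mode m_mode;
};

// Hypothetical lookup: an integer mode of exactly BITS bits, if one exists.
opt_mode_sketch int_mode_for_bits (unsigned bits)
{
  switch (bits)
    {
    case 8: return QImode;
    case 16: return HImode;
    case 32: return SImode;
    case 64: return DImode;
    default: return opt_mode_sketch ();
    }
}
```

With this shape, a caller that previously compared the result against
BLKmode can instead write "if (!lookup (...).exists (&mode)) return ...;",
which is the transformation applied to bitwise_type_for_mode above.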

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-05 Thread Uros Bizjak

On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov  wrote:
> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>
>> New failures:
>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>
> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
> I can do all of the above if you agree.
>
> I think RTL peepholes are a poor way of fixing the original problem, which
> actually exists on all targets with separate int/fp registers.  For instance,
> trunk (without my patch) still gets a far simpler testcase wrong (-O2, 
> 64-bit):

Please note that 32-bit x86 implements atomic DImode access with a
fild/fistp combination, so the mentioned peephole avoids a quite costly
instruction sequence.

For reference, the attached patch implements additional peephole2 patterns
that also handle sequences with blockages.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 29b82f86d43a..eceaa73a6799 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -219,29 +219,71 @@
(set (match_operand:DI 2 "memory_operand")
(unspec:DI [(match_dup 0)]
   UNSPEC_FIST_ATOMIC))
-   (set (match_operand:DF 3 "fp_register_operand")
+   (set (match_operand:DF 3 "any_fp_register_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
   [(set (match_dup 3) (match_dup 5))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
 (define_peephole2
+  [(set (match_operand:DF 0 "fp_register_operand")
+   (unspec:DF [(match_operand:DI 1 "memory_operand")]
+  UNSPEC_FILD_ATOMIC))
+   (set (match_operand:DI 2 "memory_operand")
+   (unspec:DI [(match_dup 0)]
+  UNSPEC_FIST_ATOMIC))
+   (set (mem:BLK (scratch:SI))
+   (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 3 "any_fp_register_operand")
+   (match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_insn (gen_memory_blockage ());
+  DONE;
+})
+
+(define_peephole2
   [(set (match_operand:DF 0 "sse_reg_operand")
(unspec:DF [(match_operand:DI 1 "memory_operand")]
   UNSPEC_LDX_ATOMIC))
(set (match_operand:DI 2 "memory_operand")
(unspec:DI [(match_dup 0)]
   UNSPEC_STX_ATOMIC))
-   (set (match_operand:DF 3 "fp_register_operand")
+   (set (match_operand:DF 3 "any_fp_register_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
   [(set (match_dup 3) (match_dup 5))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
+(define_peephole2
+  [(set (match_operand:DF 0 "sse_reg_operand")
+   (unspec:DF [(match_operand:DI 1 "memory_operand")]
+  UNSPEC_LDX_ATOMIC))
+   (set (match_operand:DI 2 "memory_operand")
+   (unspec:DI [(match_dup 0)]
+  UNSPEC_STX_ATOMIC))
+   (set (mem:BLK (scratch:SI))
+   (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 3 "any_fp_register_operand")
+   (match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_insn (gen_memory_blockage ());
+  DONE;
+})
+
 (define_expand "atomic_store"
   [(set (match_operand:ATOMIC 0 "memory_operand")
(unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
@@ -331,7 +373,7 @@
 
 (define_peephole2
   [(set (match_operand:DF 0 "memory_operand")
-   (match_operand:DF 1 "fp_register_operand"))
+   (match_operand:DF 1 "any_fp_register_operand"))
(set (match_operand:DF 2 "fp_register_operand")
(unspec:DF [(match_operand:DI 3 "memory_operand")]
   UNSPEC_FILD_ATOMIC))
@@ -340,13 +382,34 @@
   UNSPEC_FIST_ATOMIC))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (3, operands[2])
-   && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
   [(set (match_dup 5) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")
 
 (define_peephole2
   [(set (match_operand:DF 0 "

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov

On Tue, 5 Sep 2017, Uros Bizjak wrote:
> However, this definition can't be generic, since unspec is used.

I see, if the only reason this needs a named pattern is lack of generic UNSPEC
values, I believe it would be helpful to mention that in the documentation.

A few comments on the patch:

> @@ -6734,6 +6734,13 @@ scheduler and other passes from moving instructions 
> and using register
>  equivalences across the boundary defined by the blockage insn.
>  This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
>  
> +@cindex @code{memmory_blockage} instruction pattern

Typo ('mm').

> +@item @samp{memory_blockage}
> +This pattern defines a pseudo insn that prevents the instruction
> +scheduler and other passes from moving instructions accessing memory
> +across the boundary defined by the blockage insn.  This instruction
> +needs to read and write volatile BLKmode memory.
> +

I see this is mostly cloned from the 'blockage' pattern description, but
this is not quite correct, it's not about moving _instructions_ per se
(RTL CSE propagates values loaded from memory without moving instructions,
RTL DSE eliminates some stores to memory also without moving instructions),
and calling out the scheduler separately doesn't seem useful.  I suggest:

"""
This pattern, if defined, represents a compiler memory barrier, and will be
placed at points across which RTL passes may not propagate memory accesses.
This instruction needs to read and write volatile BLKmode memory.  It does
not need to generate any machine instruction, and like the @code{blockage}
insn needs a named pattern only because there are no generic @code{unspec}
values.  If this pattern is not defined, the compiler falls back by emitting
an instruction corresponding to @code{asm volatile ("" ::: "memory")}.
"""

> @@ -6295,6 +6295,21 @@ expand_asm_memory_barrier (void)
>emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>  }
>  
> +#ifndef HAVE_memory_blockage
> +#define HAVE_memory_blockage 0
> +#endif

Why this?  This style is not used (anymore) elsewhere in the file, afaict
the current approach is to add a definition in target-insns.def and then
use targetm.have_memory_blockage (e.g. like mem_thread_fence is used).

Thanks.
Alexander
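
Note on the fallback mentioned in the suggested documentation text: below is
a small C++ sketch of a compiler-only memory barrier written with the GNU
extended asm syntax (accepted by GCC and Clang).  The names compiler_barrier,
shared_flag and store_then_reload are hypothetical and exist only for this
illustration; the sketch does not show how GCC expands the pattern
internally.

```cpp
#include <cassert>

// Empty asm with a "memory" clobber: emits no machine instruction, but the
// compiler must assume that any memory may have been read or written here.
static inline void compiler_barrier ()
{
  asm volatile ("" ::: "memory");
}

int shared_flag;

int store_then_reload (int value)
{
  shared_flag = value;   // the store has to be performed before the barrier
  compiler_barrier ();
  return shared_flag;    // the compiler may not assume this still equals VALUE
}
```

This is only a barrier against compiler transformations; it gives no
ordering guarantee with respect to other CPUs.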

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Uros Bizjak

On Tue, Sep 5, 2017 at 2:03 PM, Alexander Monakov  wrote:
> On Tue, 5 Sep 2017, Uros Bizjak wrote:
>> However, this definition can't be generic, since unspec is used.
>
> I see, if the only reason this needs a named pattern is lack of generic UNSPEC
> values, I believe it would be helpful to mention that in the documentation.
>
> A few comments on the patch:
>
>> @@ -6734,6 +6734,13 @@ scheduler and other passes from moving instructions 
>> and using register
>>  equivalences across the boundary defined by the blockage insn.
>>  This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
>>
>> +@cindex @code{memmory_blockage} instruction pattern
>
> Typo ('mm').
>
>> +@item @samp{memory_blockage}
>> +This pattern defines a pseudo insn that prevents the instruction
>> +scheduler and other passes from moving instructions accessing memory
>> +across the boundary defined by the blockage insn.  This instruction
>> +needs to read and write volatile BLKmode memory.
>> +
>
> I see this is mostly cloned from the 'blockage' pattern description, but
> this is not quite correct, it's not about moving _instructions_ per se
> (RTL CSE propagates values loaded from memory without moving instructions,
> RTL DSE eliminates some stores to memory also without moving instructions),
> and calling out the scheduler separately doesn't seem useful.  I suggest:
>
> """
> This pattern, if defined, represents a compiler memory barrier, and will be
> placed at points across which RTL passes may not propagate memory accesses.
> This instruction needs to read and write volatile BLKmode memory.  It does
> not need to generate any machine instruction, and like the @code{blockage}
> insn needs a named pattern only because there are no generic @code{unspec}
> values.  If this pattern is not defined, the compiler falls back by emitting
> an instruction corresponding to @code{asm volatile ("" ::: "memory")}.
> """
>
>> @@ -6295,6 +6295,21 @@ expand_asm_memory_barrier (void)
>>emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>  }
>>
>> +#ifndef HAVE_memory_blockage
>> +#define HAVE_memory_blockage 0
>> +#endif
>
> Why this?  This style is not used (anymore) elsewhere in the file, afaict
> the current approach is to add a definition in target-insns.def and then
> use targetm.have_memory_blockage (e.g. like mem_thread_fence is used).

Uh, I was not aware of the new approach. Will send a v2 with the mentioned
issues fixed.

Thanks,
Uros.
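
Note on the targetm.have_* style discussed above: the following C++ sketch
models a hook table with a "have" predicate and a "gen" function, plus a
generic fallback.  All names (target_hooks_sketch, insn_stream,
expand_memory_blockage_sketch and the helper functions) are hypothetical;
this is a toy model of the dispatch pattern, not GCC's target-insns.def
machinery.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction stream: each emitted "insn" is just a label.
static std::vector<std::string> insn_stream;

// Hypothetical hook table: both members are function pointers.
struct target_hooks_sketch
{
  bool (*have_memory_blockage) ();
  void (*gen_memory_blockage) ();
};

static bool yes () { return true; }
static bool no () { return false; }

static void emit_target_blockage ()
{
  insn_stream.push_back ("memory_blockage");
}

static void emit_asm_blockage ()
{
  insn_stream.push_back ("asm volatile memory clobber");
}

void expand_memory_blockage_sketch (const target_hooks_sketch &targetm)
{
  // The predicate must be called: testing the pointer itself would be
  // true for every non-null hook, regardless of what the target supports.
  if (targetm.have_memory_blockage ())
    targetm.gen_memory_blockage ();
  else
    emit_asm_blockage ();
}
```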

Re: [5/9] Add mode_for_int_vector helper functions

2017-09-05 Thread Richard Sandiford

Richard Biener  writes:
> On Mon, Sep 4, 2017 at 1:36 PM, Richard Sandiford
>  wrote:
>> There are at least a few places that want to create an integer vector
>> with a specified element size and element count, or to create the
>> integer equivalent of an existing mode.  This patch adds helpers
>> for doing that.
>>
>> The require ()s are all used in functions that go on to emit
>> instructions that use the result as a vector mode.
>>
>> 2017-09-04  Richard Sandiford  
>>
>> gcc/
>> * machmode.h (mode_for_int_vector): New function.
>> * stor-layout.c (mode_for_int_vector): Likewise.
>> * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Use it.
>> * config/powerpcspe/powerpcspe.c (rs6000_do_expand_vec_perm): 
>> Likewise.
>> * config/rs6000/rs6000.c (rs6000_do_expand_vec_perm): Likewise.
>> * config/s390/s390.c (s390_expand_vec_compare_cc): Likewise.
>> (s390_expand_vcond): Likewise.
>>
>> Index: gcc/machmode.h
>> ===
>> --- gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
>> +++ gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
>> @@ -706,6 +706,21 @@ extern machine_mode bitwise_mode_for_mod
>>
>>  extern machine_mode mode_for_vector (scalar_mode, unsigned);
>>
>> +extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
>> +
>> +/* Return the integer vector equivalent of MODE, if one exists.  In other
>> +   words, return the mode for an integer vector that has the same number
>> +   of bits as MODE and the same number of elements as MODE, with the
>> +   latter being 1 if MODE is scalar.  The returned mode can be either
>> +   an integer mode or a vector mode.  */
>> +
>> +inline opt_machine_mode
>> +mode_for_int_vector (machine_mode mode)
>
> So this is similar to int_mode_for_mode which means...
>
> int_vector_mode_for_vector_mode?

I'd used that style of name originally, but didn't like it because
it gave the impression that the result would be a VECTOR_MODE_P.

mode_for_int_vector was supposed to be consistent with mode_for_vector.

>> +{
>
> Nothing prevents use with non-vector MODE here, can we place an assert here?

That was deliberate.  I wanted it to work with scalars too, returning
a V1xx in that case.

>> +  return mode_for_int_vector (GET_MODE_UNIT_BITSIZE (mode),
>> + GET_MODE_NUNITS (mode));
>> +}
>> +
>>  /* A class for iterating through possible bitfield modes.  */
>>  class bit_field_mode_iterator
>>  {
>> Index: gcc/stor-layout.c
>> ===
>> --- gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
>> +++ gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
>> @@ -517,6 +517,23 @@ mode_for_vector (scalar_mode innermode,
>>return mode;
>>  }
>>
>> +/* Return the mode for a vector that has NUNITS integer elements of
>> +   INT_BITS bits each, if such a mode exists.  The mode can be either
>> +   an integer mode or a vector mode.  */
>> +
>> +opt_machine_mode
>> +mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
>
> That's more vector_int_mode_for_size (...), no?  Similar to int_mode_for_size
> or mode_for_size.
>
> Ok with those renamings.  I wonder if int_vector_mode_for_vector_mode
> is necessary -- is calling vector_int_mode_for_size
> (GET_MODE_UNIT_BITSIZE (mode),
> GET_MODE_NUNITS (mode)) too cumbersome?

IMO yes :-)  It's certainly longer than the equivalent int_mode_for_mode
expansion.

Thanks,
Richard
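
Note on the semantics discussed in this thread: the C++ sketch below models
"the integer vector equivalent of MODE" on a made-up mode table.  The
mode_desc struct, the toy_modes table and the int_vector_for functions are
assumptions for illustration only; real GCC modes are target-specific and
the real helper returns an opt_machine_mode.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical description of a mode: element kind, element size, count.
struct mode_desc
{
  const char *name;
  bool is_integer;
  unsigned unit_bits;
  unsigned nunits;
};

// Made-up set of supported modes; scalars have nunits == 1.
static const mode_desc toy_modes[] = {
  {"SI", true, 32, 1},   {"DI", true, 64, 1},
  {"SF", false, 32, 1},  {"DF", false, 64, 1},
  {"V4SI", true, 32, 4}, {"V2DI", true, 64, 2},
  {"V4SF", false, 32, 4}, {"V2DF", false, 64, 2},
};

// Mode with NUNITS integer elements of INT_BITS bits each, or null if the
// toy table has no such mode.
const mode_desc *int_vector_for (unsigned int_bits, unsigned nunits)
{
  for (const mode_desc &m : toy_modes)
    if (m.is_integer && m.unit_bits == int_bits && m.nunits == nunits)
      return &m;
  return nullptr;
}

// Integer equivalent of MODE: same element size and same element count.
// A scalar input (nunits == 1) therefore yields a scalar integer mode here.
const mode_desc *int_vector_for (const mode_desc &mode)
{
  return int_vector_for (mode.unit_bits, mode.nunits);
}
```

In this toy table a scalar DF maps to the integer mode DI, which illustrates
why the result is not necessarily a vector mode.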

Re: [PATCH] Handle wide-chars in native_encode_string

2017-09-05 Thread Joseph Myers

On Tue, 5 Sep 2017, Richard Biener wrote:

> don't have any BITS_PER_UNIT != 8 port (we had c4x).  I'm not
> sure what constraints we have on CHAR_TYPE_SIZE vs. BITS_PER_UNIT,
> or for what port it would make sense to have differing values.

BITS_PER_UNIT = size of QImode = unit that target hardware addresses count 
in.

CHAR_TYPE_SIZE = size of target char in the C ABI.

sizeof (char) is always 1 by definition, but in principle you could have 
an architecture where the addressable unit at the hardware level is 
smaller than C char.  CHAR_TYPE_SIZE must always be a multiple of 
BITS_PER_UNIT, and CHAR_TYPE_SIZE != BITS_PER_UNIT is probably even more 
bitrotten (I don't know if we've ever had such a port) than BITS_PER_UNIT 
!= 8.

-- 
Joseph S. Myers
jos...@codesourcery.com
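
Note: the distinction above concerns GCC's target macros.  As a host-side
analogue only, the C++ sketch below shows the corresponding language-level
facts: sizeof (char) is 1 by definition, and CHAR_BIT from <climits> gives
the number of bits in a char.  The helper name bits_in is hypothetical.

```cpp
#include <cassert>
#include <climits>

// sizeof counts in units of char, so this holds on every implementation.
static_assert (sizeof (char) == 1, "guaranteed by the C and C++ standards");

// The standards require a char to have at least 8 bits.
static_assert (CHAR_BIT >= 8, "minimum width of char");

// Size of T in bits on the host: sizeof gives chars, CHAR_BIT bits per char.
template <typename T>
constexpr unsigned bits_in ()
{
  return sizeof (T) * CHAR_BIT;
}
```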

remove unused argument in duplicate_thread_path()

2017-09-05 Thread Aldy Hernandez

It looks like all remaining uses of duplicate_thread_path (ahem, one),
pass a NULL to the REGION_COPY argument.

OK?


curr
Description: Binary data

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Richard Biener

On Tue, 5 Sep 2017, Richard Biener wrote:

> On Tue, 5 Sep 2017, Tamar Christina wrote:
> 
> > Hi Richard,
> > 
> > That was a really interesting analysis, thanks for the details!
> > 
> > Would you be submitting the patch you proposed at the end as a fix?
> 
> I'm testing it currently.

Unfortunately it breaks some required lowering.  I'll have to more
closely look at this.

Richard.

Re: [5/9] Add mode_for_int_vector helper functions

2017-09-05 Thread Richard Biener

On Tue, Sep 5, 2017 at 2:33 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Sep 4, 2017 at 1:36 PM, Richard Sandiford
>>  wrote:
>>> There are at least a few places that want to create an integer vector
>>> with a specified element size and element count, or to create the
>>> integer equivalent of an existing mode.  This patch adds helpers
>>> for doing that.
>>>
>>> The require ()s are all used in functions that go on to emit
>>> instructions that use the result as a vector mode.
>>>
>>> 2017-09-04  Richard Sandiford  
>>>
>>> gcc/
>>> * machmode.h (mode_for_int_vector): New function.
>>> * stor-layout.c (mode_for_int_vector): Likewise.
>>> * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Use it.
>>> * config/powerpcspe/powerpcspe.c (rs6000_do_expand_vec_perm): 
>>> Likewise.
>>> * config/rs6000/rs6000.c (rs6000_do_expand_vec_perm): Likewise.
>>> * config/s390/s390.c (s390_expand_vec_compare_cc): Likewise.
>>> (s390_expand_vcond): Likewise.
>>>
>>> Index: gcc/machmode.h
>>> ===
>>> --- gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
>>> +++ gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
>>> @@ -706,6 +706,21 @@ extern machine_mode bitwise_mode_for_mod
>>>
>>>  extern machine_mode mode_for_vector (scalar_mode, unsigned);
>>>
>>> +extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
>>> +
>>> +/* Return the integer vector equivalent of MODE, if one exists.  In other
>>> +   words, return the mode for an integer vector that has the same number
>>> +   of bits as MODE and the same number of elements as MODE, with the
>>> +   latter being 1 if MODE is scalar.  The returned mode can be either
>>> +   an integer mode or a vector mode.  */
>>> +
>>> +inline opt_machine_mode
>>> +mode_for_int_vector (machine_mode mode)
>>
>> So this is similar to int_mode_for_mode which means...
>>
>> int_vector_mode_for_vector_mode?
>
> I'd used that style of name originally, but didn't like it because
> it gave the impression that the result would be a VECTOR_MODE_P.

Oh, it isn't?  Ah, yes, it can be an integer mode...

> mode_for_int_vector was supposed to be consistent with mode_for_vector.
>
>>> +{
>>
>> Nothing prevents use with non-vector MODE here, can we place an assert here?
>
> That was deliberate.  I wanted it to work with scalars too, returning
> a V1xx in that case.

Ok.

>>> +  return mode_for_int_vector (GET_MODE_UNIT_BITSIZE (mode),
>>> + GET_MODE_NUNITS (mode));
>>> +}
>>> +
>>>  /* A class for iterating through possible bitfield modes.  */
>>>  class bit_field_mode_iterator
>>>  {
>>> Index: gcc/stor-layout.c
>>> ===
>>> --- gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
>>> +++ gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
>>> @@ -517,6 +517,23 @@ mode_for_vector (scalar_mode innermode,
>>>return mode;
>>>  }
>>>
>>> +/* Return the mode for a vector that has NUNITS integer elements of
>>> +   INT_BITS bits each, if such a mode exists.  The mode can be either
>>> +   an integer mode or a vector mode.  */
>>> +
>>> +opt_machine_mode
>>> +mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
>>
>> That's more vector_int_mode_for_size (...), no?  Similar to int_mode_for_size
>> or mode_for_size.
>>
>> Ok with those renamings.  I wonder if int_vector_mode_for_vector_mode
>> is necessary -- is calling vector_int_mode_for_size
>> (GET_MODE_UNIT_BITSIZE (mode),
>> GET_MODE_NUNITS (mode)) too cumbersome?
>
> IMO yes :-)  It's certainly longer than the equivalent int_mode_for_mode
> expansion.

I see.

Patch is ok as-is then.

Thanks,
Richard.

> Thanks,
> Richard

Re: Add support to trace comparison instructions and switch statements

2017-09-05 Thread 吴潍浠(此彼)

Hi
Attached is my updated patch.
The implementation of parse_sanitizer_options is not elegant enough: mixing
the handling of the different -fsanitize flags makes it easy to make mistakes.

ChangeLog:
gcc/ChangeLog:

2017-09-05  Wish Wu  

* asan.c (initialize_sanitizer_builtins):
Build function type list of trace-cmp.
* builtin-types.def (BT_FN_VOID_UINT8_UINT8):
Define function type of trace-cmp.
(BT_FN_VOID_UINT16_UINT16): Likewise.
(BT_FN_VOID_UINT32_UINT32): Likewise.
(BT_FN_VOID_FLOAT_FLOAT): Likewise.
(BT_FN_VOID_DOUBLE_DOUBLE): Likewise.
(BT_FN_VOID_UINT64_PTR): Likewise.
* common.opt: Add options of sanitize coverage.
* flag-types.h (enum sanitize_coverage_code):
Declare flags of sanitize coverage.
* fold-const.c (fold_range_test):
Disable non-short-circuit feature when sanitize coverage is enabled.
(fold_truth_andor): Likewise.
* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
* opts.c (COVERAGE_SANITIZER_OPT):
Define coverage sanitizer options.
(get_closest_sanitizer_option): Make OPT_fsanitize_,
OPT_fsanitize_recover_ and OPT_fsanitize_coverage_ to use same
function.
(parse_sanitizer_options): Likewise.
(common_handle_option): Add OPT_fsanitize_coverage_.
* sancov.c (instrument_comparison): Instrument comparisons.
(instrument_switch): Likewise.
(sancov_pass): Add trace-cmp support.
* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):
Define builtin functions of trace-cmp.
(BUILT_IN_SANITIZER_COV_TRACE_CMP2): Likewise.
(BUILT_IN_SANITIZER_COV_TRACE_CMP4): Likewise.
(BUILT_IN_SANITIZER_COV_TRACE_CMP8): Likewise.
(BUILT_IN_SANITIZER_COV_TRACE_CMPF): Likewise.
(BUILT_IN_SANITIZER_COV_TRACE_CMPD): Likewise.
(BUILT_IN_SANITIZER_COV_TRACE_SWITCH): Likewise.

gcc/testsuite/ChangeLog:

2017-09-05  Wish Wu  

* gcc.dg/sancov/basic3.c: New test.

Thank you very much for improving my code.

Wish Wu
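
Note on what comparison tracing does conceptually: the C++ sketch below
instruments by hand a function that compares its argument against several
constants, reporting both operands to a callback before each comparison.
The names trace_cmp4_sketch, cmp_log, bar_sketch and foo_instrumented are
hypothetical; the code that the sancov pass actually emits, and the runtime
callbacks it calls, may differ from this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Log of (lhs, rhs) operand pairs seen by the callback.
static std::vector<std::pair<uint32_t, uint32_t>> cmp_log;

// Hypothetical stand-in for a 4-byte comparison-tracing callback.
void trace_cmp4_sketch (uint32_t a, uint32_t b)
{
  cmp_log.push_back ({a, b});
}

void bar_sketch () {}

// Hand-instrumented comparison chain: each comparison is preceded by a
// callback that receives both operands.
bool foo_instrumented (int x)
{
  static const int constants[] = {21, 64, 98, 135};
  for (int c : constants)
    {
      trace_cmp4_sketch (static_cast<uint32_t> (x), static_cast<uint32_t> (c));
      if (x == c)
        {
          bar_sketch ();
          return true;
        }
    }
  return false;
}
```

A fuzzer consuming such a log can see how close an input came to matching
each constant, which is the motivation for tracing comparisons at all.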

--
From:Jakub Jelinek 
Time:2017 Sep 5 (Tue) 01:34
To:Wish Wu 
Cc:Dmitry Vyukov ; gcc-patches ; 
Jeff Law ; wishwu007 
Subject:Re: Add support to trace comparison instructions and switch statements

On Mon, Sep 04, 2017 at 09:36:40PM +0800, 吴潍浠(此彼) wrote:
> gcc/ChangeLog: 
> 
> 2017-09-04  Wish Wu   
> 
> * asan.c (initialize_sanitizer_builtins):  
> * builtin-types.def (BT_FN_VOID_UINT8_UINT8):  
> (BT_FN_VOID_UINT16_UINT16):
> (BT_FN_VOID_UINT32_UINT32):
> (BT_FN_VOID_FLOAT_FLOAT):  
> (BT_FN_VOID_DOUBLE_DOUBLE):
> (BT_FN_VOID_UINT64_PTR):   
> * common.opt:  
> * flag-types.h (enum sanitize_coverage_code):  
> * opts.c (COVERAGE_SANITIZER_OPT): 
> (get_closest_sanitizer_option):
> (parse_sanitizer_options): 
> (common_handle_option):
> * sancov.c (instrument_cond):  
> (instrument_switch):   
> (sancov_pass): 
> * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP2):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP4):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP8):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMPF):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMPD):   
> (BUILT_IN_SANITIZER_COV_TRACE_SWITCH): 

mklog just generates a template, you need to fill in the details
on what has been changed or added or removed.  See other ChangeLog
entries etc. to see what is expected.

> For code :
> void bar (void);
> void
> foo (int x)
> {
>   if (x == 21 || x == 64 || x == 98 || x == 135)
> bar ();
> }
> GIMPLE IL on x86_64:
>   1 
>   2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, 
> symbol_order=0)
>   3 
>   4 foo (int x)
>   5 {

...

That is with -O0 though?  With -O2 you'll see that it changes.
IMNSHO you really want to also handle the GIMPLE_ASSIGN with tcc_comparison
class rhs_code.  Shouldn't be that hard to handle that within
instrument_cond, just the way how you extract lhs and rhs from the insn will
differ based on if it is a GIMPLE_COND or GIMPLE_ASSIGN (and in that case
also for tcc_comparison rhs_code or for

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Tamar Christina



> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 05 September 2017 13:51
> To: Tamar Christina
> Cc: Andrew Pinski; Andreas Schwab; Jon Beniston; gcc-patches@gcc.gnu.org;
> nd
> Subject: RE: [RFC, vectorizer] Allow single element vector types for vector
> reduction operations
> 
> On Tue, 5 Sep 2017, Richard Biener wrote:
> 
> > On Tue, 5 Sep 2017, Tamar Christina wrote:
> >
> > > Hi Richard,
> > >
> > > That was a really interesting analysis, thanks for the details!
> > >
> > > Would you be submitting the patch you proposed at the end as a fix?
> >
> > I'm testing it currently.
> 
> Unfortunately it breaks some required lowering.  I'll have to more closely
> look at this.

Ah, ok. In the meantime, can this patch be reverted? It's currently breaking
SPEC for us, so we're not able to get any benchmarking numbers.

> 
> Richard.

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Richard Biener

On Tue, 5 Sep 2017, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener [mailto:rguent...@suse.de]
> > Sent: 05 September 2017 13:51
> > To: Tamar Christina
> > Cc: Andrew Pinski; Andreas Schwab; Jon Beniston; gcc-patches@gcc.gnu.org;
> > nd
> > Subject: RE: [RFC, vectorizer] Allow single element vector types for vector
> > reduction operations
> > 
> > On Tue, 5 Sep 2017, Richard Biener wrote:
> > 
> > > On Tue, 5 Sep 2017, Tamar Christina wrote:
> > >
> > > > Hi Richard,
> > > >
> > > > That was a really interesting analysis, thanks for the details!
> > > >
> > > > Would you be submitting the patch you proposed at the end as a fix?
> > >
> > > I'm testing it currently.
> > 
> > Unfortunately it breaks some required lowering.  I'll have to more closely
> > look at this.
> 
> Ah, ok. In the meantime, can this patch be reverted? It's currently breaking
> SPEC for us, so we're not able to get any benchmarking numbers.

Testing the following instead:

Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c (revision 251642)
+++ gcc/tree-vect-generic.c (working copy)
@@ -1640,7 +1640,7 @@ expand_vector_operations_1 (gimple_stmt_
   || code == VEC_UNPACK_FLOAT_LO_EXPR)
 type = TREE_TYPE (rhs1);
 
-  /* For widening/narrowing vector operations, the relevant type is of the
+  /* For widening vector operations, the relevant type is of the
  arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
  calculated in the same way above.  */
   if (code == WIDEN_SUM_EXPR
@@ -1650,9 +1650,6 @@ expand_vector_operations_1 (gimple_stmt_
   || code == VEC_WIDEN_MULT_ODD_EXPR
   || code == VEC_UNPACK_HI_EXPR
   || code == VEC_UNPACK_LO_EXPR
-  || code == VEC_PACK_TRUNC_EXPR
-  || code == VEC_PACK_SAT_EXPR
-  || code == VEC_PACK_FIX_TRUNC_EXPR
   || code == VEC_WIDEN_LSHIFT_HI_EXPR
   || code == VEC_WIDEN_LSHIFT_LO_EXPR)
 type = TREE_TYPE (rhs1);


Also a fix for a bug uncovered by the previous one:

Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 251710)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -1742,8 +1742,7 @@ find_candidates_dom_walker::before_dom_c
slsr_process_ref (gs);
 
   else if (is_gimple_assign (gs)
-  && SCALAR_INT_MODE_P
-   (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
+  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))))
{
  tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;

[PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Uros Bizjak

Revised patch; it incorporates fixes from Alexander's review comments.

I removed some implementation details from Alexander's description of
the memory_blockage named pattern.


2017-09-05  Uros Bizjak  

* target-insns.def: Add memory_blockage.
* optabs.c (expand_memory_blockage): New function.
(expand_asm_memory_barrier): Rename ...
(expand_asm_memory_blockage): ... to this.
(expand_mem_thread_fence): Call expand_memory_blockage
instead of expand_asm_memory_barrier.
(expand_mem_signal_fence): Ditto.
(expand_atomic_load): Ditto.
(expand_atomic_store): Ditto.
* doc/md.texi (Standard Pattern Names For Generation):
Document memory_blockage instruction pattern.

Bootstrapped and regression tested together with a followup x86 patch
on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 14aab9474bc2..c4c113850fe1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6734,6 +6734,15 @@ scheduler and other passes from moving instructions and using register
 equivalences across the boundary defined by the blockage insn.
 This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
 
+@cindex @code{memory_blockage} instruction pattern
+@item @samp{memory_blockage}
+This pattern, if defined, represents a compiler memory barrier, and will be
+placed at points across which RTL passes may not propagate memory accesses.
+This instruction needs to read and write volatile BLKmode memory.  It does
+not need to generate any machine instruction.  If this pattern is not defined,
+the compiler falls back to emitting an instruction corresponding
+to @code{asm volatile ("" ::: "memory")}.
+
 @cindex @code{memory_barrier} instruction pattern
 @item @samp{memory_barrier}
 If the target memory model is not fully synchronous, then this pattern
diff --git a/gcc/optabs.c b/gcc/optabs.c
index b65707080eee..94060036e61f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6276,10 +6276,10 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
   return true;
 }
 
-/* Generate asm volatile("" : : : "memory") as the memory barrier.  */
+/* Generate asm volatile("" : : : "memory") as the memory blockage.  */
 
 static void
-expand_asm_memory_barrier (void)
+expand_asm_memory_blockage (void)
 {
   rtx asm_op, clob;
 
@@ -6295,6 +6295,17 @@ expand_asm_memory_barrier (void)
   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
 }
 
+/* Do not propagate memory accesses across this point.  */
+
+static void
+expand_memory_blockage (void)
+{
+  if (targetm.have_memory_blockage ())
+    emit_insn (targetm.gen_memory_blockage ());
+  else
+expand_asm_memory_blockage ();
+}
+
 /* This routine will either emit the mem_thread_fence pattern or issue a 
sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
@@ -6306,14 +6317,14 @@ expand_mem_thread_fence (enum memmodel model)
   if (targetm.have_mem_thread_fence ())
 {
   emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
-  expand_asm_memory_barrier ();
+  expand_memory_blockage ();
 }
   else if (targetm.have_memory_barrier ())
 emit_insn (targetm.gen_memory_barrier ());
   else if (synchronize_libfunc != NULL_RTX)
 emit_library_call (synchronize_libfunc, LCT_NORMAL, VOIDmode);
   else
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* Emit a signal fence with given memory model.  */
@@ -6324,7 +6335,7 @@ expand_mem_signal_fence (enum memmodel model)
   /* No machine barrier is required to implement a signal fence, but
  a compiler memory barrier must be issued, except for relaxed MM.  */
   if (!is_mm_relaxed (model))
-expand_asm_memory_barrier ();
+expand_memory_blockage ();
 }
 
 /* This function expands the atomic load operation:
@@ -6346,7 +6357,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel model)
   struct expand_operand ops[3];
   rtx_insn *last = get_last_insn ();
   if (is_mm_seq_cst (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
 
   create_output_operand (&ops[0], target, mode);
   create_fixed_operand (&ops[1], mem);
@@ -6354,7 +6365,7 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel model)
   if (maybe_expand_insn (icode, 3, ops))
{
  if (!is_mm_relaxed (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
  return ops[0].value;
}
   delete_insns_since (last);
@@ -6404,14 +6415,14 @@ expand_atomic_store (rtx mem, rtx val, enum memmodel model, bool use_release)
 {
   rtx_insn *last = get_last_insn ();
   if (!is_mm_relaxed (model))
-   expand_asm_memory_barrier ();
+   expand_memory_blockage ();
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], val, mode);
   create_integer_operand (&ops[2], model);
   if (maybe_expand_insn (icode, 3, ops))
{
  if (

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-05 Thread Tamar Christina



> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 05 September 2017 14:13
> To: Tamar Christina
> Cc: Andrew Pinski; Andreas Schwab; Jon Beniston; gcc-patches@gcc.gnu.org;
> nd
> Subject: RE: [RFC, vectorizer] Allow single element vector types for vector
> reduction operations
> 
> On Tue, 5 Sep 2017, Tamar Christina wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Richard Biener [mailto:rguent...@suse.de]
> > > Sent: 05 September 2017 13:51
> > > To: Tamar Christina
> > > Cc: Andrew Pinski; Andreas Schwab; Jon Beniston;
> > > gcc-patches@gcc.gnu.org; nd
> > > Subject: RE: [RFC, vectorizer] Allow single element vector types for
> > > vector reduction operations
> > >
> > > On Tue, 5 Sep 2017, Richard Biener wrote:
> > >
> > > > On Tue, 5 Sep 2017, Tamar Christina wrote:
> > > >
> > > > > Hi Richard,
> > > > >
> > > > > > That was a really interesting analysis, thanks for the details!
> > > > >
> > > > > Would you be submitting the patch you proposed at the end as a fix?
> > > >
> > > > I'm testing it currently.
> > >
> > > Unfortunately it breaks some required lowering.  I'll have to more
> > > closely look at this.
> >
> > Ah, ok. In the meantime, can this patch be reverted? It's currently
> > breaking spec for us so we're not able to get any benchmarking numbers.
> 
> Testing the following instead:

That does seem to build spec again, haven't tested the testsuite yet. 

> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c (revision 251642)
> +++ gcc/tree-vect-generic.c (working copy)
> @@ -1640,7 +1640,7 @@ expand_vector_operations_1 (gimple_stmt_
>|| code == VEC_UNPACK_FLOAT_LO_EXPR)
>  type = TREE_TYPE (rhs1);
> 
> -  /* For widening/narrowing vector operations, the relevant type is of the
> +  /* For widening vector operations, the relevant type is of the
>   arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
>   calculated in the same way above.  */
>if (code == WIDEN_SUM_EXPR
> @@ -1650,9 +1650,6 @@ expand_vector_operations_1 (gimple_stmt_
>|| code == VEC_WIDEN_MULT_ODD_EXPR
>|| code == VEC_UNPACK_HI_EXPR
>|| code == VEC_UNPACK_LO_EXPR
> -  || code == VEC_PACK_TRUNC_EXPR
> -  || code == VEC_PACK_SAT_EXPR
> -  || code == VEC_PACK_FIX_TRUNC_EXPR
>|| code == VEC_WIDEN_LSHIFT_HI_EXPR
>|| code == VEC_WIDEN_LSHIFT_LO_EXPR)
>  type = TREE_TYPE (rhs1);
> 
> 
> also fix for a bug uncovered by the previous one:
> 
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c (revision 251710)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -1742,8 +1742,7 @@ find_candidates_dom_walker::before_dom_c
> slsr_process_ref (gs);
> 
>else if (is_gimple_assign (gs)
> -  && SCALAR_INT_MODE_P
> -   (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> +  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))))
> {
>   tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
>

Re: [PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov

On Tue, 5 Sep 2017, Uros Bizjak wrote:

> Revised patch, incorporates fixes from Alexander's review comments.
> 
> I removed some implementation details from Alexander's description of
> memory_blockage named pattern.

Well, to me it wasn't really obvious why a named pattern was needed
in the first place, so I wish the explanation could stay in some form.

One small nit, the new function comment in optabs.c,

+/* Do not propagate memory accesses across this point.  */

doesn't seem appropriate, it should probably say something like

/* Emit an insn acting as a compiler memory barrier.  */

Rest of the patch looks fine to me (but I cannot approve it).

Thanks.
Alexander
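
For readers following the naming discussion: the "compiler memory barrier" in question has the same effect as an empty asm with a "memory" clobber at the source level.  A minimal sketch, assuming GCC-style inline asm; the names compiler_barrier, publish, payload and ready are made up for illustration and are not part of the patch:

```c
/* Hypothetical illustration; none of these names come from the patch.  */
static int payload;
static volatile int ready;

/* Compiler-only barrier: emits no machine instruction, but the compiler
   may not move or cache memory accesses across it.  */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Store the payload, then raise the flag, keeping that program order
   as far as the compiler is concerned.  */
int publish(int value)
{
    payload = value;
    compiler_barrier();
    ready = 1;
    return payload;
}
```

Note that this only constrains the compiler.  It does not order the accesses as seen by another CPU, which is what the machine-level fence patterns are for.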

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Bernd Edlinger

Hi Christophe,

On 09/05/17 10:45, Christophe Lyon wrote:
> Hi Bernd,
> 
> 
> On 4 September 2017 at 16:52, Kyrill  Tkachov
>  wrote:
>>
>> On 29/04/17 18:45, Bernd Edlinger wrote:
>>>
>>> Ping...
>>>
>>> I attached a rebased version since there was a merge conflict in
>>> the xordi3 pattern, otherwise the patch is still identical.
>>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>>> early when the target has no neon or iwmmxt.
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>>
>>> On 11/28/16 20:42, Bernd Edlinger wrote:

 On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>
> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>  wrote:
>>
>> Hi!
>>
>> This improves the stack usage on the sha512 test case for the case
>> without hardware fpu and without iwmmxt by splitting all di-mode
>> patterns right while expanding which is similar to what the
>> shift-pattern
>> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well
>> as
>> thumb1.
>>
> I would go further and do this in the absence of Neon, the VFP unit
> being there doesn't help with DImode operations i.e. we do not have 64
> bit integer arithmetic instructions without Neon. The main reason why
> we have the DImode patterns split so late is to give a chance for
> folks who want to do 64 bit arithmetic in Neon a chance to make this
> work as well as support some of the 64 bit Neon intrinsics which IIRC
> map down to these instructions. Doing this just for soft-float doesn't
> improve the default case only. I don't usually test iwmmxt and I'm not
> sure who has the ability to do so, thus keeping this restriction for
> iwMMX is fine.
>
>
 Yes I understand, thanks for pointing that out.

 I was not aware that iwmmxt exists at all, but I noticed that most
 64bit expansions work completely differently, and would break if we split
 the pattern early.

 I can however only look at the assembler output for iwmmxt, and make
 sure that the stack usage does not get worse.

 Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
 it is: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt) bytes stack
 for the test cases, and vfp and soft-float at around 270 bytes stack
 usage.

>> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>>
>> Note this also splits many ldrd/strd instructions and therefore I will
>> post a followup-patch that mitigates this effect by enabling the
>> ldrd/strd
>> peephole optimization after the necessary reg-testing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>
> What do you mean by arm-linux-gnueabihf - when folks say that I
> interpret it as --with-arch=armv7-a --with-float=hard
> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>
> If you've really bootstrapped and regtested it on armhf, doesn't this
> patch as it stands have no effect there i.e. no change ?
> arm-linux-gnueabihf usually means to me someone has configured with
> --with-float=hard, so there are no regressions in the hard float ABI
> case,
>
 I know it proves little.  When I say arm-linux-gnueabihf
 I do in fact mean --enable-languages=all,ada,go,obj-c++
 --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
 --with-float=hard.

 My main interest in the stack usage is of course not because of linux,
 but because of eCos where we have very small task stacks and in fact
 no fpu support by the O/S at all, so that patch is exactly what we need.


 Bootstrapped and reg-tested on arm-linux-gnueabihf
 Is it OK for trunk?
>>
>>
>> The code is ok.
>> AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code as
>> the splits
>> will happen for !TARGET_NEON (it is of course !TARGET_IWMMXT and
>> TARGET_IWMMXT2
>> is irrelevant here).
>>
>> So this is ok for trunk.
>> Thanks, and sorry again for the delay.
>> Kyrill
>>
> 
> This patch (r251663) causes a regression on armeb-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu vfpv3-d16-fp16
> FAIL:gcc.dg/vect/vect-singleton_1.c (internal compiler error)
> FAIL:gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects
> (internal compiler error)
> 
> the test passes if gcc is configured --with-fpu neon-fp16
> 

Thank you very much for what you do!

I am able to reproduce this.

Combine creates an invalid insn out of these two insns:

(insn 12 8 18 2 (parallel [
 (set (reg:DI 122)
 (plus:DI (reg:DI 116 [ aD.5331 ])
 (reg:DI 119 [ bD.5332 ])))
 (clobber (reg:CC 100 cc))
 ]) "vect-singleton_1.c":28 1 {*arm_adddi3}
  (expr_list:REG_DEAD (reg:DI 119 [ bD.5332 ])
 (expr_list:REG_DEAD (reg:DI 116 [ aD.5331 ])
 (expr_list:REG_UNUSED (reg:CC 100 cc)
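
As background for the adddi3 insn above: splitting a DImode addition early means expressing it as two SImode additions linked by a carry, which is what a 32-bit core without 64-bit integer arithmetic ends up executing anyway.  The following is only a C-level sketch of that idea, not how the RTL split is written; add64_split is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values using only 32-bit
   operations, mirroring a DImode add split into two SImode adds.  */
uint64_t add64_split(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;          /* low words (adds) */
    uint32_t carry = lo < a_lo;         /* carry out of the low word */
    uint32_t hi = a_hi + b_hi + carry;  /* high words plus carry (adc) */

    return ((uint64_t)hi << 32) | lo;
}
```

The carry between the two halves is why the unsplit pattern carries a clobber of the CC register.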

[PATCH, openacc, og7, committed] Make reduction copy clauses 'private'

2017-09-05 Thread Chung-Lin Tang

As we discussed, we are to support a behavior where within individual gangs,
worker/vector level reductions will correctly work with results immediately
available.
This is on top of the implicit 'copy' clause added when we have loop reductions.

This patch adds a capability to mark map clauses additionally as 'private' (we
may be overloading this word a little too much :P), such that within offloaded
regions and with respect to our reduction lowering, the variable is
(first)private, with an additional copy back appended at the end of the
offloaded region.

Care is taken to make sure this behavior is not applied when potential loop gang
reductions may happen (for which this will not work).  In other cases, for
gang-redundant code, supposedly the multiple copy backs should all be the same,
so the behavior is the same.

This is sort of a refinement of the implicit copy clause for reductions in
PR70895.
A libgomp testcase is added to test the multiple worker-level reduction result
case across multiple gangs. Patch was tested and pushed to openacc-gcc-7-branch.

Chung-Lin
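
To make the intended behavior concrete, here is a rough sketch of the kind of code the description is about.  It is not the reduction-9.c test from the patch; worker_sum and the gang/worker counts are made up for illustration.  Compiled without OpenACC support the pragmas are ignored and the function simply runs sequentially:

```c
/* Hypothetical example, not taken from the patch or its testcase.  */
int worker_sum(int n)
{
    int v = 0;

#pragma acc parallel num_gangs(4) num_workers(32)
    {
        /* Worker-level reduction inside gang-redundant code.  */
#pragma acc loop worker reduction(+:v)
        for (int i = 0; i < n; i++)
            v += i;
        /* Per the description above, each gang should see its own
           completed total in v at this point.  */
    }

    /* The value copied back at the end of the offloaded region.  */
    return v;
}
```

Since every gang computes the same total here, the multiple copy backs all write the same value, which is the gang-redundant case mentioned above.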
From 2dc21f336368889c1ebf031801a7613f65899ef1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 5 Sep 2017 22:09:34 +0800
Subject: [PATCH] Add support for making maps 'private' inside offloaded
 regions.

2017-09-05  Chung-Lin Tang  

gcc/
* tree.h (OMP_CLAUSE_MAP_PRIVATE): Define macro.
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_PRIVATE enum value.
(omp_add_variable): Add GOVD_MAP_PRIVATE to reduction clause flags if
not a gang-partitioned loop directive.
(gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_PRIVATE of new map
clause to 1 if GOVD_MAP_PRIVATE flag is present.
* omp-low.c (lower_oacc_reductions): Handle map clauses with
OMP_CLAUSE_MAP_PRIVATE set in the same manner as firstprivate/private.
(lower_omp_target): Likewise. Add copy back code for map clauses with
OMP_CLAUSE_MAP_PRIVATE set.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-9.c: New test.
---
 gcc/ChangeLog.openacc  | 14 
 gcc/gimplify.c | 34 --
 gcc/omp-low.c  | 28 +--
 gcc/tree.h |  3 ++
 libgomp/ChangeLog.openacc  |  4 +++
 .../libgomp.oacc-c-c++-common/reduction-9.c| 41 ++
 6 files changed, 119 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-9.c

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 4b1ce0b..23e19d9 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,17 @@
+2017-09-05  Chung-Lin Tang  
+
+   * tree.h (OMP_CLAUSE_MAP_PRIVATE): Define macro.
+   * gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_PRIVATE enum value.
+   (omp_add_variable): Add GOVD_MAP_PRIVATE to reduction clause flags if
+   not a gang-partitioned loop directive.
+   (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_PRIVATE of new map
+   clause to 1 if GOVD_MAP_PRIVATE flag is present.
+   * omp-low.c (lower_oacc_reductions): Handle map clauses with
+   OMP_CLAUSE_MAP_PRIVATE set in the same manner as firstprivate/private.
+   (lower_omp_target): Likewise. Add copy back code for map clauses with
+   OMP_CLAUSE_MAP_PRIVATE set.
+   * tree.h (OMP_CLAUSE_MAP_PRIVATE): Define macro.
+
 2017-08-11  Cesar Philippidis  
 
* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Delete define.
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e481a72..2c10c64 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -102,6 +102,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: must be present already.  */
   GOVD_MAP_FORCE_PRESENT = 524288,
 
+  /* Flag for GOVD_MAP, copy to/from private storage inside offloaded region.  */
+  GOVD_MAP_PRIVATE = 1048576,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -6717,6 +6720,21 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
 {
   struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
+
+  bool gang = false, worker = false, vector = false;
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+   {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_GANG)
+   gang = true;
+ else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WORKER)
+   worker = true;
+ else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR)
+   vector = true;
+   }
+
+  /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
+  bool map_private = !gang && (worker || vector);
+
   while (out

[PING][PATCH][PR sanitizer/77631] Support separate debug info in libbacktrace

2017-09-05 Thread Denis Khalikov


Hello,
this is a ping for that patch:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01958.html

Thanks.

Re: remove unused argument in duplicate_thread_path()

2017-09-05 Thread Jeff Law

On 09/05/2017 06:39 AM, Aldy Hernandez wrote:
> It looks like all remaining uses of duplicate_thread_path (ahem, one),
> pass a NULL to the REGION_COPY argument.
> 
> OK?
> 
OK.
jeff

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Wilco Dijkstra

Bernd Edlinger wrote:

> Combine creates an invalid insn out of these two insns:

Yes it looks like a latent bug. We need to use arm_general_register_operand
as arm_adddi3/subdi3 only allow integer registers. You don't need a new
predicate s_register_operand_nv. Also I'd prefer something like
arm_general_adddi_operand.

+  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"

The split condition for adddi3 now looks more accurate indeed, although we could
remove the !TARGET_NEON from the split condition as this is always true given
arm_adddi3 uses "TARGET_32BIT && !TARGET_NEON".

Also there are more cases, a quick grep suggests *anddi_notdi_di has the same
issue.

Wilco
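
The observation about the redundant !TARGET_NEON term can be checked mechanically.  The sketch below models the target flags as plain booleans (an assumption for illustration; the function names are hypothetical) and confirms that the posted split condition and the shorter one agree whenever the insn condition "TARGET_32BIT && !TARGET_NEON" holds:

```c
#include <stdbool.h>

/* Split condition as posted in the patch.  */
static bool split_posted(bool t32, bool neon, bool iwmmxt, bool reload_done)
{
    return t32 && ((!neon && !iwmmxt) || reload_done);
}

/* Split condition with the !TARGET_NEON term dropped.  */
static bool split_simplified(bool t32, bool iwmmxt, bool reload_done)
{
    return t32 && (!iwmmxt || reload_done);
}

/* Returns true if both forms agree for every flag combination in which
   the insn condition TARGET_32BIT && !TARGET_NEON is satisfied.  */
bool split_conditions_agree(void)
{
    for (int bits = 0; bits < 16; bits++) {
        bool t32 = bits & 1, neon = bits & 2;
        bool iwmmxt = bits & 4, reload_done = bits & 8;

        if (!(t32 && !neon))
            continue;   /* the insn cannot match here */
        if (split_posted(t32, neon, iwmmxt, reload_done)
            != split_simplified(t32, iwmmxt, reload_done))
            return false;
    }
    return true;
}
```

Outside the insn condition the two forms do differ (for example with Neon enabled and no iwMMXt before reload), but the split can never be reached there.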

PING: [PATCH] Add -static-pie to GCC driver to create static PIE

2017-09-05 Thread H.J. Lu

On Mon, Aug 28, 2017 at 10:13 AM, H.J. Lu  wrote:
> On Mon, Aug 28, 2017 at 9:10 AM, Joseph Myers  wrote:
>> On Tue, 8 Aug 2017, H.J. Lu wrote:
>>
>>> This patch adds -static-pie to GCC driver to create static PIE.  A static
>>> position independent executable (PIE) is similar to static executable,
>>> but can be loaded at any address without a dynamic linker.  All linker
>>> input files must be compiled with -fpie or -fPIE and linker must support
>>> --no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
>>> also needed to prevent dynamic relocations in read-only segments.
>>>
>>> OK for trunk?
>>
>> I think the documentation for various options needs updating to clarify
>> exactly what they mean.  (And potentially help text, which for driver
>> options is in gcc.c:display_help with the common.opt text being ignored in
>> that case.)
>
> Done.
>
>> -static is no longer just "prevents linking with the shared libraries" as
>> the documentation says, given it's also overriding (explicit or
>> configure-time default) -pie.  -pie is no longer just "Produce a position
>> independent executable", it's producing a *dynamically linked* PIE.
>
> Done.
>
>>> +@item -static-pie
>>> +@opindex static-pie
>>> +Produce a static position independent executable on targets that support
>>> +it.  A static position independent executable is similar to static
>>> +executable, but can be loaded at any address without a dynamic linker.
>>
>> "to a static executable".
>>
>
> Done.
>
> Here is the updated patch.   OK for trunk?
>
> Thanks.
>

PING.


-- 
H.J.

Re: [PATCH] Another type demotion issue with ubsan (PR sanitizer/82072)

2017-09-05 Thread Jeff Law

On 09/04/2017 08:21 AM, Marek Polacek wrote:
> Vittorio reported another issue with convert_to_integer_1: for
> u = -l;
> where u is unsigned and l is long long the function does:
> 
>   return convert (type,
>                   fold_build1 (ex_form, typex,
>                                convert (typex,
>                                         TREE_OPERAND (expr, 0))));
> 
> so instead of
> u = (unsigned int) -l;
> it produced
> u = -(unsigned int) l;
> thus hiding the overflow.  Fixed by moving the recently added check a little
> bit above.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-09-04  Marek Polacek  
> 
>   PR sanitizer/82072
>   * convert.c (convert_to_integer_1) : Move the ubsan
>   check earlier.
> 
>   * c-c++-common/ubsan/pr82072-2.c: New test.
OK.
jeff
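
To spell out why the order of negation and narrowing matters for the sanitizer, here is a small sketch.  The function names are hypothetical and this is not the pr82072-2.c testcase:

```c
#include <limits.h>

/* Form that should be kept: negate in long long, then narrow.  The
   negation is signed, so l == LLONG_MIN overflows, and that overflow
   is what the sanitizer is expected to report.  */
unsigned int negate_then_narrow(long long l)
{
    return (unsigned int) -l;
}

/* Form produced by narrowing too early: convert first, then negate as
   unsigned.  Unsigned negation wraps, so nothing can be reported.  */
unsigned int narrow_then_negate(long long l)
{
    return -(unsigned int) l;
}

/* True when -l is not representable in long long.  */
int negation_overflows(long long l)
{
    return l == LLONG_MIN;
}
```

For every input where the negation does not overflow the two forms produce the same value, which is why the narrowing is a valid optimization without -fsanitize=undefined and only has to be suppressed when the sanitizer is active.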

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-05 Thread Christophe Lyon

Hi,

On 5 September 2017 at 13:40, Uros Bizjak  wrote:
> On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov  wrote:
>> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>>
>>> New failures:
>>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>>
>> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
>> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization
>> PR).  I can do all of the above if you agree.
>>
>> I think RTL peepholes are a poor way of fixing the original problem, which
>> actually exists on all targets with separate int/fp registers.  For instance,
>> trunk (without my patch) still gets a far simpler testcase wrong (-O2,
>> 64-bit):
>
> Please note that 32bit x86 implements atomic DImode access with
> fild/fistp combination, so the mentioned peephole avoids quite costly
> instruction sequence.
>
> For reference, attached patch implements additional peephole2 patterns
> that also handle sequences with blockages.
>
> Uros.

On arm, we also have a similar regression:
FAIL: gcc.target/arm/stl-cond.c scan-assembler stlne

Before the patch, we generated:
ldr r3, [r0]
cmp r3, #0
addne   r3, r0, #4
movne   r2, #0
stlne   r2, [r3]
bx  lr
and now:
ldr r3, [r0]
cmp r3, #0
bxeq    lr
mov r2, #0
add r3, r0, #4
stl r2, [r3]
bx  lr

Christophe
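
For readers without the testsuite at hand, the assembly above corresponds to a function of roughly the following shape.  This is a reconstruction from the instruction sequence only (a load from offset 0, a test against zero, and a store-release of 0 to offset 4); the actual source of stl-cond.c is not shown in this thread, and the names are hypothetical:

```c
/* Hypothetical layout inferred from the offsets in the assembly.  */
struct state {
    int enabled;   /* offset 0: loaded and compared with zero */
    int lock;      /* offset 4: target of the store-release */
};

/* Conditionally perform a store-release; the test expects this to be
   emitted as a predicated stlne rather than a branch around stl.  */
void release_if_enabled(struct state *s)
{
    if (s->enabled)
        __atomic_store_n(&s->lock, 0, __ATOMIC_RELEASE);
}
```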

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Richard Sandiford

Uros Bizjak  writes:
> On Tue, Sep 5, 2017 at 12:35 PM, Alexander Monakov  wrote:
>> On Tue, 5 Sep 2017, Uros Bizjak wrote:
>>> This patch allows to emit memory_blockage pattern instead of default
>>> asm volatile as a memory blockage. This patch is needed, so targets
>>> (e.g. x86) can define and emit more optimal memory blockage pseudo
>>> insn.
>>
>> Optimal in what sense?  What pattern do you intend to use on x86, and
>> would any target be able to use the same?
>
> You don't have to emit a generic asm-like pattern. This is the same
> situation as with blockage insn, where targets can emit "blockage"
> instead of generic asm insn.
>
> x86 defines memory_blockage as:
>
> (define_expand "memory_blockage"
>   [(set (match_dup 0)
> (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
>   ""
> {
>   operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
>   MEM_VOLATILE_P (operands[0]) = 1;
> })
>
> However, this definition can't be generic, since unspec is used.

If all ports have switched over to define_c_enum for unspecs
(haven't checked), we could probably define something like this
in common.md.

Thanks,
Richard

Re: PING: [Updated, PATCH] i386: Avoid stack realignment if possible

2017-09-05 Thread H.J. Lu

On Fri, Sep 1, 2017 at 11:48 AM, H.J. Lu  wrote:
> On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu  wrote:
>> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
>>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  
>>> > wrote:
>>> > > Hi!
>>> > >
>>> > > Honza recently changed the i?86 backend, so that it often doesn't
>>> > > do -maccumulate-outgoing-args by default on x86_64.
>>> > > Unfortunately, on some of the here included testcases this regressed
>>> > > quite a bit the generated code.  As AVX vectors are used, the dynamic
>>> > > realignment code needs to assume e.g. that some of them will need to be
>>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > > need_drap early as well.  But when emitting the prologue/epilogue,
>>> > > if need_drap is set, we don't perform the optimization for leaf functions
>>> > > which have zero size stack frame, thus we end up with uselessly doing
>>> > > dynamic stack realignment, setting up DRAP that nothing uses and later on
>>> > > restore everything back.
>>> > >
>>> > > This patch improves it, if the DRAP register isn't live at the start of
>>> > > entry bb successor and we aren't going to realign the stack, we don't
>>> > > need DRAP at all, and even if we need DRAP register, that can't be the sole
>>> > > reason for doing stack realignment, the prologue code is able to set up DRAP
>>> > > even without dynamic stack realignment.
>>> > >
>>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> > >
>>> > > 2013-12-20  Jakub Jelinek  
>>> > >
>>> > > PR target/59501
>>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>>> > > if !crtl->stack_realign_needed.
>>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>>> > > and stack_realign_needed will be false, clear drap_reg and need_drap.
>>> > > Optimize leaf functions that don't need stack frame even if
>>> > > crtl->need_drap.
>>> > >
>>> > > * gcc.target/i386/pr59501-1.c: New test.
>>> > > * gcc.target/i386/pr59501-1a.c: New test.
>>> > > * gcc.target/i386/pr59501-2.c: New test.
>>> > > * gcc.target/i386/pr59501-2a.c: New test.
>>> > > * gcc.target/i386/pr59501-3.c: New test.
>>> > > * gcc.target/i386/pr59501-3a.c: New test.
>>> > > * gcc.target/i386/pr59501-4.c: New test.
>>> > > * gcc.target/i386/pr59501-4a.c: New test.
>>> > > * gcc.target/i386/pr59501-5.c: New test.
>>> > > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> > LGTM, assuming Jakub is OK with the patch.
>>> >
>>> > Thanks,
>>> > Uros.
>>>
>>> Jakub, can you take a look at this:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>>
>>
>> Here is the updated patch to fix
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> ix86_finalize_stack_frame_flags has been extended to eliminate frame
>> pointer when the new stack frame isn't needed with and without
>> -maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
>> access with larger alignment may be optimized out, to decide if stack
>> realignment is needed, we need to not only check for stack frame access,
>> but also verify the alignment of stack frame access.  Since alignment of
>> memory access via arg_pointer is set up by caller, not by callee, we
>> should find the maximum stack alignment from the stack frame access
>> instructions via stack pointer and frame pointer to avoid stack
>> realignment when stack alignment needed is less than incoming stack
>> boundary.
>>
>> gcc/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
>> realign stack if stack alignment needed is less than incoming
>> stack boundary.
>>
>> gcc/testsuite/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * gcc.target/i386/pr59501-4a.c: Remove xfail.
>> * gcc.target/i386/pr81769-1a.c: New test.
>> * gcc.target/i386/pr81769-1b.c: Likewise.
>> * gcc.target/i386/pr81769-2.c: Likewise.
>> ---
>>  gcc/config/i386/i386.c | 143 
>> ++---
>>  gc

Re: [PATCH 0/2] add unique_ptr class

2017-09-05 Thread Manuel López-Ibáñez


On 05/08/17 20:05, Pedro Alves wrote:

That'd be an "obvious" choice, and I'm not terribly against it,
though I wonder whether it'd be taking over a name that has a wider
scope than intended?  I.e., GNU is a larger set of projects than the
GNU toolchain.  For example, there's Gnulib, which already compiles
as libgnu.a / -lgnu, which might be confusing.  GCC doesn't currently
use Gnulib, but GDB does, and, there was work going on a while ago to
make GCC use gnulib as well.


Unfortunately, that work was never committed, although there are parts that are 
ready to be committed and the rest of the conversion could be done incrementally:


https://gcc.gnu.org/wiki/replacelibibertywithgnulib

Cheers,

Manuel.

Re: [PATCH v2, rs6000] Fix PR81833

2017-09-05 Thread Segher Boessenkool

Hi!

On Tue, Aug 29, 2017 at 04:47:18PM -0500, Bill Schmidt wrote:
> Thanks for approving the previous patch with changes.  I've made those and also
> modified the test case to require VSX hardware for execution.  I duplicated the
> test so we get coverage on P7 BE 32/64 and P8 BE/LE.  I'd appreciate it if you
> could look over the dejagnu instructions once more on these.  Thanks!

> --- gcc/testsuite/gcc.target/powerpc/pr81833-1.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr81833-1.c  (working copy)
> @@ -0,0 +1,59 @@
> +/* PR81833: This used to fail due to improper implementation of vec_msum.  */
> +/* Test case relies on -mcpu=power7 or later.  Currently we don't have
> +   machinery to express that, so we have two separate tests for -mcpu=power7
> +   and -mcpu=power8 to catch 32-bit BE on P7 and 64-bit BE/LE on P8.  */
> +
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> +/* { dg-options "-mcpu=power8 -O2" } */

As I explained off-list, but here's for the record:

Since this is a run test and you use -mcpu=power8, you probably should
use p8vector_hw instead of vsx_hw.

Okay with that.  Thanks!


Segher

Re: [PATCH][aarch64] Enable ifunc resolver attribute by default

2017-09-05 Thread Steve Ellcey

On Mon, 2017-09-04 at 15:40 +0100, Szabolcs Nagy wrote:

> this is not the right default for bionic, uclibc and musl
> 
> (gcc does not distinguish between supporting ifunc in the
> compiler vs runtime, so when ifunc is enabled it is assumed
> the c runtime will have support too, hence libatomic and
> libgcc starts using ifuncs which breaks at runtime)
> 
> so don't change the default if target matches
> *-*-*android*|*-*-*uclibc*|*-*-*musl*)
>
> (i think the default should be kept "no" for these targets
> independently of cpu arch, so the current logic that is
> repeated many places in config.gcc is suboptimal.

I cleaned up config.gcc so default_gnu_indirect_function is set in a
single place now and has the right defaults for android/uclibc/musl.

> and i think the attribute syntax should be always supported
> and this setting should only mean that ifunc use is allowed
> in the runtime libraries.)

I think that might be a reasonable thing to do but should be a
separate patch from this change, so I have not done anything
with that.

I retested on aarch64 but I did not test any of the other platforms
where I moved the setting of default_gnu_indirect_function, but I
don't think I changed any defaults.

Steve Ellcey
sell...@cavium.com


2017-09-05  Steve Ellcey  

* config.gcc: Add new case statement to set
default_gnu_indirect_function.  Remove it from x86_64-*-linux*,
i[34567]86-*, powerpc*-*-linux*spe*, powerpc*-*-linux*, s390-*-linux*,
s390x-*-linux* case statements.   Added aarch64 to the list of
supported architectures.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index cc56c57..1a1b2fe 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1516,14 +1516,6 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-gnu* | i[34567]8
 	i[34567]86-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h"
 		extra_options="${extra_options} linux-android.opt"
-		# Assume modern glibc if not targeting Android nor uclibc.
-		case ${target} in
-		*-*-*android*|*-*-*uclibc*|*-*-*musl*)
-		  ;;
-		*)
-		  default_gnu_indirect_function=yes
-		  ;;
-		esac
 		if test x$enable_targets = xall; then
 			tm_file="${tm_file} i386/x86-64.h i386/gnu-user-common.h i386/gnu-user64.h i386/linux-common.h i386/linux64.h"
 			tm_defines="${tm_defines} TARGET_BI_ARCH=1"
@@ -1582,14 +1574,6 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 	x86_64-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
 		extra_options="${extra_options} linux-android.opt"
-		# Assume modern glibc if not targeting Android nor uclibc.
-		case ${target} in
-		*-*-*android*|*-*-*uclibc*|*-*-*musl*)
-		  ;;
-		*)
-		  default_gnu_indirect_function=yes
-		  ;;
-		esac
 	  	;;
 	x86_64-*-kfreebsd*-gnu)
 		tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
@@ -2455,7 +2439,6 @@ powerpc*-*-linux*spe*)
 	tm_file="${tm_file} powerpcspe/linux.h glibc-stdint.h"
 	tmake_file="${tmake_file} powerpcspe/t-ppcos powerpcspe/t-linux"
 	tm_file="${tm_file} powerpcspe/linuxspe.h powerpcspe/e500.h"
-	default_gnu_indirect_function=yes
 	;;
 powerpc*-*-linux*)
 	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h freebsd-spec.h rs6000/sysv4.h"
@@ -2535,14 +2518,6 @@ powerpc*-*-linux*)
 	if test x${enable_secureplt} = xyes; then
 		tm_file="rs6000/secureplt.h ${tm_file}"
 	fi
-	# Assume modern glibc if not targeting Android nor uclibc.
-	case ${target} in
-	*-*-*android*|*-*-*uclibc*|*-*-*musl*)
-		;;
-	*)
-		default_gnu_indirect_function=yes
-		;;
-	esac
 	;;
 powerpc-wrs-vxworksspe)
 	tm_file="${tm_file} elfos.h freebsd-spec.h powerpcspe/sysv4.h"
@@ -2664,7 +2639,6 @@ rx-*-elf*)
 	tmake_file="${tmake_file} rx/t-rx"
 	;;
 s390-*-linux*)
-	default_gnu_indirect_function=yes
 	tm_file="s390/s390.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h s390/linux.h"
 	c_target_objs="${c_target_objs} s390-c.o"
 	cxx_target_objs="${cxx_target_objs} s390-c.o"
@@ -2674,7 +2648,6 @@ s390-*-linux*)
 	tmake_file="${tmake_file} s390/t-s390"
 	;;
 s390x-*-linux*)
-	default_gnu_indirect_function=yes
 	tm_file="s390/s390x.h s390/s390.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h s390/linux.h"
 	tm_p_file="linux-protos.h s390/s390-protos.h"
 	c_target_objs="${c_target_objs} s390-c.o"
@@ -3120,6 +3093,20 @@ case ${target} in
 	;;
 esac
 
+# Assume the existence of indirect function support and allow the use of the
+# resolver attribute.
+case ${target} in
+*-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
+;;
+*-*-linux*)
+	case ${target} in
+	aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | x86_64-*)
+		default_gnu_indirect_function=yes
+		;;
+	esac
+	;;
+esac
+
 # Build mkoffload tool
 case ${target} in
 *-intelmic-* | *-intelmicemul-*)

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-09-05 Thread Kyrill Tkachov



On 05/09/17 11:32, Wilco Dijkstra wrote:

Kyrill Tkachov wrote:


I like the simplifications in the selection logic here :)
However, changing the value for ARM from 6 to 4 looks a bit arbitrary to me.
There's probably a reason why default values for ARM and Thumb-2 are
different
(maybe not a good one) and I'd rather not change it without some code
size data measurements.

To quote myself from the thread:

Long conditional sequences are slow on modern cores - the value 6 for
max_insns_skipped is a few decades out of date as it was meant for ARM2!
Even with -Os the performance loss for larger values is not worth the
small codesize gain (there are many better options to reduce codesize
that actually improve performance at the same time). So using the same
code generation heuristics for ARM and Thumb-2 is a good idea.

A simple codesize comparison on CSiBE shows using 4 rather than 6 for
max_insns_skipped is just 0.07% larger on ARM with -Os. So it's not
obvious that increasing max_insns_skipped in -Os is a useful codesize
optimization...


So I'd rather not let that hold this cleanup patch though, so this is ok
   (assuming a normal bootstrap and testing cycle) without changing the 6
to a 4
and you can propose a change to 4 as a separate patch that can be
discussed on its own.

Based on the above is that really needed? What specific problem do you
expect to occur with the value 4?


Nothing in particular, thanks for providing some numbers.
I think unifying the heuristics for ARM and Thumb2 makes sense and a 0.07%
code size hit on ARM (i.e. not code-size-optimised Thumb-2) mode seems
acceptable to me if it allows more performant code.


So this is ok then.
Thanks,
Kyrill


Wilco
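
For readers unfamiliar with the heuristic under discussion, here is a minimal
sketch in C of a threshold of this kind.  It is an illustration only, not GCC
code: the names should_conditionalise and insns_in_branch are hypothetical,
and only the candidate limits 4 and 6 come from the thread.

```c
#include <stdbool.h>

/* Hypothetical illustration, not GCC code: decide whether a short
   sequence that a branch would skip should instead be emitted as
   conditionally executed instructions.  A longer sequence costs more
   when the condition is false, so it is only done up to a limit.  */
static bool
should_conditionalise (int insns_in_branch, int max_insns_skipped)
{
  return insns_in_branch <= max_insns_skipped;
}
```

With a limit of 6 a five-instruction sequence would be conditionalised; with
a limit of 4 it would be left as a branch.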

Re: [PATCH][aarch64] Enable ifunc resolver attribute by default

2017-09-05 Thread Szabolcs Nagy

On 05/09/17 18:09, Steve Ellcey wrote:
> On Mon, 2017-09-04 at 15:40 +0100, Szabolcs Nagy wrote:
> 
>> this is not the right default for bionic, uclibc and musl
>>
>> (gcc does not distinguish between supporting ifunc in the
>> compiler vs runtime, so when ifunc is enabled it is assumed
>> the c runtime will have support too, hence libatomic and
>> libgcc starts using ifuncs which breaks at runtime)
>>
>> so don't change the default if target matches
>> *-*-*android*|*-*-*uclibc*|*-*-*musl*)
>>
>> (i think the default should be kept "no" for these targets
>> independently of cpu arch, so the current logic that is
>> repeated many places in config.gcc is suboptimal.
> 
> I cleaned up config.gcc so default_gnu_indirect_function is set in a
> single place now and has the right defaults for android/uclibc/musl.
> 

thanks, it looks ok to me (but i cannot approve the patch).

>> and i think the attribute syntax should be always supported
>> and this setting should only mean that ifunc use is allowed
>> in the runtime libraries.)
> 
> I think that might be a reasonable thing to do but should be a
> separate patch from this change, so I have not done anything
> with that.
> 
> I retested on aarch64 but I did not test any of the other platforms
> where I moved the setting of default_gnu_indirect_function, but I
> don't think I changed any defaults.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-09-05  Steve Ellcey  
> 
> * config.gcc: Add new case statement to set
> default_gnu_indirect_function.  Remove it from x86_64-*-linux*,
> i[34567]86-*, powerpc*-*-linux*spe*, powerpc*-*-linux*, s390-*-linux*,
> s390x-*-linux* case statements.   Added aarch64 to the list of
> supported architectures.
>
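
As background for the discussion above, a minimal sketch of what the resolver
attribute looks like in user code.  The function names are made up for the
example.  It assumes an ELF target whose C runtime supports GNU indirect
functions (for instance glibc on Linux); as the thread notes, bionic, uclibc
and musl are not expected to handle this at runtime.

```c
/* Sketch only: needs GNU indirect function (ifunc) support in both
   the toolchain and the C runtime.  All names here are hypothetical.  */

static int
add_generic (int a, int b)
{
  return a + b;
}

/* The resolver is run by the dynamic loader and returns the
   implementation that calls to add () should be bound to.  A real
   resolver would typically pick a variant based on CPU features.  */
static int (*resolve_add (void)) (int, int)
{
  return add_generic;
}

/* Calls to add () go through whatever resolve_add () returned.  */
int add (int a, int b) __attribute__ ((ifunc ("resolve_add")));
```

The point made in the thread is that enabling this by default also lets
libatomic and libgcc use such resolvers, which is why the runtime must
support them.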

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2017-09-05 Thread Jeff Law

On 09/05/2017 12:38 AM, Christophe Lyon wrote:
> Hi Jeff,
> 
> 
> On 3 September 2017 at 16:44, Jeff Law  wrote:
>> On 01/13/2016 05:30 AM, Richard Biener wrote:
>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law  wrote:
>>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>>
>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:
>>>>>>
>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>>
>>>>>>>
>>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>>
>>>>>>>> Plus doing it in TER is almost certainly more complex than getting it
>>>>>>>> right
>>>>>>>> in reassoc to begin with.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>>
>>>>>>
>>>>>> What's best for that expression would depend on factors like whether or
>>>>>> not
>>>>>> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>>>> parallelism
>>>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose
>>>>>> the
>>>>>> bit test.
>>>>>>
>>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>>> there's
>>>>>> no ILP or chance of creating a bit test.  My patch shuffles things
>>>>>> around,
>>>>>> but still doesn't expose the ILP or bit test in the 4 operand case.
>>>>>> Based
>>>>>> on the comments in reassoc, it didn't seem like the author thought
>>>>>> anything
>>>>>> beyond the 3-operand case was worth handling. So my patch just handles
>>>>>> the
>>>>>> 3-operand case.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>>
>>>>>>
>>>>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>>>>> set, flip, test, etc idioms.
>>>>>
>>>>>
>>>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>>>> are designed to feed conditionals (aka CC consuming ops)?
>>>>
>>>> At the gimple level they could feed a conditional, or be part of a series
>>>> of
>>>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>>>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>>>
>>>> For the 3-op case, we always want the constant last.  For the 4-op case
>>>> it's
>>>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) &
>>>> 1)
>>>> & c.
>>>
>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would
>>> be to sort constants at the opposite end.
>>>
>>> That might be too invasive right now but there is another "convenient"
>>> place:
>>>
>>>   /* If the operand vector is now empty, all operands were
>>>  consumed by the __builtin_powi optimization.  */
>>> ...
>>>   else
>>> {
>>>   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>>   int ops_num = ops.length ();
>>>   int width = get_reassociation_width (ops_num, rhs_code, mode);
>>>   tree new_lhs = lhs;
>>>
>>>   if (dump_file && (dump_flags & TDF_DETAILS))
>>> fprintf (dump_file,
>>>  "Width = %d was chosen for reassociation\n", width);
>>>
>>> at this point you can check rhs_code and move the (single) constant
>>> entry in ops (if there is any constant entry) from .last () to the 
>>> beginning.
>>>
>>> That'll get the 4 operand case correct as well and properly models
>>> "constant last" for the specified operation kind.
>> Resurrecting an old thread...  Just now getting around to flushing this
>> out of the queue.
>>
>> To recap, given something like (x & y) & C reassociation will turn that
>> into (x & C) & y.  It's functionally equivalent, but it will inhibit
>> generation of bit test instructions.
>>
>> I originally hacked up swap_ops_for_binary_stmt.  You requested that
>> change be made in reassociate_bb so that it would apply to cases where
>> there are more than 3 args.
>>
>> So that's what this patch does.   OK for the trunk now?
>>
>> Bootstrapped and regression tested on x86_64.  Also tested the new
>> testcase on m68k.
>>
>>
>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
>> Author: Jeff Law 
>> Date:   Sun Sep 3 10:42:30 2017 -0400
>>
>> 2017-09-03  Jeff Law  
>>
>> PR tree-optimization/64910
>> * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>> swap the first and last operand if the last is a constant.
>>
>> PR tree-optimization/64910
>> * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3f632ca31c2..2c9a8c8265a 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2017-09-03  Jeff Law  
>> +
>> +   PR tree-optimization/64910

Re: [PATCH 0/2] add unique_ptr class

2017-09-05 Thread Pedro Alves

On 09/05/2017 05:52 PM, Manuel López-Ibáñez wrote:
> On 05/08/17 20:05, Pedro Alves wrote:
>> That'd be an "obvious" choice, and I'm not terribly against it,
>> though I wonder whether it'd be taking over a name that has a wider
>> scope than intended?  I.e., GNU is a larger set of projects than the
>> GNU toolchain.  For example, there's Gnulib, which already compiles
>> as libgnu.a / -lgnu, which might be confusing.  GCC doesn't currently
>> use Gnulib, but GDB does, and, there was work going on a while ago to
>> make GCC use gnulib as well.
> 
> Unfortunately, that work was never committed, although there are parts
> that are ready to be committed and the rest of the conversion could be
> done incrementally:
> 
> https://gcc.gnu.org/wiki/replacelibibertywithgnulib

Yeah, ISTR it was close, though there were a couple things
that needed addressing still.

The wiki seems to miss a pointer to following iterations/review
of that patch (mailing list archives don't cross month
boundaries...).  You can find it starting here:
 https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01208.html
I think this was the latest version posted:
 https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01554.html

Thanks,
Pedro Alves

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Kyrill Tkachov


Hi Bernd,

On 05/09/17 15:25, Bernd Edlinger wrote:

Hi Christophe,

On 09/05/17 10:45, Christophe Lyon wrote:

Hi Bernd,


On 4 September 2017 at 16:52, Kyrill  Tkachov
 wrote:

On 29/04/17 18:45, Bernd Edlinger wrote:

Ping...

I attached a rebased version since there was a merge conflict in
the xordi3 pattern, otherwise the patch is still identical.
It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
early when the target has no neon or iwmmxt.


Thanks
Bernd.



On 11/28/16 20:42, Bernd Edlinger wrote:

On 11/25/16 12:30, Ramana Radhakrishnan wrote:

On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
 wrote:

Hi!

This improves the stack usage on the sha512 test case for the case
without hardware fpu and without iwmmxt by splitting all di-mode
patterns right while expanding which is similar to what the
shift-pattern
does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well
as
thumb1.


I would go further and do this in the absence of Neon, the VFP unit
being there doesn't help with DImode operations i.e. we do not have 64
bit integer arithmetic instructions without Neon. The main reason why
we have the DImode patterns split so late is to give a chance for
folks who want to do 64 bit arithmetic in Neon a chance to make this
work as well as support some of the 64 bit Neon intrinsics which IIRC
map down to these instructions. Doing this just for soft-float doesn't
improve the default case only. I don't usually test iwmmxt and I'm not
sure who has the ability to do so, thus keeping this restriction for
iwMMX is fine.



Yes I understand, thanks for pointing that out.

I was not aware that iwmmxt exists at all, but I noticed that most
64bit expansions work completely differently, and would break if we split
the pattern early.

I can however only look at the assembler output for iwmmxt, and make
sure that the stack usage does not get worse.

Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
it is: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt) bytes stack
for the test cases, and vfp and soft-float at around 270 bytes stack
usage.


It reduces the stack usage from 2300 to near optimal 272 bytes (!).

Note this also splits many ldrd/strd instructions and therefore I will
post a followup-patch that mitigates this effect by enabling the
ldrd/strd
peephole optimization after the necessary reg-testing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.

What do you mean by arm-linux-gnueabihf - when folks say that I
interpret it as --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 or (--with-fpu=neon).

If you've really bootstrapped and regtested it on armhf, doesn't this
patch as it stand have no effect there i.e. no change ?
arm-linux-gnueabihf usually means to me someone has configured with
--with-float=hard, so there are no regressions in the hard float ABI
case,


I know it proves little.  When I say arm-linux-gnueabihf
I do in fact mean --enable-languages=all,ada,go,obj-c++
--with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
--with-float=hard.

My main interest in the stack usage is of course not because of linux,
but because of eCos where we have very small task stacks and in fact
no fpu support by the O/S at all, so that patch is exactly what we need.


Bootstrapped and reg-tested on arm-linux-gnueabihf
Is it OK for trunk?


The code is ok.
AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code as
the splits
will happen for !TARGET_NEON (it is of course !TARGET_IWMMXT and
TARGET_IWMMXT2
is irrelevant here).

So this is ok for trunk.
Thanks, and sorry again for the delay.
Kyrill


This patch (r251663) causes a regression on armeb-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu vfpv3-d16-fp16
FAIL:gcc.dg/vect/vect-singleton_1.c (internal compiler error)
FAIL:gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects
(internal compiler error)

the test passes if gcc is configured --with-fpu neon-fp16


Thank you very much for what you do!

I am able to reproduce this.

Combine creates an invalid insn out of these two insns:

(insn 12 8 18 2 (parallel [
  (set (reg:DI 122)
  (plus:DI (reg:DI 116 [ aD.5331 ])
  (reg:DI 119 [ bD.5332 ])))
  (clobber (reg:CC 100 cc))
  ]) "vect-singleton_1.c":28 1 {*arm_adddi3}
   (expr_list:REG_DEAD (reg:DI 119 [ bD.5332 ])
  (expr_list:REG_DEAD (reg:DI 116 [ aD.5331 ])
  (expr_list:REG_UNUSED (reg:CC 100 cc)
  (nil)))))
(insn 18 12 19 2 (set (reg/i:DI 16 s0)
  (reg:DI 122)) "vect-singleton_1.c":28 650 {*movdi_vfp}
   (expr_list:REG_DEAD (reg:DI 122)
  (nil)))

=>

(insn 18 12 19 2 (parallel [
  (set (reg/i:DI 16 s0)
  (plus:DI (reg:DI 116 [ aD.5331 ])
  (reg:DI 119 [ bD.5332 ])))
  (clobber (reg:CC 100 cc))
  ]) "vect-singleton_1.c":28 1 {*arm_adddi3}
   (expr_list:REG_U

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Bernd Edlinger

On 09/05/17 17:02, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
> 
>> Combine creates an invalid insn out of these two insns:
> 
> Yes it looks like a latent bug. We need to use arm_general_register_operand
> as arm_adddi3/subdi3 only allow integer registers. You don't need a new
> predicate s_register_operand_nv. Also I'd prefer something like
> arm_general_adddi_operand.
> 

Thanks, attached is a patch following your suggestion.

> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
> 
> The split condition for adddi3 now looks more accurate indeed, although we
> could remove the !TARGET_NEON from the split condition as this is always true
> given arm_adddi3 uses "TARGET_32BIT && !TARGET_NEON".
> 

No, the split condition does not begin with "&& TARGET_32BIT...".
Therefore the split is enabled in TARGET_NEON after reload_completed.
And it is invoked from adddi3_neon for all alternatives without vfp
registers:

   switch (which_alternative)
 {
 case 0: /* fall through */
 case 3: return "vadd.i64\t%P0, %P1, %P2";
 case 1: return "#";
 case 2: return "#";
 case 4: return "#";
 case 5: return "#";
 case 6: return "#";



> Also there are more cases, a quick grep suggests *anddi_notdi_di has the same 
> issue.
> 

Yes, that pattern can be cleaned up in a follow-up patch.
Note this splitter is invoked from bicdi3_neon as well.
However I think anddi_notdi_di should be safe as long as it is enabled
after reload_completed (which is probably a bug).


Bernd.

> Wilco
> 
2017-09-05  Bernd Edlinger  

	PR target/77308
	* config/arm/predicates.md (arm_general_adddi_operand): Create new
	non-vfp predicate.
	* config/arm/arm.md (*arm_adddi3, *arm_subdi3): Use new predicates.

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(revision 251663)
+++ gcc/config/arm/arm.md	(working copy)
@@ -457,14 +457,13 @@
 )
 
 (define_insn_and_split "*arm_adddi3"
-  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r,&r,&r,&r")
-	(plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
-		 (match_operand:DI 2 "arm_adddi_operand"  "r,  0, r, Dd, Dd")))
+  [(set (match_operand:DI  0 "arm_general_register_operand" "=&r,&r,&r,&r,&r")
+	(plus:DI (match_operand:DI 1 "arm_general_register_operand" "%0, 0, r, 0, r")
+		 (match_operand:DI 2 "arm_general_adddi_operand" "r,  0, r, Dd, Dd")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && !TARGET_NEON"
   "#"
-  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
-   && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
+  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(parallel [(set (reg:CC_C CC_REGNUM)
 		   (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
  (match_dup 1)))
@@ -1263,9 +1262,9 @@
 )
 
 (define_insn_and_split "*arm_subdi3"
-  [(set (match_operand:DI   0 "s_register_operand" "=&r,&r,&r")
-	(minus:DI (match_operand:DI 1 "s_register_operand" "0,r,0")
-		  (match_operand:DI 2 "s_register_operand" "r,0,0")))
+  [(set (match_operand:DI   0 "arm_general_register_operand" "=&r,&r,&r")
+	(minus:DI (match_operand:DI 1 "arm_general_register_operand" "0,r,0")
+		  (match_operand:DI 2 "arm_general_register_operand" "r,0,0")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && !TARGET_NEON"
   "#"  ; "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2"
Index: gcc/config/arm/predicates.md
===
--- gcc/config/arm/predicates.md	(revision 251663)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -82,6 +82,11 @@
 	  || REGNO (op) >= FIRST_PSEUDO_REGISTER));
 })
 
+(define_predicate "arm_general_adddi_operand"
+  (ior (match_operand 0 "arm_general_register_operand")
+   (and (match_code "const_int")
+	(match_test "const_ok_for_dimode_op (INTVAL (op), PLUS)"))))
+
 (define_predicate "vfp_register_operand"
   (match_code "reg,subreg")
 {
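
For context, the split in *arm_adddi3 turns one 64-bit add into a low-word
add that sets the carry flag followed by a high-word add with carry.  The
following C sketch models that arithmetic on core-register-sized halves.  It
is a hypothetical helper written for this note, not code from the patch.

```c
#include <stdint.h>

/* Hypothetical model of a DImode add split into two SImode
   operations, as done on targets without 64-bit integer adds.  */
static uint64_t
add_di_split (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint32_t lo = a_lo + b_lo;          /* like "adds": add the low words */
  uint32_t carry = lo < a_lo;         /* carry out of the low word */
  uint32_t hi = a_hi + b_hi + carry;  /* like "adc": high words plus carry */

  return ((uint64_t) hi << 32) | lo;
}
```

Both halves live in integer registers here, which matches the point made in
the thread that these patterns must not accept VFP registers.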

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Kyrill Tkachov



On 05/09/17 18:48, Bernd Edlinger wrote:

On 09/05/17 17:02, Wilco Dijkstra wrote:

Bernd Edlinger wrote:


Combine creates an invalid insn out of these two insns:

Yes it looks like a latent bug. We need to use arm_general_register_operand
as arm_adddi3/subdi3 only allow integer registers. You don't need a new
predicate s_register_operand_nv. Also I'd prefer something like
arm_general_adddi_operand.


Thanks, attached is a patch following your suggestion.


+  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"

The split condition for adddi3 now looks more accurate indeed, although we could
remove the !TARGET_NEON from the split condition as this is always true given
arm_adddi3 uses "TARGET_32BIT && !TARGET_NEON".


No, the split condition does not begin with "&& TARGET_32BIT...".
Therefore the split is enabled in TARGET_NEON after reload_completed.
And it is invoked from adddi3_neon for all alternatives without vfp
registers:

switch (which_alternative)
  {
  case 0: /* fall through */
  case 3: return "vadd.i64\t%P0, %P1, %P2";
  case 1: return "#";
  case 2: return "#";
  case 4: return "#";
  case 5: return "#";
  case 6: return "#";




Also there are more cases, a quick grep suggests *anddi_notdi_di has the same 
issue.


Yes, that pattern can be cleaned up in a follow-up patch.
Note this splitter is invoked from bicdi3_neon as well.
However I think anddi_notdi_di should be safe as long as it is enabled
after reload_completed (which is probably a bug).



Thanks, that's what I had in mind in my other reply.
This is ok if testing comes back ok.

Kyrill



Bernd.


Wilco

Re: [PATCH] scheduler bug fix for AArch64 insn fusing SCHED_GROUP usage

2017-09-05 Thread Jim Wilson

ping^2

On Fri, Jul 21, 2017 at 3:09 PM, Jim Wilson  wrote:
> Ping.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00779.html
>
> On Thu, Jul 13, 2017 at 3:00 PM, Jim Wilson  wrote:
>> The AArch64 port uses SCHED_GROUP to mark instructions that get fused
>> at issue time, to ensure that they will be issued together.  However,
>> in the scheduler, use of a SCHED_GROUP forces all other instructions
>> to issue in the next cycle.  This is wrong for AArch64 ports using
>> insn fusing which can issue multiple insns per cycle, as aarch64
>> SCHED_GROUP insns can all issue in the same cycle, and other insns can
>> issue in the same cycle also.
>>
>> I put a testcase and some info in bug 81434.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81434
>>
>> The attached patch fixes the problem.  The behavior in pass == 0 is
>> same as now.  All non sched group insns are ignored, and all sched
>> group insns are checked to see if they need to be queued for a later
>> cycle.  The difference is in the second pass where non sched group
>> insns are queued for a later cycle only if there is a sched group
>> insn that got queued.  Since sched group insns always sort to the top
>> of the list of insns to schedule, all sched group insns still get
>> scheduled together as before.
>>
>> This has been tested with an AArch64 bootstrap and make check.
>>
>> OK?
>>
>> Jim

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Christophe Lyon

On 5 September 2017 at 19:53, Kyrill  Tkachov
 wrote:
>
> On 05/09/17 18:48, Bernd Edlinger wrote:
>>
>> On 09/05/17 17:02, Wilco Dijkstra wrote:
>>>
>>> Bernd Edlinger wrote:
>>>
>>>> Combine creates an invalid insn out of these two insns:
>>>
>>> Yes it looks like a latent bug. We need to use
>>> arm_general_register_operand
>>> as arm_adddi3/subdi3 only allow integer registers. You don't need a new
>>> predicate
>>> s_register_operand_nv. Also I'd prefer something like
>>> arm_general_adddi_operand.
>>>
>> Thanks, attached is a patch following your suggestion.
>>
>>> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) ||
>>> reload_completed)"
>>>
>>> The split condition for adddi3 now looks more accurate indeed, although
>>> we could
>>> remove the !TARGET_NEON from the split condition as this is always true
>>> given
>>> arm_adddi3 uses "TARGET_32BIT && !TARGET_NEON".
>>>
>> No, the split condition does not begin with "&& TARGET_32BIT...".
>> Therefore the split is enabled in TARGET_NEON after reload_completed.
>> And it is invoked from adddi3_neon for all alternatives without vfp
>> registers:
>>
>> switch (which_alternative)
>>   {
>>   case 0: /* fall through */
>>   case 3: return "vadd.i64\t%P0, %P1, %P2";
>>   case 1: return "#";
>>   case 2: return "#";
>>   case 4: return "#";
>>   case 5: return "#";
>>   case 6: return "#";
>>
>>
>>
>>> Also there are more cases, a quick grep suggests *anddi_notdi_di has the
>>> same issue.
>>>
>> Yes, that pattern can be cleaned up in a follow-up patch.
>> Note this splitter is invoked from bicdi3_neon as well.
>> However I think anddi_notdi_di should be safe as long as it is enabled
>> after reload_completed (which is probably a bug).
>>
>
> Thanks, that's what I had in mind in my other reply.
> This is ok if testing comes back ok.
>

I've submitted the patch for testing, I'll let you know about the results.

Christophe

> Kyrill
>
>
>> Bernd.
>>
>>> Wilco
>>>
>

Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch (5/8)]

2017-09-05 Thread Tamar Christina

> 
> 
> From: James Greenhalgh 
> Sent: Monday, September 4, 2017 12:01 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch (5/8)]
> 
> On Fri, Sep 01, 2017 at 02:22:17PM +0100, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds the instructions for Dot Product to AArch64 along
> > with the intrinsics and vectorizer pattern.
> >
> > Armv8.2-a dot product supports 8-bit element values both
> > signed and unsigned.
> >
> > Dot product is available from Armv8.2-a and onwards.
> >
> > Regtested and bootstrapped on aarch64-none-elf and no issues.
> >
> > Ok for trunk?
> >
> > gcc/
> > 2017-09-01  Tamar Christina  
> >
> >   * config/aarch64/aarch64-builtins.c
> >   (aarch64_types_quadopu_lane_qualifiers): New.
> >   (TYPES_QUADOPU_LANE): New.
> >   * config/aarch64/aarch64-simd.md (aarch64_dot): New.
> >   (dot_prod, aarch64_dot_lane): New.
> >   (aarch64_dot_laneq): New.
> >   * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
> >   (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
> >   * config/aarch64/iterators.md (UNSPEC_SDOT, UNSPEC_UDOT): New.
> >   (DOT_MODE, dot_mode, Vdottype, DOTPROD): New.
> >   (sur): Add SDOT and UDOT.
> >
> > --
> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index f3e084f8778d70c82823b92fa80ff96021ad26db..21d46c84ab317c2d62afdf8c48117886aaf483b0 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -386,6 +386,87 @@
> >  }
> >  )
> >
> > +;; These instructions map to the __builtins for the Dot Product operations.
> > +(define_insn "aarch64_dot"
> > +  [(set (match_operand:VS 0 "register_operand" "=w")
> > + (unspec:VS [(match_operand:VS 1 "register_operand" "0")
> > + (match_operand: 2 "register_operand" "w")
> > + (match_operand: 3 "register_operand" "w")]
> > + DOTPROD))]
> > +  "TARGET_DOTPROD"
> > +  "dot\\t%0., %2., %3."
> > +  [(set_attr "type" "neon_dot")]
> 
> Would there be a small benefit in modelling this as:
> 
>   [(set (match_operand:VS 0 "register_operand" "=w")
> (add:VS ((match_operand:VS 1 "register_operand" "0")
>  (unspec:VS [(match_operand: 2 "register_operand" "w")
> (match_operand: 3 "register_operand" "w")]
> DOTPROD)))]
> 

Maybe, I can't think of anything at the moment, but it certainly won't hurt.

> 
> > +)
> > +
> > +;; These expands map to the Dot Product optab the vectorizer checks for.
> > +;; The auto-vectorizer expects a dot product builtin that also does an
> > +;; accumulation into the provided register.
> > +;; Given the following pattern
> > +;;
> > +;; for (i=0; i<len; i++) {
> > +;; c = a[i] * b[i];
> > +;; r += c;
> > +;; }
> > +;; return result;
> > +;;
> > +;; This can be auto-vectorized to
> > +;; r  = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
> > +;;
> > +;; given enough iterations.  However the vectorizer can keep unrolling the loop
> > +;; r += a[4]*b[4] + a[5]*b[5] + a[6]*b[6] + a[7]*b[7];
> > +;; r += a[8]*b[8] + a[9]*b[9] + a[10]*b[10] + a[11]*b[11];
> > +;; ...
> > +;;
> > +;; and so the vectorizer provides r, in which the result has to be accumulated.
> > +(define_expand "dot_prod"
> > +  [(set (match_operand:VS 0 "register_operand")
> > + (unspec:VS [(match_operand: 1 "register_operand")
> > + (match_operand: 2 "register_operand")
> > + (match_operand:VS 3 "register_operand")]
> > + DOTPROD))]
> 
> This is just an expand that always ends in a DONE, so doesn't need the
> full description here, just:
> 
>   [(match_operand:VS 0 "register_operand")
>(match_operand: 1 "register_operand")
>(match_operand: 2 "register_operand")
>(match_operand:VS 3 "register_operand")]

yes but I use the unspec to match the  iterator to generate the signed and 
unsigned
versions of the optab.

> 
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index cceb57525c7aa44933419bd317b1f03a7b76f4c4..533c12cca916669195e9b094527ee0de31542b12 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -354,6 +354,8 @@
> >  UNSPEC_SQRDMLSH ; Used in aarch64-simd.md.
> >  UNSPEC_FMAXNM   ; Used in aarch64-simd.md.
> >  UNSPEC_FMINNM   ; Used in aarch64-simd.md.
> > +UNSPEC_SDOT  ; Used in aarch64-simd.md.
> > +UNSPEC_UDOT  ; Used in aarch64-simd.md.
> >  ])
> >
> >  ;; --
> > @@ -810,6 +812,13 @@
> >  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
> >  (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
> >
> > +;; Mapping attribute for Dot

Re: [PATCH v2] Python testcases to check DWARF output

2017-09-05 Thread Mike Stump

On Sep 4, 2017, at 2:22 AM, Pierre-Marie de Rodat  wrote:
> 
> I would like to ping for the patch I submitted at 
> . Thank you in 
> advance!

I've included the dwarf people on the cc list.  Seems like they may have an 
opinion on the direction or the patch itself.  I was fine with the patch from 
the larger testsuite perspective.

[C++ PATCH] two cleanups

2017-09-05 Thread Nathan Sidwell

I noticed an 'if (cond)' nested inside an 'if (exact-same-cond)'.  Fixed
thusly.  Also, pt.c was using '"\' to deal with a long error
message, yet five lines further on simply had an overly long line.  Fixed
by using string constant concatenation.  I also added a %<...%> around a
fragment that we're telling the user to insert.


applied to trunk.

nathan
--
Nathan Sidwell
2017-09-05  Nathan Sidwell  

	* class.c (unreverse_member_declarations): Remove extraneous if.
	* pt.c (push_template_decl_real): Use string concatenation, not
	\.  Add %<..%>.

Index: class.c
===
--- class.c	(revision 251722)
+++ class.c	(working copy)
@@ -7070,8 +7001,7 @@ unreverse_member_declarations (tree t)
   if (prev)
 {
   DECL_CHAIN (TYPE_FIELDS (t)) = x;
-  if (prev)
-	TYPE_FIELDS (t) = prev;
+  TYPE_FIELDS (t) = prev;
 }
 }
 
Index: pt.c
===
--- pt.c	(revision 251722)
+++ pt.c	(working copy)
@@ -5572,11 +5572,11 @@ push_template_decl_real (tree decl, bool
 	  (TI_ARGS (tinfo),
 	   TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT (tmpl)))))
 	{
-	  error ("\
-template arguments to %qD do not match original template %qD",
-		 decl, DECL_TEMPLATE_RESULT (tmpl));
+	  error ("template arguments to %qD do not match original"
+		 "template %qD", decl, DECL_TEMPLATE_RESULT (tmpl));
 	  if (!uses_template_parms (TI_ARGS (tinfo)))
-	inform (input_location, "use template<> for an explicit specialization");
+	inform (input_location, "use %<template<>%> for"
+		" an explicit specialization");
 	  /* Avoid crash in import_export_decl.  */
 	  DECL_INTERFACE_KNOWN (decl) = 1;
 	  return error_mark_node;
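
A short C sketch of the string constant concatenation used in the pt.c hunk
above.  Adjacent string literals are joined with nothing inserted between
them, so any separating space has to be written inside one of the pieces.
The variable names and shortened messages below are stand-ins invented for
this example; it is worth checking the joins in the hunk with this rule in
mind.

```c
/* Adjacent literals are pasted together as-is by the compiler.  */

/* No space at the join: the words run together.  */
static const char msg_no_space[] = "do not match original"
                                   "template";

/* Trailing space in the first piece: the words stay separate.  */
static const char msg_with_space[] = "do not match original "
                                     "template";
```

The first string reads "do not match originaltemplate" and the second reads
"do not match original template".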

[C++ PATCH] CONV_OP accessors

2017-09-05 Thread Nathan Sidwell

The conv op accessor macros are overly conservative.  We never call
DECL_CONV_FN_P when DECL_NAME is null, and we never call
DECL_CONV_FN_TYPE when the fn isn't a conversion operator.  Removed
those checks from these macros.


Further, we never use DECL_TEMPLATE_CONV_FN_P, except to set that flag.
So that can simply be nuked.


We reuse the underlying bitfield for static data member arrays of 
unknown bound.  I renamed the field.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-09-05  Nathan Sidwell  

	* cp-tree.h (lang_decl_base): Rename template_conv_p to
	unknown_bound_p.
	(DECL_CONV_FN_P): Don't check NULL DECL_NAME.
	(DECL_CONV_FN_TYPE): FN must be conv op.
	(DECL_TEMPLATE_CONV_FN_P): Delete.
	(VAR_HAD_UNKNOWN_BOUND, SET_VAR_HAD_UNKNOWN_BOUND): Adjust.
	* pt.c (push_template_decl_real): Delete DECL_TEMPLATE_CONV_FN_P
	setting.

Index: cp-tree.h
===
--- cp-tree.h	(revision 251722)
+++ cp-tree.h	(working copy)
@@ -2451,7 +2451,7 @@ struct GTY(()) lang_decl_base {
   unsigned anticipated_p : 1;		   /* fn, type or template */
   /* anticipated_p reused as DECL_OMP_PRIVATIZED_MEMBER in var */
   unsigned friend_or_tls : 1;		   /* var, fn, type or template */
-  unsigned template_conv_p : 1;		   /* var or template */
+  unsigned unknown_bound_p : 1;		   /* var */
   unsigned odr_used : 1;		   /* var or fn */
   unsigned u2sel : 1;
   unsigned concept_p : 1;  /* applies to vars and functions */
@@ -2807,28 +2807,20 @@ struct GTY(()) lang_decl {
|| DECL_BASE_DESTRUCTOR_P (NODE)))
 
 /* Nonzero if NODE is a user-defined conversion operator.  */
-#define DECL_CONV_FN_P(NODE) \
-  (DECL_NAME (NODE) && IDENTIFIER_CONV_OP_P (DECL_NAME (NODE)))
+#define DECL_CONV_FN_P(NODE) IDENTIFIER_CONV_OP_P (DECL_NAME (NODE))
 
-/* If FN is a conversion operator, the type to which it converts.
-   Otherwise, NULL_TREE.  */
+/* The type to which conversion operator FN converts to.   */
 #define DECL_CONV_FN_TYPE(FN) \
-  (DECL_CONV_FN_P (FN) ? TREE_TYPE (DECL_NAME (FN)) : NULL_TREE)
-
-/* Nonzero if NODE, which is a TEMPLATE_DECL, is a template
-   conversion operator to a type dependent on the innermost template
-   args.  */
-#define DECL_TEMPLATE_CONV_FN_P(NODE) \
-  (DECL_LANG_SPECIFIC (TEMPLATE_DECL_CHECK (NODE))->u.base.template_conv_p)
+  TREE_TYPE ((gcc_checking_assert (DECL_CONV_FN_P (FN)), DECL_NAME (FN)))
 
 /* Nonzero if NODE, a static data member, was declared in its class as an
array of unknown bound.  */
 #define VAR_HAD_UNKNOWN_BOUND(NODE)			\
   (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))		\
-   ? DECL_LANG_SPECIFIC (NODE)->u.base.template_conv_p	\
+   ? DECL_LANG_SPECIFIC (NODE)->u.base.unknown_bound_p	\
: false)
 #define SET_VAR_HAD_UNKNOWN_BOUND(NODE) \
-  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.template_conv_p = true)
+  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.unknown_bound_p = true)
 
 /* Set the overloaded operator code for NODE to CODE.  */
 #define SET_OVERLOADED_OPERATOR_CODE(NODE, CODE) \
Index: pt.c
===
--- pt.c	(revision 251724)
+++ pt.c	(working copy)
@@ -5608,25 +5608,13 @@ push_template_decl_real (tree decl, bool
   if (is_primary)
 {
   tree parms = DECL_TEMPLATE_PARMS (tmpl);
-  int i;
 
   DECL_PRIMARY_TEMPLATE (tmpl) = tmpl;
-  if (DECL_CONV_FN_P (tmpl))
-	{
-	  int depth = TMPL_PARMS_DEPTH (parms);
-
-	  /* It is a conversion operator. See if the type converted to
-	 depends on innermost template operands.  */
-
-	  if (uses_template_parms_level (TREE_TYPE (TREE_TYPE (tmpl)),
-	 depth))
-	DECL_TEMPLATE_CONV_FN_P (tmpl) = 1;
-	}
 
   /* Give template template parms a DECL_CONTEXT of the template
 	 for which they are a parameter.  */
   parms = INNERMOST_TEMPLATE_PARMS (parms);
-  for (i = TREE_VEC_LENGTH (parms) - 1; i >= 0; --i)
+  for (int i = TREE_VEC_LENGTH (parms) - 1; i >= 0; --i)
 	{
 	  tree parm = TREE_VALUE (TREE_VEC_ELT (parms, i));
 	  if (TREE_CODE (parm) == TEMPLATE_DECL)

[Ping^2][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-09-05 Thread Jiong Wang

2017-08-22 9:18 GMT+01:00 Jiong Wang :
> On 10/08/17 17:39, Jiong Wang wrote:
>>
>> Hi,
>>
>>   A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for
>> ARMv8.3-A return address signing; it is multiplexed with
>> DW_CFA_GNU_window_save in the CFA vendor extension space.
>>
>>   This patch adds the necessary code to make it available externally; the
>> GDB patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is
>> intended to use it.
>>
>>   A new DW_CFA_DUP for it is added in dwarf2.def.  DW_CFA_DUP is used to
>> avoid a duplicated case value when included in libiberty/dwarfnames.c.
>>
>>   A native x86 build is OK, to make sure there are no macro expansion errors.
>>
>>   OK for trunk?
>>
>> 2017-08-10  Jiong Wang  
>>
>> include/
>> * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
>> * dwarf2.h (DW_CFA_DUP): New define.
>>
>> libiberty/
>> * dwarfnames.c (DW_CFA_DUP): New define.
>>
>
> Ping~

Ping^2
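
For readers unfamiliar with the duplicate-case problem mentioned in the quoted description, the following is a small sketch of the idea. The list macro, helper macros and function names are made up for this example, and the real dwarf2.def/dwarfnames.c machinery is more involved. Two opcodes share the value 0x2d, so the name-lookup switch must skip the duplicate entry while the enum still defines it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy opcode list in the style of an X-macro .def file.  The opcode
   values match the DWARF vendor extensions named in the message; the
   macro and function names are illustrative only.  */
#define CFA_LIST(ENTRY, DUP)                          \
  ENTRY (DW_CFA_GNU_window_save, 0x2d)                \
  DUP   (DW_CFA_AARCH64_negate_ra_state, 0x2d)        \
  ENTRY (DW_CFA_GNU_args_size, 0x2e)

/* Enum view: both kinds of entry become enumerators.  */
#define AS_ENUM(name, value) name = value,
enum cfa_opcode { CFA_LIST (AS_ENUM, AS_ENUM) CFA_END };

/* Name-lookup view: only primary entries become case labels, so the
   shared value 0x2d appears once and the switch compiles.  */
#define AS_CASE(name, value) case name: return #name;
#define AS_NOTHING(name, value)
const char *
cfa_name (unsigned int op)
{
  switch (op)
    {
      CFA_LIST (AS_CASE, AS_NOTHING)
    default:
      return NULL;
    }
}

/* Self-check: the duplicate opcode resolves to the primary name.  */
void
check_cfa_names (void)
{
  assert (strcmp (cfa_name (DW_CFA_AARCH64_negate_ra_state),
                  "DW_CFA_GNU_window_save") == 0);
}
```

If the DUP entry were expanded with AS_CASE as well, the compiler would reject the switch because of the repeated case value.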

[C++ PATCH] more conversion operator changes

2017-09-05 Thread Nathan Sidwell

This patch moves add_method's management of METHOD_VEC into a new 
function in name-lookup.c.  That function will create a slot, if there 
isn't one with the requested name -- we know in that case that 
add_method will succeed in adding the new method.  The one quirk is the 
IDENTIFIER_BINDING machinery, which will perform a lookup in that case, 
so we have to protect lookup_fnfields_nolazy.


I moved the TYPE_HAS_CONVERSION setting into 
grok_special_member_properties, as that seemed a more appropriate location.


Finally I gave the conversion-operator marker function a FUNCTION_TYPE. 
That stops the tree_dumping machinery from barfing, which is annoying 
inside GDB.
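
As a rough illustration of the "find the slot, or create it" shape described above, here is a standalone sketch. The types and the function name get_slot are invented for the example. Unlike the real METHOD_VEC code, it always keeps the vector sorted and does not model conversion-operator markers or the IDENTIFIER_BINDING interaction.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy method vector: slots kept sorted by name.  */
struct slot { const char *name; int count; };
struct method_vec { struct slot *slots; size_t len, cap; };

/* Return the slot for NAME, creating an empty one in sorted position
   if none exists.  Returns NULL only on allocation failure.  The
   returned pointer is invalidated by the next insertion.  */
struct slot *
get_slot (struct method_vec *v, const char *name)
{
  size_t i = 0;

  assert (v != NULL && name != NULL);
  for (; i < v->len; ++i)
    {
      int cmp = strcmp (v->slots[i].name, name);
      if (cmp == 0)
        return &v->slots[i];    /* Existing slot.  */
      if (cmp > 0)
        break;                  /* Insertion point found.  */
    }

  if (v->len == v->cap)
    {
      size_t cap = v->cap ? v->cap * 2 : 8;
      struct slot *p = realloc (v->slots, cap * sizeof *p);
      if (!p)
        return NULL;
      v->slots = p;
      v->cap = cap;
    }

  /* Shift the tail up by one and initialize the new slot.  */
  memmove (&v->slots[i + 1], &v->slots[i],
           (v->len - i) * sizeof *v->slots);
  v->slots[i].name = name;
  v->slots[i].count = 0;
  v->len++;
  return &v->slots[i];
}
```

The caller never needs an "insert_p" flag with this shape: it reads the current contents through the returned pointer and writes the updated value back through the same pointer.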


Applied to trunk.

nathan

--
Nathan Sidwell
2017-09-05  Nathan Sidwell  

	* class.c (add_method): Move slot search and insertion to ...
	* name-lookup.c (get_method_slot): ... this new function.
	(lookup_fnfields_slot_nolazy): Cope with NULL slot.
	* name-lookup.h (get_method_slot): Declare.
	* decl.c (cxx_init_decl_processing): Give conv_op_marker a more
	realistic type.
	(grok_special_member_properties): Set
	TYPE_HAS_CONVERSION. Explicitly look at DECL_NAME for specialness.
	Improve TYPE_HAS_CONSTEXPR_CTOR setting.

Index: class.c
===
--- class.c	(revision 251724)
+++ class.c	(working copy)
@@ -1011,57 +1011,12 @@ add_method (tree type, tree method, bool
   if (method == error_mark_node)
 return false;
 
-  vec<tree, va_gc> *method_vec = CLASSTYPE_METHOD_VEC (type);
-  if (!method_vec)
-{
-  /* Make a new method vector.  We start with 8 entries.  */
-  vec_alloc (method_vec, 8);
-  CLASSTYPE_METHOD_VEC (type) = method_vec;
-}
-
   /* Maintain TYPE_HAS_USER_CONSTRUCTOR, etc.  */
   grok_special_member_properties (method);
 
-  bool insert_p = true;
-  tree method_name = DECL_NAME (method);
-  bool complete_p = COMPLETE_TYPE_P (type);
-  bool conv_p = IDENTIFIER_CONV_OP_P (method_name);
-
-  if (conv_p)
-method_name = conv_op_identifier;
-
-  /* See if we already have an entry with this name.  */
-  unsigned slot;
-  tree m;
-  for (slot = 0; vec_safe_iterate (method_vec, slot, &m); ++slot)
-{
-  m = DECL_NAME (OVL_FIRST (m));
-  if (m == method_name)
-	{
-	  insert_p = false;
-	  break;
-	}
-  if (complete_p && m > method_name)
-	break;
-}
-  tree current_fns = insert_p ? NULL_TREE : (*method_vec)[slot];
+  tree *slot = get_method_slot (type, DECL_NAME (method));
+  tree current_fns = *slot;
 
-  tree conv_marker = NULL_TREE;
-  if (conv_p)
-{
-  /* For conversion operators, we prepend a dummy overload
-	 pointing at conv_op_marker.  That function's DECL_NAME is
-	 conv_op_identifier, so we can use identifier equality to
-	 locate it.  */
-  if (current_fns)
-	{
-	  gcc_checking_assert (OVL_FUNCTION (current_fns) == conv_op_marker);
-	  conv_marker = current_fns;
-	  current_fns = OVL_CHAIN (current_fns);
-	}
-  else
-	conv_marker = ovl_make (conv_op_marker, NULL_TREE);
-}
   gcc_assert (!DECL_EXTERN_C_P (method));
 
   /* Check to see if we've already got this method.  */
@@ -1209,36 +1164,11 @@ add_method (tree type, tree method, bool
 
   current_fns = ovl_insert (method, current_fns, via_using);
 
-  if (conv_p)
-{
-  TYPE_HAS_CONVERSION (type) = 1;
-  /* Prepend the marker function.  */
-  OVL_CHAIN (conv_marker) = current_fns;
-  current_fns = conv_marker;
-}
-  else if (!complete_p && !IDENTIFIER_CDTOR_P (DECL_NAME (method)))
+  if (!DECL_CONV_FN_P (method) && !COMPLETE_TYPE_P (type))
 push_class_level_binding (DECL_NAME (method), current_fns);
 
-  if (insert_p)
-{
-  bool reallocated;
+  *slot = current_fns;
 
-  /* We only expect to add few methods in the COMPLETE_P case, so
-	 just make room for one more method in that case.  */
-  if (complete_p)
-	reallocated = vec_safe_reserve_exact (method_vec, 1);
-  else
-	reallocated = vec_safe_reserve (method_vec, 1);
-  if (reallocated)
-	CLASSTYPE_METHOD_VEC (type) = method_vec;
-  if (slot == method_vec->length ())
-	method_vec->quick_push (current_fns);
-  else
-	method_vec->quick_insert (slot, current_fns);
-}
-  else
-/* Replace the current slot.  */
-(*method_vec)[slot] = current_fns;
   return true;
 }
 
Index: decl.c
===
--- decl.c	(revision 251722)
+++ decl.c	(working copy)
@@ -4073,13 +4073,6 @@ cxx_init_decl_processing (void)
   noexcept_deferred_spec = build_tree_list (make_node (DEFERRED_NOEXCEPT),
 	NULL_TREE);
 
-  /* Create the conversion operator marker.  This operator's DECL_NAME
- is in the identifier table, so we can use identifier equality to
- find it.  This has no type and no context, so we can't
- accidentally think it a real function.  */
-  conv_op_marker = build_lang_decl (FUNCTION_DECL, conv_op_identifier,
-NULL_TREE);
-
 #if 0
   record_builtin_type (RID_MAX, NULL, string_type

[C++ PATCH] class scope using decls

2017-09-05 Thread Nathan Sidwell

I found do_class_using_decl confusing because of its old-style early 
declarations and use of read-once local variables.


Committed the attached, which I found easier to understand.

nathan
--
Nathan Sidwell
2017-09-05  Nathan Sidwell  

	* name-lookup.c (do_class_using_decl): Elide read-once temps.
	Move declarations to initializations.

Index: name-lookup.c
===
--- name-lookup.c	(revision 251737)
+++ name-lookup.c	(working copy)
@@ -4560,20 +4560,6 @@ push_class_level_binding (tree name, tre
 tree
 do_class_using_decl (tree scope, tree name)
 {
-  /* The USING_DECL returned by this function.  */
-  tree value;
-  /* The declaration (or declarations) name by this using
- declaration.  NULL if we are in a template and cannot figure out
- what has been named.  */
-  tree decl;
-  /* True if SCOPE is a dependent type.  */
-  bool scope_dependent_p;
-  /* True if SCOPE::NAME is dependent.  */
-  bool name_dependent_p;
-  /* True if any of the bases of CURRENT_CLASS_TYPE are dependent.  */
-  bool bases_dependent_p;
-  tree binfo;
-
   if (name == error_mark_node)
 return NULL_TREE;
 
@@ -4589,6 +4575,7 @@ do_class_using_decl (tree scope, tree na
   error ("%<%T::%D%> names destructor", scope, name);
   return NULL_TREE;
 }
+
   /* Using T::T declares inheriting ctors, even if T is a typedef.  */
   if (MAYBE_CLASS_TYPE_P (scope)
   && (name == TYPE_IDENTIFIER (scope)
@@ -4598,6 +4585,8 @@ do_class_using_decl (tree scope, tree na
   name = ctor_identifier;
   CLASSTYPE_NON_AGGREGATE (current_class_type) = true;
 }
+
+  /* Cannot introduce a constructor name.  */
   if (constructor_name_p (name, current_class_type))
 {
   error ("%<%T::%D%> names constructor in %qT",
@@ -4605,15 +4594,6 @@ do_class_using_decl (tree scope, tree na
   return NULL_TREE;
 }
 
-  scope_dependent_p = dependent_scope_p (scope);
-  name_dependent_p = (scope_dependent_p
-		  || (IDENTIFIER_CONV_OP_P (name)
-			  && dependent_type_p (TREE_TYPE (name))));
-
-  bases_dependent_p = any_dependent_bases_p ();
-
-  decl = NULL_TREE;
-
   /* From [namespace.udecl]:
 
A using-declaration used as a member-declaration shall refer to a
@@ -4624,14 +4604,18 @@ do_class_using_decl (tree scope, tree na
  class type. Morover, if SCOPE is dependent, it might match a
  non-dependent base.  */
 
-  if (!scope_dependent_p)
+  tree decl = NULL_TREE;
+  if (!dependent_scope_p (scope))
 {
   base_kind b_kind;
-  binfo = lookup_base (current_class_type, scope, ba_any, &b_kind,
-			   tf_warning_or_error);
+  tree binfo = lookup_base (current_class_type, scope, ba_any, &b_kind,
+tf_warning_or_error);
   if (b_kind < bk_proper_base)
 	{
-	  if (!bases_dependent_p || b_kind == bk_same_type)
+	  /* If there are dependent bases, scope might resolve at
+	 instantiation time, even if it isn't exactly one of the
+	 dependent bases.  */
+	  if (b_kind == bk_same_type || !any_dependent_bases_p ())
 	{
 	  error_not_base_type (scope, current_class_type);
 	  return NULL_TREE;
@@ -4642,7 +4626,8 @@ do_class_using_decl (tree scope, tree na
 	  error ("cannot inherit constructors from indirect base %qT", scope);
 	  return NULL_TREE;
 	}
-  else if (!name_dependent_p)
+  else if (!IDENTIFIER_CONV_OP_P (name)
+	   || !dependent_type_p (TREE_TYPE (name)))
 	{
 	  decl = lookup_member (binfo, name, 0, false, tf_warning_or_error);
 	  if (!decl)
@@ -4651,13 +4636,14 @@ do_class_using_decl (tree scope, tree na
 		 scope);
 	  return NULL_TREE;
 	}
+
 	  /* The binfo from which the functions came does not matter.  */
 	  if (BASELINK_P (decl))
 	decl = BASELINK_FUNCTIONS (decl);
 	}
 }
 
-  value = build_lang_decl (USING_DECL, name, NULL_TREE);
+  tree value = build_lang_decl (USING_DECL, name, NULL_TREE);
   USING_DECL_DECLS (value) = decl;
   USING_DECL_SCOPE (value) = scope;
   DECL_DEPENDENT_P (value) = !decl;

[PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)

2017-09-05 Thread Jakub Jelinek

Hi!

If a DECL_THREAD_LOCAL_P decl has NULL DECL_INITIAL and
-fzero-initialized-in-bss (the default), we ICE starting with
r251602, which changed bss_initializer_p:
+  /* Do not put constants into the .bss section, they belong in a readonly
+ section.  */
+  return (!TREE_READONLY (decl)
+ &&
to:
  (DECL_INITIAL (decl) == NULL
  /* In LTO we have no errors in program; error_mark_node is used
 to mark offlined constructors.  */
  || (DECL_INITIAL (decl) == error_mark_node
  && !in_lto_p)
  || (flag_zero_initialized_in_bss
  && initializer_zerop (DECL_INITIAL (decl))))
Previously because bss_initializer_p for these returned true, ret was
SECCAT_BSS and therefore we set it to SECCAT_TBSS as intended, but now ret
is not SECCAT_BSS, but as TLS has only tbss and tdata possibilities, we
still want to use tbss.  DECL_INITIAL NULL for a decl means implicit zero
initialization.
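
The decision being fixed can be sketched in isolation as follows. The enum, struct and function below are simplified stand-ins invented for this example, not varasm.c's actual interfaces; a missing initializer models DECL_INITIAL being NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum seccat { SECCAT_DATA, SECCAT_BSS, SECCAT_TBSS, SECCAT_TDATA };

/* Minimal model of a variable: HAS_INIT is false when there is no
   initializer at all, which means implicit zero initialization.  */
struct var { bool has_init; bool init_is_zero; };

/* Pick the TLS section given the non-TLS category RET.  The middle
   condition is the one the patch adds: no initializer always means
   tbss, whatever RET says.  */
enum seccat
categorize_tls (enum seccat ret, const struct var *v, bool zero_init_in_bss)
{
  assert (v != NULL);
  if (ret == SECCAT_BSS
      || !v->has_init
      || (zero_init_in_bss && v->init_is_zero))
    return SECCAT_TBSS;
  return SECCAT_TDATA;
}
```

A variable with no initializer now lands in tbss even when the earlier categorization did not produce SECCAT_BSS, which is the case that used to ICE.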

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-09-05  Jakub Jelinek  

PR middle-end/82095
* varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars
with NULL DECL_INITIAL.

* gcc.dg/tls/pr82095.c: New test.

--- gcc/varasm.c.jj 2017-09-01 18:43:29.0 +0200
+++ gcc/varasm.c   2017-09-04 12:29:10.166564776 +0200
@@ -6562,8 +6562,9 @@ categorize_decl_for_section (const_tree
   /* Note that this would be *just* SECCAT_BSS, except that there's
 no concept of a read-only thread-local-data section.  */
   if (ret == SECCAT_BSS
-  || (flag_zero_initialized_in_bss
-  && initializer_zerop (DECL_INITIAL (decl))))
+ || DECL_INITIAL (decl) == NULL
+ || (flag_zero_initialized_in_bss
+ && initializer_zerop (DECL_INITIAL (decl))))
ret = SECCAT_TBSS;
   else
ret = SECCAT_TDATA;
--- gcc/testsuite/gcc.dg/tls/pr82095.c.jj   2017-09-04 12:44:16.650538220 +0200
+++ gcc/testsuite/gcc.dg/tls/pr82095.c  2017-09-04 12:44:08.0 +0200
@@ -0,0 +1,16 @@
+/* PR middle-end/82095 */
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-tree-ccp" } */
+/* { dg-require-effective-target tls } */
+/* { dg-add-options tls } */
+
+static int b;
+static __thread int c;
+
+void
+foo (void)
+{
+  if (b)
+if (c)
+  b = 1;
+}

Jakub

[PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)

2017-09-05 Thread Jakub Jelinek

Hi!

CCing Andrew, because powerpcspe needs the same thing.

On powerpc with sysv4 -fPIC we emit something like
.LCL0:
.long .LCTOC1-.LCF0
before we start emitting the function, and in the prologue we emit
.LCF0:
and some code.  This fails to assemble if the prologue is emitted in a
different partition from the start of the function, as e.g. the following
testcase, where the start of the function is hot, i.e. in .text section,
but the shrink-wrapped prologue is cold, emitted in .text.unlikely section.
.LCL0 is still emitted in the section the function starts, thus .text, and
there is no relocation for subtraction of two symbols in other sections
(the second - operand has to be in the current section so that a PC-relative
relocation can be used).  This probably never worked, but is now more
severe, as we enable hot/cold partitioning in GCC 8, where it
has been previously only enabled for -fprofile-use.
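
The detection half of the fix can be sketched like this. It is a toy model with invented names: the real uses_TOC walks RTL insns and also checks crtl->has_bb_partition, which is omitted here. The return value tells the caller whether the label/word pair has to be emitted in the other text partition.

```c
#include <assert.h>
#include <stddef.h>

enum insn_kind { INSN_OTHER, INSN_LOAD_TOC, NOTE_SWITCH_SECTIONS };

/* Return 0 if no TOC load is found, 1 if the first TOC load is in the
   partition that starts the function, and 2 if a section-switch note
   precedes it (so the load sits in the other partition).  */
int
uses_toc (const enum insn_kind *insns, size_t n)
{
  int ret = 1;

  assert (insns != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    {
      if (insns[i] == INSN_LOAD_TOC)
        return ret;
      if (insns[i] == NOTE_SWITCH_SECTIONS)
        ret = 2;
    }
  return 0;
}
```

A caller would treat a result of 2 as "switch sections, emit the .LCL word, switch back", so that the .LCTOC1-.LCF0 difference is computed in the same section that holds .LCF0.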

Fixed thusly, bootstrapped on powerpc64-linux, regtested with
--target_board=unix\{,-m32\}, ok for trunk?

2017-09-05  Jakub Jelinek  

PR target/81979
* config/rs6000/rs6000.c (uses_TOC): Return 2 if
NOTE_INSN_SWITCH_TEXT_SECTIONS is seen before finding load_toc_* insn.
(rs6000_elf_declare_function_name): If uses_TOC returned 2, switch
to the other text partition before emitting LCL label and switch back
after emitting the word after it.

* gcc.dg/pr81979.c: New test.

--- gcc/config/rs6000/rs6000.c.jj   2017-09-04 09:55:28.0 +0200
+++ gcc/config/rs6000/rs6000.c  2017-09-04 16:36:49.033213325 +0200
@@ -25248,12 +25248,15 @@ get_TOC_alias_set (void)
 
 /* This returns nonzero if the current function uses the TOC.  This is
determined by the presence of (use (unspec ... UNSPEC_TOC)), which
-   is generated by the ABI_V4 load_toc_* patterns.  */
+   is generated by the ABI_V4 load_toc_* patterns.
+   Return 2 instead of 1 if the load_toc_* pattern is in the function
+   partition that doesn't start the function.  */
 #if TARGET_ELF
 static int
 uses_TOC (void)
 {
   rtx_insn *insn;
+  int ret = 1;
 
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
 if (INSN_P (insn))
@@ -25270,10 +25273,14 @@ uses_TOC (void)
  sub = XEXP (sub, 0);
  if (GET_CODE (sub) == UNSPEC
  && XINT (sub, 1) == UNSPEC_TOC)
-   return 1;
+   return ret;
}
}
   }
+else if (crtl->has_bb_partition
+&& NOTE_P (insn)
+&& NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+  ret = 2;
   return 0;
 }
 #endif
@@ -33304,14 +33311,20 @@ rs6000_elf_declare_function_name (FILE *
   return;
 }
 
+  int uses_toc;
   if (DEFAULT_ABI == ABI_V4
   && (TARGET_RELOCATABLE || flag_pic > 1)
   && !TARGET_SECURE_PLT
   && (!constant_pool_empty_p () || crtl->profile)
-  && uses_TOC ())
+  && (uses_toc = uses_TOC ()))
 {
   char buf[256];
 
+  if (uses_toc == 2)
+   {
+ in_cold_section_p = !in_cold_section_p;
+ switch_to_section (current_function_section ());
+   }
   (*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno);
 
   fprintf (file, "\t.long ");
@@ -33321,6 +33334,11 @@ rs6000_elf_declare_function_name (FILE *
   ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno);
   assemble_name (file, buf);
   putc ('\n', file);
+  if (uses_toc == 2)
+   {
+ in_cold_section_p = !in_cold_section_p;
+ switch_to_section (current_function_section ());
+   }
 }
 
   ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function");
--- gcc/testsuite/gcc.dg/pr81979.c.jj   2017-09-04 16:49:08.839334897 +0200
+++ gcc/testsuite/gcc.dg/pr81979.c  2017-09-04 16:48:54.0 +0200
@@ -0,0 +1,32 @@
+/* PR target/81979 */
+/* { dg-do link } */
+/* { dg-options "-O2 -w" } */
+/* { dg-additional-options "-fPIC" { target fpic } } */
+/* { dg-additional-options "-freorder-blocks-and-partition" { target freorder } } */
+
+int d;
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  int c;
+  while (c < 1)
+{
+  int o;
+  for (o = 0; o < 4; ++o)
+   c /= (x != 0) ? 2 : x;
+}
+
+  d = 1;
+  for (;;)
+;
+}
+
+int
+main ()
+{
+  asm volatile ("" : : "r" (&d) : "memory");
+  foo (d);
+  asm volatile ("" : : "r" (&d) : "memory");
+  return 0;
+}

Jakub

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Wilco Dijkstra

Bernd Edlinger wrote:
> No, the split condition does not begin with "&& TARGET_32BIT...".
> Therefore the split is enabled in TARGET_NEON after reload_completed.
> And it is invoked from adddi3_neon for all alternatives without vfp
> registers:

Hmm, that's a huge mess. I'd argue that any insn_and_split should only split
its own instruction, never other instructions (especially if they are from
different md files, which is extremely confusing). Otherwise we should use
a separate split and explicitly list which instructions it splits.

So then the next question is whether the neon_adddi3 still needs the
arm_adddi3 splitter in some odd corner cases?

> > Also there are more cases, a quick grep suggests *anddi_notdi_di has the 
> > same issue. 

> Yes, that pattern can be cleaned up in a follow-up patch.

And there are a lot more instructions that need the same treatment and split
early (probably best at expand time). I noticed none of the zero/sign extends
split before regalloc for example.

> Note this splitter is invoked from bicdi3_neon as well.
> However I think anddi_notdi_di should be safe as long as it is enabled
> after reload_completed (which is probably a bug).

Since we should be splitting and/bic early now, I don't think you can get
anddi_notdi anymore. So it could be removed completely, assuming Neon already
does the right thing.

It looks like we need to do a full pass over all DI mode instructions and
clean up all the mess.

Wilco

[committed] Fix OpenMP simd expansion ICE (PR middle-end/81768)

2017-09-05 Thread Jakub Jelinek

Hi!

On the following testcase when trying to gimplify a COND_EXPR that we expect
to stay a COND_EXPR in GIMPLE, the gimplifier actually produces code with
branches and labels, which is of course invalid inside of a bb.  The reason
for that is that a former global VAR_DECL is being replaced through
DECL_VALUE_EXPR with a target mapping thereof and so is considered as
potentially trapping.  Fixed by pre-gimplifying the operand. 
Bootstrapped/regtested on x86_64-linux and i686-linux, committed so far to
trunk.

2017-09-05  Jakub Jelinek  

PR middle-end/81768
* omp-expand.c (expand_omp_simd): Force second operands of COND_EXPR
into gimple val before gimplification of the COND_EXPR.

* gcc.dg/gomp/pr81768-1.c: New test.

--- gcc/omp-expand.c.jj 2017-09-01 09:26:27.0 +0200
+++ gcc/omp-expand.c   2017-09-05 13:13:49.303159003 +0200
@@ -4730,24 +4730,28 @@ expand_omp_simd (struct omp_region *regi
  tree itype2 = TREE_TYPE (fd->loops[i - 1].v);
  if (POINTER_TYPE_P (itype2))
itype2 = signed_type_for (itype2);
+ t = fold_convert (itype2, fd->loops[i - 1].step);
+ t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+   GSI_SAME_STMT);
  t = build3 (COND_EXPR, itype2,
  build2 (fd->loops[i].cond_code, boolean_type_node,
  fd->loops[i].v,
  fold_convert (itype, fd->loops[i].n2)),
- build_int_cst (itype2, 0),
- fold_convert (itype2, fd->loops[i - 1].step));
+ build_int_cst (itype2, 0), t);
  if (POINTER_TYPE_P (TREE_TYPE (fd->loops[i - 1].v)))
t = fold_build_pointer_plus (fd->loops[i - 1].v, t);
  else
t = fold_build2 (PLUS_EXPR, itype2, fd->loops[i - 1].v, t);
  expand_omp_build_assign (&gsi, fd->loops[i - 1].v, t);
 
+ t = fold_convert (itype, fd->loops[i].n1);
+ t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+   GSI_SAME_STMT);
  t = build3 (COND_EXPR, itype,
  build2 (fd->loops[i].cond_code, boolean_type_node,
  fd->loops[i].v,
  fold_convert (itype, fd->loops[i].n2)),
- fd->loops[i].v,
- fold_convert (itype, fd->loops[i].n1));
+ fd->loops[i].v, t);
  expand_omp_build_assign (&gsi, fd->loops[i].v, t);
}
}
--- gcc/testsuite/gcc.dg/gomp/pr81768-1.c.jj   2017-09-05 13:59:57.821977911 +0200
+++ gcc/testsuite/gcc.dg/gomp/pr81768-1.c   2017-09-05 13:50:24.0 +0200
@@ -0,0 +1,15 @@
+/* PR middle-end/81768 */
+/* { dg-do compile } */
+
+float b[10][15][10];
+
+void
+foo (void)
+{
+  float *i;
+#pragma omp target parallel for simd schedule(static, 32) collapse(3)
+  for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
+for (float *j = &b[0][15][0]; j > &b[0][0][0]; j -= 10)
+  for (float *k = &b[0][0][10]; k > &b[0][0][0]; --k)
+   b[i - &b[0][0][0]][(j - &b[0][0][0]) / 10 - 1][(k - &b[0][0][0]) - 1] -= 3.5;
+}

Jakub

[committed] Fix OpenMP lowering related ICE (PR middle-end/81768)

2017-09-05 Thread Jakub Jelinek

Hi!

On the following testcase we ICE because during OpenMP lowering we failed
to recompute ADDR_EXPR invariants after a VAR_DECL has been replaced by a
target mapping.  Normally this is done in lower_omp_regimplify_p, but
the OMP_FOR initial/final trees don't go through that.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed so far to trunk.

2017-09-05  Jakub Jelinek  

PR middle-end/81768
* omp-low.c (lower_omp_for): Recompute tree invariant if
gimple_omp_for_initial/final is ADDR_EXPR.

* gcc.dg/gomp/pr81768-2.c: New test.

--- gcc/omp-low.c.jj   2017-09-01 09:25:46.0 +0200
+++ gcc/omp-low.c   2017-09-05 15:07:10.086125109 +0200
@@ -6923,10 +6923,14 @@ lower_omp_for (gimple_stmt_iterator *gsi
   rhs_p = gimple_omp_for_initial_ptr (stmt, i);
   if (!is_gimple_min_invariant (*rhs_p))
*rhs_p = get_formal_tmp_var (*rhs_p, &body);
+  else if (TREE_CODE (*rhs_p) == ADDR_EXPR)
+   recompute_tree_invariant_for_addr_expr (*rhs_p);
 
   rhs_p = gimple_omp_for_final_ptr (stmt, i);
   if (!is_gimple_min_invariant (*rhs_p))
*rhs_p = get_formal_tmp_var (*rhs_p, &body);
+  else if (TREE_CODE (*rhs_p) == ADDR_EXPR)
+   recompute_tree_invariant_for_addr_expr (*rhs_p);
 
   rhs_p = &TREE_OPERAND (gimple_omp_for_incr (stmt, i), 1);
   if (!is_gimple_min_invariant (*rhs_p))
--- gcc/testsuite/gcc.dg/gomp/pr81768-2.c.jj   2017-09-05 15:08:15.989343325 +0200
+++ gcc/testsuite/gcc.dg/gomp/pr81768-2.c   2017-09-05 14:25:43.0 +0200
@@ -0,0 +1,15 @@
+/* PR middle-end/81768 */
+/* { dg-do compile } */
+
+float b[10][15][10];
+
+void
+foo (void)
+{
+  float *i;
+#pragma omp target parallel for schedule(static, 32) collapse(3)
+  for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
+for (float *j = &b[0][15][0]; j > &b[0][0][0]; j -= 10)
+  for (float *k = &b[0][0][10]; k > &b[0][0][0]; --k)
+b[i - &b[0][0][0]][(j - &b[0][0][0]) / 10 - 1][(k - &b[0][0][0]) - 1] -= 3.5;
+}

Jakub

Re: Add support to trace comparison instructions and switch statements

2017-09-05 Thread Jakub Jelinek

On Tue, Sep 05, 2017 at 09:03:52PM +0800, 吴潍浠(此彼) wrote:
> Attached is my updated patch.
> The implementation of parse_sanitizer_options is not elegant enough. Mixing
> the handling of the fsanitize flags makes it easy to make mistakes.

To avoid too many further iterations, I took the liberty to tweak your
patch.  From https://clang.llvm.org/docs/SanitizerCoverage.html
I've noticed that since 2017-08-11 clang/llvm wants to emit
__sanitizer_cov_trace_const_cmpN with the first argument a constant
if one of the comparison operands is a constant, so the patch implements
that too.
I wonder about the __sanitizer_cov_trace_cmp{f,d} entry-points, because
I can't find them on that page nor in llvm sources.
I've also added handling of COND_EXPRs and added some documentation.

I've bootstrapped/regtested the patch on x86_64-linux and i686-linux.
Can you test it on whatever you want to use the patch for?
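
To make the const-cmp convention concrete, here is a hand-instrumented sketch of what a comparison might conceptually look like after the pass. The callbacks trace_cmp4 and trace_const_cmp4 are local stand-ins defined for this example so that it runs without a fuzzer runtime; in an instrumented build the calls would go to __sanitizer_cov_trace_cmp4 and __sanitizer_cov_trace_const_cmp4 supplied by the runtime. The counters exist only so the call pattern can be observed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in callbacks that just record how they were called.  */
unsigned plain_calls, const_calls;
uint32_t last_const;

void
trace_cmp4 (uint32_t a, uint32_t b)
{
  (void) a;
  (void) b;
  ++plain_calls;
}

void
trace_const_cmp4 (uint32_t cst, uint32_t val)
{
  (void) val;
  last_const = cst;
  ++const_calls;
}

/* What "x == y" might look like once instrumented: neither operand
   is a constant, so the plain callback is used.  */
int
eq_vars (uint32_t x, uint32_t y)
{
  trace_cmp4 (x, y);
  return x == y;
}

/* What "x == 42" might look like: one operand is a constant, so the
   const variant is used and the constant is passed first.  */
int
eq_magic (uint32_t x)
{
  trace_const_cmp4 (42, x);
  return x == 42;
}

/* Self-check of the expected call pattern.  */
void
check_tracing (void)
{
  assert (eq_magic (42));
  assert (const_calls == 1 && last_const == 42);
}
```

Passing the constant in a fixed position lets a consumer such as a fuzzer tell which side of the comparison is the fixed "magic" value without having to guess.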

2017-09-05  Wish Wu  
Jakub Jelinek  

* asan.c (initialize_sanitizer_builtins): Add
BT_FN_VOID_UINT8_UINT8, BT_FN_VOID_UINT16_UINT16,
BT_FN_VOID_UINT32_UINT32, BT_FN_VOID_UINT64_UINT64,
BT_FN_VOID_FLOAT_FLOAT, BT_FN_VOID_DOUBLE_DOUBLE and
BT_FN_VOID_UINT64_PTR variables.
* builtin-types.def (BT_FN_VOID_UINT8_UINT8): New fn type.
(BT_FN_VOID_UINT16_UINT16): Likewise.
(BT_FN_VOID_UINT32_UINT32): Likewise.
(BT_FN_VOID_FLOAT_FLOAT): Likewise.
(BT_FN_VOID_DOUBLE_DOUBLE): Likewise.
(BT_FN_VOID_UINT64_PTR): Likewise.
* common.opt (flag_sanitize_coverage): New variable.
(fsanitize-coverage=trace-pc): Remove.
(fsanitize-coverage=): Add.
* flag-types.h (enum sanitize_coverage_code): New enum.
* fold-const.c (fold_range_test): Disable non-short-circuit
optimization if flag_sanitize_coverage.
(fold_truth_andor): Likewise.
* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
* opts.c (COVERAGE_SANITIZER_OPT): Define.
(coverage_sanitizer_opts): New array.
(get_closest_sanitizer_option): Add OPTS argument, handle also
OPT_fsanitize_coverage_.
(parse_sanitizer_options): Adjust to also handle
OPT_fsanitize_coverage_.
(common_handle_option): Add OPT_fsanitize_coverage_.
* sancov.c (instrument_comparison, instrument_switch): New functions.
(sancov_pass): Add trace-cmp support.
* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1,
BUILT_IN_SANITIZER_COV_TRACE_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CMP4,
BUILT_IN_SANITIZER_COV_TRACE_CMP8,
BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1,
BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2,
BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4,
BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8,
BUILT_IN_SANITIZER_COV_TRACE_CMPF, BUILT_IN_SANITIZER_COV_TRACE_CMPD,
BUILT_IN_SANITIZER_COV_TRACE_SWITCH): New builtins.
* doc/invoke.texi: Document -fsanitize-coverage=trace-cmp.

* gcc.dg/sancov/cmp0.c: New test.

--- gcc/asan.c.jj   2017-09-04 09:55:26.600687479 +0200
+++ gcc/asan.c  2017-09-05 15:39:32.452612728 +0200
@@ -2709,6 +2709,29 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_SIZE_CONST_PTR_INT
 = build_function_type_list (size_type_node, const_ptr_type_node,
integer_type_node, NULL_TREE);
+
+  tree BT_FN_VOID_UINT8_UINT8
+= build_function_type_list (void_type_node, unsigned_char_type_node,
+   unsigned_char_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT16_UINT16
+= build_function_type_list (void_type_node, uint16_type_node,
+   uint16_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT32_UINT32
+= build_function_type_list (void_type_node, uint32_type_node,
+   uint32_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_UINT64
+= build_function_type_list (void_type_node, uint64_type_node,
+   uint64_type_node, NULL_TREE);
+  tree BT_FN_VOID_FLOAT_FLOAT
+= build_function_type_list (void_type_node, float_type_node,
+   float_type_node, NULL_TREE);
+  tree BT_FN_VOID_DOUBLE_DOUBLE
+= build_function_type_list (void_type_node, double_type_node,
+   double_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_PTR
+= build_function_type_list (void_type_node, uint64_type_node,
+   ptr_type_node, NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
--- gcc/builtin-types.def.jj   2017-06-28 09:05:45.249396972 +0200
+++ gcc/builtin-types.def   2017-09-05 15:39:32.453612716 +0200
@@ -338,8 +338,20 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRMODE_
 BT_VOID, BT_PTRMODE, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 BT_VOID, BT_PTR, BT_PTRMODE)
+DEF_FUNCTION_TYPE_2 (BT

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 06/08 - V3

2017-09-05 Thread Segher Boessenkool

Hi!

On Sat, Sep 02, 2017 at 12:31:16AM -0600, Jeff Law wrote:
> On 08/29/2017 05:14 PM, Segher Boessenkool wrote:
> > Actually, everywhere it is used it has a Pmode == SImode wart before
> > it, to emit the right update_stack insn...  So fold that into this
> > function, name it rs6000_emit_allocate_stack_1?
> Agreed.  That seems like a nice cleanup.  Look at the call from
> rs6000_emit_allocate_stack.  Instead of a Pmode == SImode, it tests
> TARGET_32BIT instead.  But I think that one is still safe to convert
> based on how we set Pmode in rs6000_option_override_internal.

TARGET_32BIT is exactly the same as Pmode == SImode.  Sometimes the
latter is more expressive (if it describes more directly what is being
tested).

> > You don't describe what the return value here is (but neither does
> > rs6000_emit_allocate_stack); it is the single instruction that actually
> > changes the stack pointer?  For the split stack support?  (Does the stack
> > clash code actually work with split stack, has that been tested?)
> As far as I was able to ascertain (and essentially duplicate) we return
> the insn that allocates the stack, but only if the allocation was
> handled by a single insn.  Otherwise we return NULL.
> 
> But that was determined largely by guesswork.  It interacts with split
> stack support, but I'm not entirely sure what it's meant to do.  If you have
> insights here, I'll happily add comments to the routines -- when I was
> poking at this stuff I was rather distressed to not have any real
> guidance on what the return value should be.
> 
> I have not tested stack clash and split stacks. ISTM they ought to be
> able to work together, but I haven't thought real deep about it.

(To test, I think testing Go on 64-bit is the best you can do).

rs6000_emit_prologue has a comment:

  /* sp_adjust is the stack adjusting instruction, tracked so that the
 insn setting up the split-stack arg pointer can be emitted just
 prior to it, when r12 is not used here for other purposes.  */

> > Maybe we should keep track of that sp_adjust insn in a global, or in
> > machine_function, or even in the stack info struct.
> Maybe.  It's certainly not obvious to me what it does and how it does
> it.  The physical distance between where it gets set and modified and
> its use point doesn't help.

Yes, and it complicates control flow.

> > I'm not sure r0 is always free to use here, it's not obvious.  You can't
> > use the START_USE etc. macros to check, those only work inside
> > rs6000_emit_prologue.  I'll figure something out.
> Should I just leave it as-is until you have a suggestion for the right
> way to find a temporary at this point?

Sure, that's fine.

> > The way more common case is no probes at all, but that is handled in the
> > caller; maybe move it to this function instead?
> The comment is correct for what's handled by that code.  As you note the
> most common case is no probing needed and in those cases we never call
> rs6000_emit_probe_stack_range_stack_clash.
> 
> The code was written that way so that the common case (no explicit
> probes) would just fall into the old code.  That minimized the
> possibility that I mucked anything up :-)

I think it's cleaner if you don't handle that check externally, but
you decide.

> > It seems the only thing preventing the probes from being elided is that
> > GCC does not realise the probe stores are not actually needed?  I guess
> > that should be enough, but maybe there should be a test for this.
> I'm not sure what you're asking here.  In the code above we're
> allocating a multiple of PROBE_INTERVAL bytes.  We must probe every
> PROBE_INTERVAL bytes.

Yes, but what prevents later passes from getting rid of the stores?
Maybe some volatile is needed?  The stack pointer updates cannot easily
be removed by anything, but relying on that to also protect the stores
is a bit fragile.
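
As a user-level analogy for the concern above (not the RTL the patch emits), the sketch below probes a region once per interval through a volatile-qualified pointer, so a conforming compiler has to perform every store even though the values are never read back.  SKETCH_PROBE_INTERVAL and sketch_probe_region are hypothetical names for this illustration; the real PROBE_INTERVAL is target-defined, and how the rs6000 expander should protect its probe stores is exactly the open question here.

```c
#include <stddef.h>

/* Hypothetical interval for this sketch only; not the target's
   PROBE_INTERVAL.  */
#define SKETCH_PROBE_INTERVAL 4096

/* Store one byte at the start of every complete interval in BUF.
   The volatile qualifier is what keeps the otherwise dead stores
   from being removed.  Returns the number of probes performed.  */
size_t
sketch_probe_region (unsigned char *buf, size_t size)
{
  volatile unsigned char *p = buf;
  size_t probes = 0;

  for (size_t off = 0;
       off + SKETCH_PROBE_INTERVAL <= size;
       off += SKETCH_PROBE_INTERVAL)
    {
      p[off] = 0;
      probes++;
    }
  return probes;
}
```

Without the volatile qualifier, nothing in this function would oblige the compiler to keep the stores if it could prove the buffer is never read.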

> >> +  {
> >> +emit_move_insn (end_addr, GEN_INT (-rounded_size));
> >> +insn = emit_insn (gen_add3_insn (end_addr, end_addr,
> >> + stack_pointer_rtx));
> >> +add_reg_note (insn, REG_FRAME_RELATED_EXPR,
> >> +  gen_rtx_SET (end_addr,
> >> +   gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> >> + rs)));
> >> +  }
> >> +  RTX_FRAME_RELATED_P (insn) = 1;
> > 
> > What are these notes for?  Not totally obvious...  could use a comment :-)
> At a high level we're describing to the CFI machinery how to compute the
> CFA.
> 
> ie we have
> 
> (set (endreg) (-rounded_size))
> (set (endreg) (endreg) (stack_pointer_rtx))
> 
> But only the second insn actually participates in the CFA computation
> because it's the only one marked as RTX_FRAME_RELATED.  But the CFI
> machinery isn't going to be able to figure out what that insn does.  So
> we describe it with a note.  Think of it as a REG_EQUAL note for the CFI
> machinery.

Is end_addr used for calculating the CFA in any way?

Re: [Ping^2][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-09-05 Thread Pedro Alves

On 09/05/2017 09:05 PM, Jiong Wang wrote:
> 2017-08-22 9:18 GMT+01:00 Jiong Wang :
>> On 10/08/17 17:39, Jiong Wang wrote:
>>>
>>> Hi,
>>>
>>>   A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for ARMv8.3-A
>>> return address signing, it is multiplexing DW_CFA_GNU_window_save in CFA vendor
>>> extension space.
>>>
>>>   This patch adds necessary code to make it available to external, the GDB
>>> patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is intended
>>> to use it.
>>>
>>>   A new DW_CFA_DUP for it is added in dwarf2.def.  The use of DW_CFA_DUP is to
>>> avoid duplicated case value issue when included in libiberty/dwarfnames.
>>>
>>>   Native x86 builds OK to make sure no macro expanding errors.
>>>
>>>   OK for trunk?
>>>
>>> 2017-08-10  Jiong Wang  
>>>
>>> include/
>>> * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
>>> * dwarf2.h (DW_CFA_DUP): New define.
>>>
>>> libiberty/
>>> * dwarfnames.c (DW_CFA_DUP): New define.
>>>

I'd like to add a +1 vote for this patch.  I was confused more than once
in the iterations of the pending gdb patch that were adding references
to "DW_CFA_GNU_window_save" in Aarch64 code that has absolutely nothing
to do with SPARC register windows, and asked Jiong whether we could add
an Aarch64-specific name for the constant.

Thanks,
Pedro Alves
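
The duplicate-case problem that DW_CFA_DUP is meant to avoid can be sketched with a toy X-macro list.  This is an illustration under assumptions, not the contents of dwarf2.def or dwarfnames.c: the SKETCH_ macros and sketch_cfa_name are hypothetical, and only the two opcode names and their shared value 0x2d are taken from DWARF practice.

```c
#include <stddef.h>

/* Toy stand-in for a dwarf2.def-style list.  The second entry reuses
   the value of the first, so it is marked as a duplicate.  */
#define SKETCH_CFA_LIST \
  SKETCH_CFA (DW_CFA_GNU_window_save, 0x2d) \
  SKETCH_CFA_DUP (DW_CFA_AARCH64_negate_ra_state, 0x2d)

/* For the enum, both kinds of entry become enumerators.  */
#define SKETCH_CFA(name, value) name = value,
#define SKETCH_CFA_DUP(name, value) name = value,
enum sketch_cfa { SKETCH_CFA_LIST sketch_cfa_end };
#undef SKETCH_CFA
#undef SKETCH_CFA_DUP

/* For the name lookup, duplicates expand to nothing, so the switch
   has exactly one case label per value.  */
#define SKETCH_CFA(name, value) case name: return #name;
#define SKETCH_CFA_DUP(name, value)
const char *
sketch_cfa_name (unsigned int opc)
{
  switch (opc)
    {
    SKETCH_CFA_LIST
    default:
      return NULL;
    }
}
#undef SKETCH_CFA
#undef SKETCH_CFA_DUP
```

If the duplicate entry also expanded to a case label, the switch would contain two labels with the value 0x2d and fail to compile, which is the issue the quoted patch description mentions.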

Re: [PATCH, gcc-7-branch] Backport PR80038

2017-09-05 Thread Xi Ruoyao

On 2017-08-24 20:17 +0800, Xi Ruoyao wrote:
> On 2017-08-22 10:17 +0200, Richard Biener wrote:
> > 
> > Ok for the gcc 7 branch.
> > 
> 
> Well, I think I should say I don't have SVN write access...

Still not installed.  Make a rediff.

We don't have a Cilk maintainer now...  Someone please install the patch
for gcc 7 branch.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

From cfa279f562759ced4274c0426da9f5d791e146b5 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Mon, 21 Aug 2017 09:41:59 +0800
Subject: [PATCH] Destroy temps for _Cilk_spawn calling in the child (PR
 c++/80038)

Backport r247446 and r247508 from trunk.

gcc/ChangeLog:

2017-08-21  Xi Ruoyao  

	PR c++/80038
	* cilk_common.c (expand_builtin_cilk_detach): Move pedigree
	operations here.
	* gimplify.c (gimplify_cilk_detach): New function.
	(gimplify_call_expr, gimplify_modify_expr): Call it as needed.
	* tree-core.h: Document EXPR_CILK_SPAWN.
	* tree.h (EXPR_CILK_SPAWN): Define.

gcc/c-family/ChangeLog:

2017-08-21  Xi Ruoyao 

	PR c++/80038
	* c-common.h (cilk_gimplify_call_params_in_spawned_fn): Remove
	prototype.
	(cilk_install_body_pedigree_operations): Likewise.
	* cilk.c (cilk_set_spawn_marker): Mark functions that should be
	detached.
	(cilk_gimplify_call_params_in_spawned_fn): Remove.
	(cilk_install_body_pedigree_operations): Likewise.
	(gimplify_cilk_spawn): Add EXPR_STMT and CLEANUP_POINT_EXPR
	unwrapping.

gcc/c/ChangeLog:

2017-08-21  Xi Ruoyao 

	PR c++/80038
	* c-gimplify.c (c_gimplify_expr): Remove calls to
	cilk_gimplify_call_params_in_spawned_fn.

gcc/cp/ChangeLog:

2017-08-21  Xi Ruoyao 

	PR c++/80038
	* cp-cilkplus.c (cilk_install_body_with_frame_cleanup): Don't
	add pedigree operation and detach call here.
	* cp-gimplify.c (cp_gimplify_expr): Remove the calls to
	cilk_cp_gimplify_call_params_in_spawned_fn.
	(cilk_cp_gimplify_call_params_in_spawned_fn): Remove function.
	* semantics.c (simplify_aggr_init_expr): Copy EXPR_CILK_SPAWN.

gcc/lto/ChangeLog:

2017-08-21  Xi Ruoyao 

	PR c++/80038
	* lto-lang.c (lto_init): Set in_lto_p earlier.

gcc/testsuite/ChangeLog:

2017-08-21  Xi Ruoyao 

	PR c++/80038
	* g++.dg/cilk-plus/CK/pr80038.cc: New test.
---
 gcc/c-family/c-common.h  |   2 -
 gcc/c-family/c-gimplify.c|  10 +--
 gcc/c-family/cilk.c  | 102 +++
 gcc/c/c-typeck.c |   8 +--
 gcc/cilk-common.c|  49 +
 gcc/cp/cp-cilkplus.c |   6 +-
 gcc/cp/cp-gimplify.c |  40 ++-
 gcc/cp/semantics.c   |   2 +
 gcc/gimplify.c   |  21 ++
 gcc/lto/lto-lang.c   |   6 +-
 gcc/testsuite/g++.dg/cilk-plus/CK/pr80038.cc |  47 
 gcc/tree-core.h  |   4 ++
 gcc/tree.h   |   6 ++
 13 files changed, 150 insertions(+), 153 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cilk-plus/CK/pr80038.cc

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b933342..138a0a6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1463,7 +1463,6 @@ extern bool is_cilkplus_vector_p (tree);
 extern tree insert_cilk_frame (tree);
 extern void cilk_init_builtins (void);
 extern int gimplify_cilk_spawn (tree *);
-extern void cilk_gimplify_call_params_in_spawned_fn (tree *, gimple_seq *);
 extern void cilk_install_body_with_frame_cleanup (tree, tree, void *);
 extern bool cilk_detect_spawn_and_unwrap (tree *);
 extern bool cilk_set_spawn_marker (location_t, tree);
@@ -1471,7 +1470,6 @@ extern tree build_cilk_sync (void);
 extern tree build_cilk_spawn (location_t, tree);
 extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
-extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
 extern bool contains_cilk_spawn_stmt (tree);
 extern tree cilk_for_number_of_iterations (tree);
diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
index 57edb41..1ae75d2 100644
--- a/gcc/c-family/c-gimplify.c
+++ b/gcc/c-family/c-gimplify.c
@@ -280,10 +280,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
 		 && cilk_detect_spawn_and_unwrap (expr_p));
 
   if (!seen_error ())
-	{
-	  cilk_gimplify_call_params_in_spawned_fn (expr_p, pre_p);
-	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
-	}
+return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
   return GS_ERROR;
 
 case MODIFY_EXPR:
@@ -295,10 +292,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
 	 original expression (MODIFY/INIT/CALL_EXPR) is processes as
 	 it is supposed to be.  */
 	  && !seen_error ())
-	{
-	  cilk_gimplify_call_params_in_spawned_fn (expr_p, pre_p);
-	  return (enum gimplify_status) gimplify_cilk

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2017-09-05 Thread Jeff Law

On 09/05/2017 11:26 AM, Jeff Law wrote:
> On 09/05/2017 12:38 AM, Christophe Lyon wrote:
>> Hi Jeff,
>>
>>
>> On 3 September 2017 at 16:44, Jeff Law  wrote:
>>> On 01/13/2016 05:30 AM, Richard Biener wrote:
 On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law  wrote:
> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>
>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:
>>>
>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>

 Yeah, reassoc is largely about canonicalization.

> Plus doing it in TER is almost certainly more complex than getting it right
> in reassoc to begin with.



 I guess canonicalizing differently is ok but you'll still create
 ((a & b) & 1) & c then if you only change the above place.
>>>
>>>
>>> What's best for that expression would depend on factors like whether or not
>>> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more parallelism
>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose the
>>> bit test.
>>>
>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as there's
>>> no ILP or chance of creating a bit test.  My patch shuffles things around,
>>> but still doesn't expose the ILP or bit test in the 4 operand case.  Based
>>> on the comments in reassoc, it didn't seem like the author thought anything
>>> beyond the 3-operand case was worth handling. So my patch just handles the
>>> 3-operand case.
>>>
>>>
>>>

 So I'm not sure what pattern the backend is looking for?
>>>
>>>
>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>> set, flip, test, etc idioms.
>>
>>
>> But those don't feed another bit operation, right?  Thus we'd like to see
>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>> are designed to feed conditionals (aka CC consuming ops)?
>
> At the gimple level they could feed a conditional, or be part of a series of
> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>
> For the 3-op case, we always want the constant last.  For the 4-op case it's
> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
> & c.

 Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
 to special-case bitwise ops.  The "real" fix to the sorting heuristic would be
 to sort constants at the opposite end.

 That might be too invasive right now but there is another "convenient" place:

   /* If the operand vector is now empty, all operands were
      consumed by the __builtin_powi optimization.  */
 ...
   else
     {
       machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
       int ops_num = ops.length ();
       int width = get_reassociation_width (ops_num, rhs_code, mode);
       tree new_lhs = lhs;

       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file,
                  "Width = %d was chosen for reassociation\n", width);

 at this point you can check rhs_code and move the (single) constant
 entry in ops (if there is any constant entry) from .last () to the beginning.

 That'll get the 4 operand case correct as well and properly models
 "constant last" for the specified operation kind.
>>> Resurrecting an old thread...  Just now getting around to flushing this
>>> out of the queue.
>>>
>>> To recap, given something like (x & y) & C reassociation will turn that
>>> into (x & C) & y.  It's functionally equivalent, but it will inhibit
>>> generation of bit test instructions.
>>>
>>> I originally hacked up swap_ops_for_binary_stmt.  You requested that
>>> change be made in reassociate_bb so that it would apply to cases where
>>> there are more than 3 args.
>>>
>>> So that's what this patch does.   OK for the trunk now?
>>>
>>> Bootstrapped and regression tested on x86_64.  Also tested the new
>>> testcase on m68k.
>>>
>>>
>>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
>>> Author: Jeff Law 
>>> Date:   Sun Sep 3 10:42:30 2017 -0400
>>>
>>> 2017-09-03  Jeff Law  
>>>
>>> PR tree-optimization/64910
>>> * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>>> swap the first and last operand if the last is a constant.
>>>
>>> PR tree-optimization/64910
>>> * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>>
>>> diff --git a/gcc/ChangeL
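
The ChangeLog entry above ("swap the first and last operand if the last is a constant") can be modelled on a plain array.  This is a minimal sketch under assumptions, not the code in tree-ssa-reassoc.c: struct sketch_operand and both functions are hypothetical, and how a position in the operand vector maps to a position in the emitted expression is not modelled.  It only shows the swap itself and that the bitwise AND of the operands is unaffected by it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand record for this sketch.  */
struct sketch_operand
{
  bool is_constant;
  unsigned int value;
};

/* If the last of the N operands is a constant, swap it with the first
   one.  Returns true if a swap happened.  */
bool
sketch_swap_constant_to_front (struct sketch_operand *ops, size_t n)
{
  if (n < 2 || !ops[n - 1].is_constant)
    return false;

  struct sketch_operand tmp = ops[0];
  ops[0] = ops[n - 1];
  ops[n - 1] = tmp;
  return true;
}

/* Bitwise AND of all N operand values; the result does not depend on
   their order.  */
unsigned int
sketch_and_all (const struct sketch_operand *ops, size_t n)
{
  unsigned int result = ~0u;

  for (size_t i = 0; i < n; i++)
    result &= ops[i].value;
  return result;
}
```

Because AND is associative and commutative, the reordering is purely a question of which shape the later passes get to see, which is the point of the discussion above.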

Re: [PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)

2017-09-05 Thread Richard Biener

On September 5, 2017 11:16:53 PM GMT+02:00, Jakub Jelinek wrote:
>Hi!
>
>If a DECL_THREAD_LOCAL_P decl has NULL DECL_INITIAL and
>-fzero-initialized-in-bss (the default), we ICE starting with
>r251602, which changed bss_initializer_p:
>+  /* Do not put constants into the .bss section, they belong in a
>readonly
>+ section.  */
>+  return (!TREE_READONLY (decl)
>+&&
>to:
>  (DECL_INITIAL (decl) == NULL
>/* In LTO we have no errors in program; error_mark_node is used
> to mark offlined constructors.  */
>  || (DECL_INITIAL (decl) == error_mark_node
>  && !in_lto_p)
>  || (flag_zero_initialized_in_bss
>  && initializer_zerop (DECL_INITIAL (decl
>Previously because bss_initializer_p for these returned true, ret was
>SECCAT_BSS and therefore we set it to SECCAT_TBSS as intended, but now ret
>is not SECCAT_BSS, but as TLS has only tbss and tdata possibilities, we
>still want to use tbss.  DECL_INITIAL NULL for a decl means implicit zero
>initialization.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk?

OK. 

Richard. 

>2017-09-05  Jakub Jelinek  
>
>   PR middle-end/82095
>	* varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars
>	with NULL DECL_INITIAL.
>
>   * gcc.dg/tls/pr82095.c: New test.
>
>--- gcc/varasm.c.jj2017-09-01 18:43:29.0 +0200
>+++ gcc/varasm.c   2017-09-04 12:29:10.166564776 +0200
>@@ -6562,8 +6562,9 @@ categorize_decl_for_section (const_tree
>  /* Note that this would be *just* SECCAT_BSS, except that there's
>no concept of a read-only thread-local-data section.  */
>   if (ret == SECCAT_BSS
>- || (flag_zero_initialized_in_bss
>- && initializer_zerop (DECL_INITIAL (decl))))
>+|| DECL_INITIAL (decl) == NULL
>+|| (flag_zero_initialized_in_bss
>+&& initializer_zerop (DECL_INITIAL (decl))))
>   ret = SECCAT_TBSS;
>   else
>   ret = SECCAT_TDATA;
>--- gcc/testsuite/gcc.dg/tls/pr82095.c.jj	2017-09-04 12:44:16.650538220 +0200
>+++ gcc/testsuite/gcc.dg/tls/pr82095.c	2017-09-04 12:44:08.0 +0200
>@@ -0,0 +1,16 @@
>+/* PR middle-end/82095 */
>+/* { dg-do compile } */
>+/* { dg-options "-Og -fno-tree-ccp" } */
>+/* { dg-require-effective-target tls } */
>+/* { dg-add-options tls } */
>+
>+static int b;
>+static __thread int c;
>+
>+void
>+foo (void)
>+{
>+  if (b)
>+if (c)
>+  b = 1;
>+}
>
>   Jakub
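
The decision made by the patched condition in the message above can be modelled as a small standalone function.  This is a sketch under assumptions, not the code in varasm.c: the enum, the function name, and the boolean parameters are hypothetical stand-ins for ret == SECCAT_BSS, DECL_INITIAL (decl) != NULL, initializer_zerop (...), and flag_zero_initialized_in_bss.

```c
#include <stdbool.h>

/* The two section kinds available to a thread-local variable in this
   sketch.  */
enum sketch_seccat { SKETCH_TBSS, SKETCH_TDATA };

/* Choose between tbss and tdata for a TLS variable.  A variable with
   no initializer at all is implicitly zero-initialized, so it goes to
   tbss regardless of what the earlier categorization said.  */
enum sketch_seccat
sketch_categorize_tls (bool ret_is_bss, bool has_initializer,
		       bool initializer_is_zero,
		       bool zero_initialized_in_bss)
{
  if (ret_is_bss
      || !has_initializer
      || (zero_initialized_in_bss && initializer_is_zero))
    return SKETCH_TBSS;
  return SKETCH_TDATA;
}
```

The middle test is the one the patch adds; the first and third correspond to the conditions that were already present.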

Re: [PATCH], Enable -mfloat128 by default on PowerPC VSX systems

2017-09-05 Thread Michael Meissner

Here is a respin of the patch to enable -mfloat128 on PowerPC Linux systems now
that the libquadmath patch has been applied.  I rebased the patches against the
top of the trunk on Tuesday (subversion id 251609).

I tweaked the documentation a bit based on your comments.

I built the patch on the following systems.  There are no regressions, and the
tests float128-type-{1,2}.c now pass (previously they had regressed due to
other float128 changes).

*   Power7, bootstrap, big endian, --with-cpu=power7
*   Power7, bootstrap, big endian, --with-cpu=power5
*   Power8, bootstrap, little endian, --with-cpu=power8
*   Power9 prototype bootstrap, little endian, --with-cpu=power9

Can I check these patches into the trunk?

[gcc]
2017-09-06  Michael Meissner  

* config/rs6000/rs6000-cpus.def (OTHER_VSX_VECTOR_MASKS): Delete
OPTION_MASK_FLOAT128_KEYWORD.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Delete
support for the -mfloat128-type option, and make -mfloat128
default on PowerPC Linux systems.  Define or undefine
__FLOAT128__ and  __FLOAT128_HARDWARE__ for the current options.
Define __float128 to be __ieee128 if IEEE 128-bit support is
enabled, or undefine it.
(rs6000_cpu_cpp_builtins): Delete defining __FLOAT128__ here.
Delete defining __FLOAT128_TYPE__.
* config/rs6000/rs6000.opt (x_TARGET_FLOAT128_TYPE): Delete the
-mfloat128-type option and make -mfloat128 default on PowerPC
Linux systems.
(TARGET_FLOAT128_TYPE): Likewise.
(-mfloat128-type): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Delete the -mfloat128-type option and make -mfloat128 default on
PowerPC Linux systems.  Always use __ieee128 to be the keyword for
the IEEE 128-bit type, and map __float128 to __ieee128 if IEEE
128-bit floating point is enabled.  Change tests from using
-mfloat128-type to -mfloat128.
(rs6000_mangle_type): Use the correct mangling for the __float128
type even if normal long double is restricted to 64-bits.
(floatn_mode): Enable the _Float128 type by default on VSX Linux
systems.
* config/rs6000/rs6000.h (MASK_FLOAT128_TYPE): Delete.
(MASK_FLOAT128_KEYWORD): Define new shortcut macro.
(RS6000BTM_FLOAT128): Define in terms of -mfloat128, not
-mfloat128-type.
* doc/invoke.texi (RS/6000 and PowerPC Options): Update
documentation for -mfloat128.

[gcc/testsuite]
2017-09-06  Michael Meissner  

* gcc.target/powerpc/float128-1.c: Update options to know that
-mfloat128 is now on by default on PowerPC VSX systems.  Remove
-static-libgcc option which is no longer needed.  Use -mvsx or
-mpower9-vector to enable VSX or hardware IEEE support, rather
than specifying a particular CPU.
* gcc.target/powerpc/float128-2.c: Likewise.
* gcc.target/powerpc/float128-cmp.c: Likewise.
* gcc.target/powerpc/float128-complex-1.c: Likewise.
* gcc.target/powerpc/float128-complex-2.c: Likewise.
* gcc.target/powerpc/float128-hw.c: Likewise.
* gcc.target/powerpc/float128-mix.c: Likewise.
* gcc.target/powerpc/float128-type-1.c: Likewise.
* gcc.target/powerpc/float128-type-2.c: Likewise.
* gcc.target/powerpc/float128-3.c: New test.
* gcc.target/powerpc/float128-4.c: Likewise.
* gcc.target/powerpc/float128-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 251721)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -86,7 +86,6 @@
 #define OTHER_VSX_VECTOR_MASKS (OTHER_P8_VECTOR_MASKS  \
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
 | OPTION_MASK_FLOAT128_KEYWORD \
-| OPTION_MASK_FLOAT128_TYPE\
 | OPTION_MASK_P8_VECTOR)
 
 #define POWERPC_7400_MASK  (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
@@ -112,7 +111,6 @@
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
 | OPTION_MASK_FLOAT128_HW  \
 | OPTION_MASK_FLOAT128_KEYWORD \
-| OPTION_MASK_FLOAT128_TYPE\
 | OPTION_MASK_FPRND\
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
Index: gcc/config/rs6000/rs6000-c.c
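
From the user's side, the ChangeLog above says __FLOAT128__ is defined or undefined according to the current options, and that __float128 is available when IEEE 128-bit support is enabled.  A minimal sketch of how portable code might test for that follows; the sketch_ names and SKETCH_HAVE_FLOAT128 are hypothetical, and the fallback to long double is an assumption made for this illustration only.

```c
#include <stddef.h>

/* Use __float128 when the compiler advertises it through
   __FLOAT128__, otherwise fall back to long double.  */
#ifdef __FLOAT128__
typedef __float128 sketch_wide_float;
#define SKETCH_HAVE_FLOAT128 1
#else
typedef long double sketch_wide_float;
#define SKETCH_HAVE_FLOAT128 0
#endif

/* Size in bytes of whichever type was selected above.  */
size_t
sketch_wide_float_size (void)
{
  return sizeof (sketch_wide_float);
}
```

With -mfloat128 on by default, code like this would take the first branch on PowerPC VSX Linux systems without any extra command-line option.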
