date:20160229


On 02/28/2016 07:34 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

When run in repos other than gcc, mklog fails to find ChangeLog files
because it looks for $0/../$dir/ChangeLog, but of course if the diff is
for a project other than gcc that might not exist.  It should be fine to
also look for $cwd/$dir/ChangeLog, and use that if we find it.  This
means that for example in binutils-gdb.git you can do git commit,
and then in your editor read git diff HEAD~ | mklog - to generate a
template ChangeLog for that commit.

I've tested that mklog still generates the right ChangeLog entries for gcc and
binutils repos, ok?

Trev

contrib/ChangeLog:

2016-02-28  Trevor Saunders  

* mklog: Look for the ChangeLog file in $cwd.

OK.
jeff
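
For readers following the lookup order described above, here is a minimal sketch in C (mklog itself is a script, so this is only an illustration, not the patch). It tries the `$0/../$dir/ChangeLog` form first and falls back to `$cwd/$dir/ChangeLog`. The names `find_changelog`, `exists_fn` and `toy_exists_in_binutils` are hypothetical, and the predicate stands in for a real filesystem check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate type: returns nonzero if PATH exists.  */
typedef int (*exists_fn) (const char *path);

/* Toy stand-in for a filesystem check: pretends that only the bfd
   ChangeLog of a binutils-gdb checkout exists.  */
int
toy_exists_in_binutils (const char *path)
{
  return strcmp (path, "/src/binutils-gdb/bfd/ChangeLog") == 0;
}

/* Try SCRIPT_PATH/../DIR/ChangeLog first, then CWD/DIR/ChangeLog.
   On success store the path in OUT and return 1; otherwise store an
   empty string and return 0.  */
int
find_changelog (const char *script_path, const char *cwd, const char *dir,
                exists_fn exists, char *out, size_t out_size)
{
  int n = snprintf (out, out_size, "%s/../%s/ChangeLog", script_path, dir);
  if (n > 0 && (size_t) n < out_size && exists (out))
    return 1;

  n = snprintf (out, out_size, "%s/%s/ChangeLog", cwd, dir);
  if (n > 0 && (size_t) n < out_size && exists (out))
    return 1;

  if (out_size > 0)
    out[0] = '\0';
  return 0;
}
```

Called with a script path inside a gcc checkout and a cwd of `/src/binutils-gdb`, the first candidate misses and the second one is returned.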

hf not implying hf (was: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c)

2016-02-29 Thread Hans-Peter Nilsson

> From: Alan Lawrence 
> Date: Tue, 19 Jan 2016 13:22:13 +

(Regarding some incidentally failing tests)

> Hmm, I still see these passing, both natively on arm-none-linux-gnueabihf and 
> with a cross-build. hf implies --with-float=hard, right?

(Since you mention it...)

Oddly, it doesn't; you have to pass --with-float=hard too for
--target=arm-none-linux-gnueabihf to actually default to "hf".

IMHO this should obviously change but maybe it's too late?

And, supposedly all packagers know this wart by now and new ones
re-discover it soon enough...

brgds, H-P

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-29 Thread Segher Boessenkool

On Mon, Feb 29, 2016 at 11:11:11AM -0800, Richard Henderson wrote:
> >>> If you check for fixed registers as well here, does that work for you?
> >>
> >> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
> >> occur via arithmetic on locally allocated arrays.
> > 
> > Where are these canonicalization rules described?
> 
> swap_commutative_operands_p?

That function never swaps reg+reg, or I don't see it.

> > It is stage 4.  This rs6000 change has almost 100% chance of introducing
> > regressions.
> 
> Really?  Nearly 100%?
> 
> Ignoring the change to subf<>3_carry_in_m1 for a moment, how do any of the
> other changes result in the non-recognition of rtl that was previously valid?
> As far as I can see we only accept more rtl.

It's changing a lot of backend patterns.  There can and will be side
effects.  You're replacing a lot of RTL generation by open-coded stuff,
that's the wrong direction.

You're putting all the risk on a different backend for fixing a minor
regression in x86.

Segher

Re: [PATCH] Fix PR70011 (backlevel test case)

2016-02-29 Thread David Edelsohn

On Mon, Feb 29, 2016 at 11:49 AM, Bill Schmidt
 wrote:
> Hi,
>
> PR70011 identifies an old vectorization test that recently started
> failing on GCC 6 with POWER8 hardware.  This "failure" is that we now
> find vectorization of the test case to be profitable, where it didn't
> use to be.  A combination of two factors allowed this to become
> profitable here:  First, the POWER8 feature that unaligned vector
> accesses are supported by hardware; and second, some improvement in the
> vectorizer itself (vect_recog_mult_pattern now kicks in).
>
> The proposed fix herein is to XFAIL the test for vectorization failure
> for POWER subtargets that support efficient unaligned vector accesses.
> Since this also requires the vectorization improvement that only occurs
> in GCC 6, it makes sense to only make this change on trunk.
>
> I've verified the modified test on powerpc64le-unknown-linux-gnu
> (POWER8) and on powerpc64-unknown-linux-gnu (both POWER7 and POWER8) and
> everything works as expected.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-02-29  Bill Schmidt  
>
> PR target/70011
> * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr299925.c:
> XFAIL when hardware supports efficient unaligned storage access.

Okay.

Thanks, David

[committed] PR preprocessor/69985: fix ICE with long lines in -Wformat

2016-02-29 Thread David Malcolm

PR preprocessor/69985 reports an ICE due to the failure of:
   linemap_assert_fails (map == linemap_lookup (set, r)))

The root cause seems to be the range-packing optimization I added in
r230331; it looks like I forgot to update
linemap_position_for_loc_and_offset accordingly.  Sorry.

The following patch updates linemap_position_for_loc_and_offset, making
it clear that it accepts a *column* offset, and updating the function
to use m_range_bits of the appropriate ordinary maps accordingly.

I also found the:
  offset += column;
to be confusing, so I changed it to:
  column += offset;

Successfully bootstrapped on x86_64-pc-linux-gnu.
Adds 2 PASS results to gcc.sum

Committed to trunk as r233836.
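
To illustrate why the column offset has to be scaled by the range bits, here is a simplified model in C. It is a sketch under stated assumptions, not libcpp's actual encoding: the `toy_map` layout and the `toy_*` helpers are hypothetical, and only the idea (the low bits of a packed location hold range information, so a column step is `1 << range_bits`) is taken from the description above.

```c
#include <assert.h>

/* Hypothetical, simplified line map: a location packs the line delta in
   the high bits, the column in the middle, and range info in the low
   RANGE_BITS bits.  */
typedef struct
{
  unsigned start_location;        /* first location covered by the map */
  unsigned start_line;            /* line number of start_location */
  unsigned column_and_range_bits; /* bits used for column plus range */
  unsigned range_bits;            /* low bits reserved for range info */
} toy_map;

/* Encode (LINE, COLUMN) as a packed location within map M.  */
unsigned
toy_encode (const toy_map *m, unsigned line, unsigned column)
{
  return m->start_location
         + ((line - m->start_line) << m->column_and_range_bits)
         + (column << m->range_bits);
}

/* Recover the column from a packed location.  */
unsigned
toy_column (const toy_map *m, unsigned loc)
{
  unsigned rel = loc - m->start_location;
  return (rel & ((1u << m->column_and_range_bits) - 1)) >> m->range_bits;
}

/* Move LOC right by COLUMN_OFFSET columns: the offset must be shifted
   by the range bits, otherwise it only disturbs the range field.  */
unsigned
toy_offset_columns (const toy_map *m, unsigned loc, unsigned column_offset)
{
  return loc + (column_offset << m->range_bits);
}
```

With 5 range bits, adding a raw 3 to a location leaves the decoded column unchanged, while `toy_offset_columns` moves it by three columns.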

gcc/testsuite/ChangeLog:
PR preprocessor/69985
* gcc.dg/cpp/pr69985.c: New test case.

libcpp/ChangeLog:
PR preprocessor/69985
(linemap_position_for_loc_and_offset): Rename param from "offset"
to "column_offset".  Right-shift the column_offset by m_range_bits
of the pertinent ordinary map whenever offsetting a
source_location.  For clarity, offset the column by the column
offset, rather than the other way around.
---
 gcc/testsuite/gcc.dg/cpp/pr69985.c |  7 +++
 libcpp/line-map.c  | 17 +
 2 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr69985.c

diff --git a/gcc/testsuite/gcc.dg/cpp/pr69985.c b/gcc/testsuite/gcc.dg/cpp/pr69985.c
new file mode 100644
index 000..28f17e9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/pr69985.c
@@ -0,0 +1,7 @@
+/* { dg-options "-Wformat" } */
+extern int printf (const char *__restrict __format, ...);
+void test (void)
+{
+  /* A very long line, so that we start a new line map.  */
+  printf ("%llu01233456789012334567890123345678901233456789012334567890123345678901233456789012334567890123345678901233456789012334567890123345678901233456789"); /* { dg-warning "15: format .%llu. expects a matching" } */
+}
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index c05a001..264ae20 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -864,13 +864,13 @@ linemap_position_for_line_and_column (line_maps *set,
 }
 
 /* Encode and return a source_location starting from location LOC and
-   shifting it by OFFSET columns.  This function does not support
+   shifting it by COLUMN_OFFSET columns.  This function does not support
virtual locations.  */
 
 source_location
 linemap_position_for_loc_and_offset (struct line_maps *set,
 source_location loc,
-unsigned int offset)
+unsigned int column_offset)
 {
   const line_map_ordinary * map = NULL;
 
@@ -882,7 +882,7 @@ linemap_position_for_loc_and_offset (struct line_maps *set,
   (!linemap_location_from_macro_expansion_p (set, loc)))
 return loc;
 
-  if (offset == 0
+  if (column_offset == 0
   /* Adding an offset to a reserved location (like
 UNKNOWN_LOCATION for the C/C++ FEs) does not really make
 sense.  So let's leave the location intact in that case.  */
@@ -894,7 +894,7 @@ linemap_position_for_loc_and_offset (struct line_maps *set,
   /* The new location (loc + offset) should be higher than the first
  location encoded by MAP.  This can fail if the line information
  is messed up because of line directives (see PR66415).  */
-  if (MAP_START_LOCATION (map) >= loc + offset)
+  if (MAP_START_LOCATION (map) >= loc + (column_offset << map->m_range_bits))
 return loc;
 
   linenum_type line = SOURCE_LINE (map, loc);
@@ -905,7 +905,8 @@ linemap_position_for_loc_and_offset (struct line_maps *set,
  the next line map of the set.  Otherwise, we try to encode the
  location in the next map.  */
   while (map != LINEMAPS_LAST_ORDINARY_MAP (set)
-&& loc + offset >= MAP_START_LOCATION (&map[1]))
+&& (loc + (column_offset << map->m_range_bits)
+>= MAP_START_LOCATION (&map[1])))
 {
   map = &map[1];
   /* If the next map starts in a higher line, we cannot encode the
@@ -914,12 +915,12 @@ linemap_position_for_loc_and_offset (struct line_maps *set,
return loc;
 }
 
-  offset += column;
-  if (linemap_assert_fails (offset < (1u << map->m_column_and_range_bits)))
+  column += column_offset;
+  if (linemap_assert_fails (column < (1u << map->m_column_and_range_bits)))
 return loc;
 
   source_location r = 
-linemap_position_for_line_and_column (set, map, line, offset);
+linemap_position_for_line_and_column (set, map, line, column);
   if (linemap_assert_fails (r <= set->highest_location)
   || linemap_assert_fails (map == linemap_lookup (set, r)))
 return loc;
-- 
1.8.5.3

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-02-29 Thread Jason Merrill


On 01/27/2016 10:39 AM, H.J. Lu wrote:

Here is the updated patch with new testcases.  OK for trunk?


This is not a complete patch.

Please update type_is_empty_record_p to use the definition from the 
recent discussion:



An empty type is a type where it and all of its subobjects (recursively)
are of class, structure, union, or array type.


This shouldn't need to rely on the front-end setting a flag.  Be sure to 
ignore unnamed bit-fields, as they are not subobjects.


I would also suggest having this function abort if the type is 
TREE_ADDRESSABLE.


Jason
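
As a reading aid for the definition quoted above, here is a toy recursive check in C. It is only a sketch: `toy_type`, `toy_field` and `toy_is_empty_type` are hypothetical stand-ins for GCC's tree types and for type_is_empty_record_p, and the TREE_ADDRESSABLE abort is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature type representation.  */
enum toy_kind { TOY_SCALAR, TOY_RECORD, TOY_UNION, TOY_ARRAY };

struct toy_type;

struct toy_field
{
  const struct toy_type *type;
  bool unnamed_bitfield;          /* not a subobject, so it is skipped */
};

struct toy_type
{
  enum toy_kind kind;
  const struct toy_field *fields; /* for records and unions */
  size_t nfields;
  const struct toy_type *elem;    /* for arrays */
};

/* A type is empty if it and all of its subobjects (recursively) are of
   class, structure, union, or array type.  */
bool
toy_is_empty_type (const struct toy_type *t)
{
  switch (t->kind)
    {
    case TOY_SCALAR:
      return false;
    case TOY_ARRAY:
      return toy_is_empty_type (t->elem);
    case TOY_RECORD:
    case TOY_UNION:
      for (size_t i = 0; i < t->nfields; i++)
        {
          if (t->fields[i].unnamed_bitfield)
            continue;             /* unnamed bit-fields are ignored */
          if (!toy_is_empty_type (t->fields[i].type))
            return false;
        }
      return true;
    }
  return false;
}
```

A record whose only members are an unnamed bit-field and a nested empty record counts as empty here; a record with a named scalar member does not.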

Re: [SPARC] Fix PR target/69706

> +/* Number of words (partially) occupied for a given size in units.  */
> +#define NWORDS_UP(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
> 
> -#define ROUND_ADVANCE(SIZE) (((SIZE) + UNITS_PER_WORD - 1) /
> UNITS_PER_WORD)
> 
> You can use CEIL macro from system.h here.

Good idea, thanks, applied on the mainline.


PR target/69706
* config/sparc/sparc.c (NWORDS_UP): Rename to...
(CEIL_NWORDS): ...this.  Use CEIL macro.
(compute_fp_layout): Adjust to above renaming.
(function_arg_union_value): Likewise.
(sparc_arg_partial_bytes): Likewise.
(sparc_function_arg_advance): Likewise.
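
For reference, a small standalone sketch of the rounding-up division involved. The `CEIL` definition matches the usual `(((x) + (y) - 1) / (y))` form; `TOY_UNITS_PER_WORD` (set to 8 here) and `toy_nwords` are assumptions for illustration only, since the real UNITS_PER_WORD depends on the target.

```c
#include <assert.h>

/* Round-up integer division.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Assumption for this sketch: 8 bytes per word (a 64-bit target).  */
#define TOY_UNITS_PER_WORD 8

/* Number of words (partially) occupied for a given size in units.  */
#define TOY_CEIL_NWORDS(SIZE) CEIL ((SIZE), TOY_UNITS_PER_WORD)

/* Function wrapper so the macro can be exercised from ordinary code.  */
int
toy_nwords (int size)
{
  return TOY_CEIL_NWORDS (size);
}
```

A size of 9 bytes therefore occupies two words, while 8 bytes still fit in one.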

-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 233808)
+++ config/sparc/sparc.c	(working copy)
@@ -6086,7 +6086,7 @@ conventions.  */
 /* Maximum number of fp regs for args.  */
 #define SPARC_FP_ARG_MAX 16
 /* Number of words (partially) occupied for a given size in units.  */
-#define NWORDS_UP(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+#define CEIL_NWORDS(SIZE) CEIL((SIZE), UNITS_PER_WORD)
 
 /* Handle the INIT_CUMULATIVE_ARGS macro.
Initialize a variable CUM of type CUMULATIVE_ARGS
@@ -6429,7 +6429,7 @@ compute_fp_layout (const_tree field, HOS
   else
 nregs = 1;
 
-  nslots = NWORDS_UP (nregs * GET_MODE_SIZE (mode));
+  nslots = CEIL_NWORDS (nregs * GET_MODE_SIZE (mode));
 
   if (nslots > SPARC_FP_ARG_MAX - this_slotno)
 {
@@ -6661,7 +6661,7 @@ static rtx
 function_arg_union_value (int size, machine_mode mode, int slotno,
 			  int regno)
 {
-  int nwords = NWORDS_UP (size), i;
+  int nwords = CEIL_NWORDS (size), i;
   rtx regs;
 
   /* See comment in previous function for empty structures.  */
@@ -6893,8 +6893,8 @@ sparc_arg_partial_bytes (cumulative_args
   if (TARGET_ARCH32)
 {
   if ((slotno + (mode == BLKmode
-		 ? NWORDS_UP (int_size_in_bytes (type))
-		 : NWORDS_UP (GET_MODE_SIZE (mode))))
+		 ? CEIL_NWORDS (int_size_in_bytes (type))
+		 : CEIL_NWORDS (GET_MODE_SIZE (mode))))
 	  > SPARC_INT_ARG_MAX)
 	return (SPARC_INT_ARG_MAX - slotno) * UNITS_PER_WORD;
 }
@@ -7004,8 +7004,8 @@ sparc_function_arg_advance (cumulative_a
 
   if (TARGET_ARCH32)
 cum->words += (mode == BLKmode
-		   ? NWORDS_UP (int_size_in_bytes (type))
-		   : NWORDS_UP (GET_MODE_SIZE (mode)));
+		   ? CEIL_NWORDS (int_size_in_bytes (type))
+		   : CEIL_NWORDS (GET_MODE_SIZE (mode)));
   else
 {
   if (type && AGGREGATE_TYPE_P (type))
@@ -7021,8 +7021,8 @@ sparc_function_arg_advance (cumulative_a
 	}
   else
 	cum->words += (mode == BLKmode
-		   ? NWORDS_UP (int_size_in_bytes (type))
-		   : NWORDS_UP (GET_MODE_SIZE (mode)));
+		   ? CEIL_NWORDS (int_size_in_bytes (type))
+		   : CEIL_NWORDS (GET_MODE_SIZE (mode)));
 }
 }

[PATCH][PR tree-optimization/70005] Fix uncprop's handling of boolean ranged objects compared to non-boolean values



This mirrors a fix that was made to DOM a while back.  Essentially we've 
got a test of a boolean ranged object against a value outside the range 
of a boolean.  That's not correctly handled by uncprop at the moment, 
and is fixed by this patch.


There's clearly a missed optimization there (and I'll file a separate 
bug for that).  But even so, uncprop should handle that case gracefully.


Bootstrapped and regression tested on x86-64-linux-gnu.  Installed on 
the trunk.
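
A rough sketch of the reasoning, in plain C rather than GIMPLE: when an object known to be in [0..1] is compared against a constant, the value on each outgoing edge can only be inferred if that constant is itself 0 or 1. The helper `toy_boolean_equivalences` is hypothetical and much simpler than associate_equivalences_with_edges.

```c
#include <assert.h>
#include <stdbool.h>

/* For a test "x == c" (IS_EQ) or "x != c" (!IS_EQ) where x is known to
   be 0 or 1, compute the value x must have on the true and false edges.
   Returns false, recording nothing, when C lies outside [0..1]: in that
   case the false edge of "x == c" does not pin x to a single value.  */
bool
toy_boolean_equivalences (bool is_eq, long c,
                          int *on_true_edge, int *on_false_edge)
{
  if (c != 0 && c != 1)
    return false;

  int when_equal = (int) c;
  int when_unequal = 1 - (int) c;

  *on_true_edge = is_eq ? when_equal : when_unequal;
  *on_false_edge = is_eq ? when_unequal : when_equal;
  return true;
}
```

For `x != 1` this yields 0 on the true edge and 1 on the false edge, while a comparison against 6 is rejected outright.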


Jeff
commit e1c1832e8e5d338b1b4357be0dcf98cadeda9b2e
Author: law 
Date:   Tue Mar 1 00:04:48 2016 +

PR tree-optimization/70005
* tree-ssa-uncprop.c (associate_equivalences_with_edges): Handle case
where an object with a boolean range is compared against a value
outside [0..1].

PR tree-optimization/70005
* gcc.c-torture/execute/pr70005.c New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233829 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 653b51e..ccbcfe8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-02-28  Jeff Law  
 
+   PR tree-optimization/70005
+   * tree-ssa-uncprop.c (associate_equivalences_with_edges): Handle case
+   where an object with a boolean range is compared against a value
+   outside [0..1].
+
PR tree-optimization/69999
* gimple-ssa-split-paths.c (split_paths): When duplicating a block
with an outgoing edge marked with EDGE_IRREDUCIBLE_LOOP, schedule
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 49577ee..3743d34 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2016-02-29  Jeff Law  
 
+   PR tree-optimization/70005
+   * gcc.c-torture/execute/pr70005.c New test.
+
PR tree-optimization/69999
* gcc.c-torture/compile/pr6.c: New test.
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr70005.c b/gcc/testsuite/gcc.c-torture/execute/pr70005.c
new file mode 100644
index 000..bc37efe
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr70005.c
@@ -0,0 +1,25 @@
+
+unsigned char a = 6;
+int b, c;
+
+static void
+fn1 ()
+{
+  int i = a > 1 ? 1 : a, j = 6 & (c = a && (b = a));
+  int d = 0, e = a, f = ~c, g = b || a;
+  unsigned char h = ~a;
+  if (a)
+f = j;
+  if (h && g)
+d = a;
+  i = -~(f * d * h) + c && (e || i) ^ f;
+  if (i != 1) 
+__builtin_abort (); 
+}
+
+int
+main ()
+{
+  fn1 ();
+  return 0; 
+}
diff --git a/gcc/tree-ssa-uncprop.c b/gcc/tree-ssa-uncprop.c
index 307bb1f..e2e8212 100644
--- a/gcc/tree-ssa-uncprop.c
+++ b/gcc/tree-ssa-uncprop.c
@@ -95,7 +95,8 @@ associate_equivalences_with_edges (void)
  if (TREE_CODE (op0) == SSA_NAME
  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op0)
  && ssa_name_has_boolean_range (op0)
- && is_gimple_min_invariant (op1))
+ && is_gimple_min_invariant (op1)
+ && (integer_zerop (op1) || integer_onep (op1)))
{
  tree true_val = constant_boolean_node (true, TREE_TYPE (op0));
  tree false_val = constant_boolean_node (false,

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-02-29 Thread Evandro Menezes


On 02/29/16 12:07, Wilco Dijkstra wrote:

Evandro Menezes  wrote:

Please, verify the new "simd" and "fp" attributes for SF and DF.

Both movsf and movdf should be:

(set_attr "simd" "*,yes,*,*,*,*,*,*,*,*")
(set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")

Did you check that with -mcpu=generic+nosimd you get fmov s0, wzr?
In my version I kept the Y on the fmov and placed the neon_mov first.


The meaning of these attributes is not clear to me.  Is there a 
reference somewhere about which insns are FP or SIMD or neither?


Indeed, I had to add the Y for the f_mcr insn to match it with nosimd.  
However, I didn't feel that it should be moved to the right, since it's 
already disparaged.  Am I missing some detail?


Thank you for the review,

--
Evandro Menezes

From 952c0f74da98efd7fcb37b2cfe3c17518a619088 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 19 Oct 2015 18:31:48 -0500
Subject: [PATCH] Replace insn to zero up SIMD registers

gcc/
	* config/aarch64/aarch64.md
	(*movhf_aarch64): Add "movi %0, #0" to zero up register.
	(*movsf_aarch64): Likewise and add "simd" and "fp" attributes.
	(*movdf_aarch64): Likewise.
---
 gcc/config/aarch64/aarch64.md | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..416e065 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1163,12 +1163,13 @@
 )
 
 (define_insn "*movhf_aarch64"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
-	(match_operand:HF 1 "general_operand"  "?rY, w,w,m,w,m,rY,r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, w,?r,w,w,m,r,m ,r")
+	(match_operand:HF 1 "general_operand"  "?rY,Y, w,w,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], HFmode)
 || aarch64_reg_or_fp_zero (operands[1], HFmode))"
   "@
mov\\t%0.h[0], %w1
+   movi\\t%0.4h, #0
umov\\t%w0, %1.h[0]
mov\\t%0.h[0], %1.h[0]
ldr\\t%h0, %1
@@ -1176,19 +1177,20 @@
ldrh\\t%w0, %1
strh\\t%w1, %0
mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
+  [(set_attr "type" "neon_from_gp,neon_move,neon_to_gp,neon_move,\
  f_loads,f_stores,load1,store1,mov_reg")
-   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
-   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
+   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,*,yes,yes,*,*,*")]
 )
 
 (define_insn "*movsf_aarch64"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-	(match_operand:SF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=w, w,?r,w,w  ,w,m,r,m ,r")
+	(match_operand:SF 1 "general_operand"  "?rY,Y, w,w,Ufc,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], SFmode)
 || aarch64_reg_or_fp_zero (operands[1], SFmode))"
   "@
fmov\\t%s0, %w1
+   movi\\t%0.2s, #0
fmov\\t%w0, %s1
fmov\\t%s0, %s1
fmov\\t%s0, %1
@@ -1197,17 +1199,20 @@
ldr\\t%w0, %1
str\\t%w1, %0
mov\\t%w0, %w1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\
- f_loads,f_stores,load1,store1,mov_reg")]
+  [(set_attr "type" "f_mcr,neon_move,f_mrc,fmov,fconsts,\
+ f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "*,yes,*,*,*,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")]
 )
 
 (define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-	(match_operand:DF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, w,?r,w,w  ,w,m,r,m ,r")
+	(match_operand:DF 1 "general_operand"  "?rY,Y, w,w,Ufc,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], DFmode)
 || aarch64_reg_or_fp_zero (operands[1], DFmode))"
   "@
fmov\\t%d0, %x1
+   movi\\t%d0, #0
fmov\\t%x0, %d1
fmov\\t%d0, %d1
fmov\\t%d0, %1
@@ -1216,8 +1221,10 @@
ldr\\t%x0, %1
str\\t%x1, %0
mov\\t%x0, %x1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\
- f_loadd,f_stored,load1,store1,mov_reg")]
+  [(set_attr "type" "f_mcr,neon_move,f_mrc,fmov,fconstd,\
+ f_loadd,f_stored,load1,store1,mov_reg")
+   (set_attr "simd" "*,yes,*,*,*,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")]
 )
 
 (define_insn "*movtf_aarch64"
-- 
2.6.3

[PATCH][PR tree-optimization/69999] Schedule loop fixups when duplicating certain blocks for path splitting




We've recently started looking at getting loop fixups scheduled when 
removing edges to address a regression.


This BZ is a closely related problem, namely that duplicating a block 
can turn an irreducible loop into a natural loop as well.  After the 
problems with tying into the low level CFG manipulation code for the 
edge removal code, I don't want to go down that path again for gcc-6 in 
the block duplication code.


Thus this patch targets the specific pass (path splitting) that is 
duplicating the blocks rather than doing something more general in the 
block duplication code.  Hence this only fixes the one known instance of 
this problem.  I wouldn't be terribly surprised if folks are able to 
coerce other passes that duplicate blocks into generating similar problems.


I suspect for gcc-7 and beyond we're going to want to look at how the 
CFG and loop codes need to communicate/cooperate better on this stuff.


Bootstrapped and regression tested on x86_64-linux-gnu.   Installing on 
the trunk momentarily.
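
The check described above can be sketched in isolation as follows. This is a toy model, not the patch: `toy_edge`, `toy_block`, `toy_loop_state` and `toy_after_duplicate` are hypothetical names, and unlike the patch (which inspects exactly two successor edges) the sketch loops over however many successors the block has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge flag, standing in for EDGE_IRREDUCIBLE_LOOP.  */
#define TOY_EDGE_IRREDUCIBLE_LOOP 0x1u

struct toy_edge
{
  unsigned flags;
};

struct toy_block
{
  const struct toy_edge *succs;
  size_t nsuccs;
};

/* Stand-in for the loop state that records LOOPS_NEED_FIXUP.  */
struct toy_loop_state
{
  bool need_fixup;
};

/* After duplicating BB, request loop fixups if any outgoing edge is
   part of an irreducible region, because the duplication may have
   turned that region into a natural loop.  */
void
toy_after_duplicate (const struct toy_block *bb, struct toy_loop_state *st)
{
  for (size_t i = 0; i < bb->nsuccs; i++)
    if (bb->succs[i].flags & TOY_EDGE_IRREDUCIBLE_LOOP)
      {
        st->need_fixup = true;
        return;
      }
}
```

Blocks whose successor edges carry no such flag leave the state untouched, so the extra fixup work is only scheduled when it might be needed.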


Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 966e06d..653b51e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-28  Jeff Law  
+
+   PR tree-optimization/69999
+   * gimple-ssa-split-paths.c (split_paths): When duplicating a block
+   with an outgoing edge marked with EDGE_IRREDUCIBLE_LOOP, schedule
+   loop cleanups.
+
 2016-02-29  Richard Biener  
 
PR tree-optimization/69994
diff --git a/gcc/gimple-ssa-split-paths.c b/gcc/gimple-ssa-split-paths.c
index ac6de81..0a0bef8 100644
--- a/gcc/gimple-ssa-split-paths.c
+++ b/gcc/gimple-ssa-split-paths.c
@@ -294,6 +294,24 @@ split_paths ()
  basic_block pred0 = EDGE_PRED (bb, 0)->src;
  transform_duplicate (pred0, bb);
  changed = true;
+
+ /* If BB has an outgoing edge marked as IRREDUCIBLE, then
+duplicating BB may result in an irreducible region turning
+into a natural loop.
+
+Long term we might want to hook this into the block
+duplication code, but as we've seen with similar changes
+for edge removal, that can be somewhat risky.  */
+ if (EDGE_SUCC (bb, 0)->flags & EDGE_IRREDUCIBLE_LOOP
+ || EDGE_SUCC (bb, 1)->flags & EDGE_IRREDUCIBLE_LOOP)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+  "Join block %d has EDGE_IRREDUCIBLE_LOOP set.  "
+  "Scheduling loop fixups.\n",
+  bb->index);
+ loops_state_set (LOOPS_NEED_FIXUP);
+   }
}
 }
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index a97af64..49577ee 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-29  Jeff Law  
+
+   PR tree-optimization/69999
+   * gcc.c-torture/compile/pr6.c: New test.
+
 2016-02-29  Yuri Rumyantsev  
 
PR tree-optimization/69652
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr6.c 
b/gcc/testsuite/gcc.c-torture/compile/pr6.c
new file mode 100644
index 000..5659ce4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr6.c
@@ -0,0 +1,16 @@
+int uh;
+
+void
+ha(void)
+{
+  while (uh) {
+for (uh = 0; uh < 1; ++uh) {
+  uh = 0;
+  if (uh != 0)
+ ts:
+uh %= uh;
+}
+++uh;
+  }
+  goto ts;
+}

[wwwdocs] extensions.html -- update sourceforge.net link

2016-02-29 Thread Gerald Pfeifer

sourceforge.net now uses https.

Committed.

Gerald

Index: extensions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/extensions.html,v
retrieving revision 1.57
diff -u -r1.57 extensions.html
--- extensions.html 28 Feb 2016 19:50:07 -  1.57
+++ extensions.html 29 Feb 2016 22:40:11 -
@@ -61,7 +61,7 @@
 
 
 Bounds checking patches for
-http://sourceforge.net/projects/boundschecking/;>GCC releases
+https://sourceforge.net/projects/boundschecking/;>GCC releases
 
 
 These patches add a -fbounds-checking flag that

[wwwdocs] msdn.microsoft.com now uses https (gcc-4.7/changes.html)

2016-02-29 Thread Gerald Pfeifer

Committed.

Gerald

Index: gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.142
diff -u -r1.142 changes.html
--- gcc-4.7/changes.html14 Nov 2015 23:27:24 -  1.142
+++ gcc-4.7/changes.html28 Feb 2016 20:53:16 -
@@ -635,7 +635,7 @@
   depends on the user environment settings; see the ulimit -c
   setting for POSIX shells, limit coredumpsize for C shells,
   and the http://msdn.microsoft.com/en-us/library/bb787181%28v=vs.85%29.aspx;
+  
href="https://msdn.microsoft.com/en-us/library/bb787181%28v=vs.85%29.aspx;
   >WER user-mode dumps settings on Windows.
 The https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gfortran/Debugging-Options.html#index-g_t_0040code_007bfno_002dbacktrace_007d-183;

[RFC][PR69708] IPA inline not working for function reference in static const struc

2016-02-29 Thread kugan


Hi,

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708 and the
corresponding mailing list discussion, IPA CP is not detecting a
jump function with the sq function as its value.



static int sq(int x) {
  return x * x;
}

static const F f = {sq};
...
dosomething (g(f, x));
...

I added a check in determine_locally_known_aggregate_parts to detect 
this. This fixes the testcase and passes x86-64-linux-gnu LTO bootstrap 
and regression testing with no new regressions. Does this look like a 
sensible place to fix this?


Thanks,
Kugan

gcc/ChangeLog:

2016-03-01  Kugan Vivekanandarajah

* ipa-prop.c (determine_locally_known_aggregate_parts): Determine jump
function for static constant initialization.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 72c2fed..22da097 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1562,6 +1562,57 @@ determine_locally_known_aggregate_parts (gcall *call, tree arg,
   jfunc->agg.by_ref = by_ref;
   build_agg_jump_func_from_list (list, const_count, arg_offset, jfunc);
 }
+  else if ((TREE_CODE (arg) == VAR_DECL)
+  && is_global_var (arg))
+{
+  /* PR69708:  Figure out aggregate jump-function with constant init
+value.  */
+  struct ipa_known_agg_contents_list *n, **p;
+  HOST_WIDE_INT offset = 0, size, max_size;
+  varpool_node *node = varpool_node::get (arg);
+  if (node
+ && DECL_INITIAL (node->decl)
+ && TREE_READONLY (node->decl)
+ && TREE_CODE (DECL_INITIAL (node->decl)) == CONSTRUCTOR)
+   {
+ tree exp = DECL_INITIAL (node->decl);
+ unsigned HOST_WIDE_INT ix;
+ tree field, val;
+ bool reverse;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), ix, field, val)
+   {
+ bool already_there = false;
+ if (!field)
+   break;
+ get_ref_base_and_extent (field, &offset, &size,
+  &max_size, &reverse);
+ if (max_size == -1
+ || max_size != size)
+   break;
+ p = get_place_in_agg_contents_list (&list, offset, size,
+ &already_there);
+ if (!p)
+   break;
+ n = XALLOCA (struct ipa_known_agg_contents_list);
+ n->size = size;
+ n->offset = offset;
+ if (is_gimple_ip_invariant (val))
+   {
+ n->constant = val;
+ const_count++;
+   }
+ else
+   n->constant = NULL_TREE;
+ n->next = *p;
+ *p = n;
+   }
+   }
+  if (const_count)
+   {
+ jfunc->agg.by_ref = by_ref;
+ build_agg_jump_func_from_list (list, const_count, arg_offset, jfunc);
+   }
+}
 }
 
 static tree

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Michael Matz

Hi,

On Mon, 29 Feb 2016, Mikael Pettersson wrote:

> Well, almost.  While it is true that a signal handler cannot 
> *accidentally* clobber the register state of the interrupted thread, it 
> can in fact access and update any part of that state via the ucontext_t 
> passed to it.  Doing so is uncommon, but not unheard of and not even 
> that difficult -- I've done it myself in several different runtime 
> systems.

Yeah, well, sure.  That's not clobbering the registers directly though, 
but setting it up so that the kernel does it on return :)  If you do that, 
you have to have a special sig-handler anyway, lest it clobbers other 
registers that are currently in use by the interrupted piece of code.

> The code in a signal handler cannot assume that global register 
> variables are in sync with the interrupted thread, or that plain 
> assignments to them are reflected back, but that's not GCC's fault, nor 
> is it GCC's job to make that happen.

And it's documented to not happen (reliably anyway), so all is fine.

Ciao,
Michael.

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Mikael Pettersson

Michael Matz writes:
 > > > FWIW: signal handlers need no consideration (if they were allowed to
 > > > inspect/alter global reg vars we would have lost and no improvement on
 > > > fixed_regs[] could be done).  They are explicitly documented to not be
 > > > able to access global reg vars.  (They already can't accidentally clobber
 > > > the register in question because signal handlers don't do that)
 > > 
 > > Oh, they can't modify the register in question because the OS would 
 > > restore it?
 > 
 > Yep.

Well, almost.  While it is true that a signal handler cannot *accidentally*
clobber the register state of the interrupted thread, it can in fact access
and update any part of that state via the ucontext_t passed to it.  Doing so
is uncommon, but not unheard of and not even that difficult -- I've done it
myself in several different runtime systems.

The code in a signal handler cannot assume that global register variables
are in sync with the interrupted thread, or that plain assignments to them
are reflected back, but that's not GCC's fault, nor is it GCC's job to make
that happen.

/Mikael
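
A minimal sketch of the mechanism mentioned above, assuming a POSIX system: a handler installed with SA_SIGINFO receives a third argument that points at the ucontext_t of the interrupted thread. The names `toy_handler`, `toy_saw_context` and `toy_raise_and_check` are hypothetical. The sketch only records that a context was passed; actually reading or rewriting saved registers goes through architecture-specific fields of that structure and is deliberately left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler when it was given a context pointer.  */
static volatile sig_atomic_t toy_saw_context = 0;

/* SA_SIGINFO-style handler.  CTX points at the saved state of the
   interrupted thread (a ucontext_t); whatever the handler changes there
   is what gets restored on return.  Assignments to ordinary variables,
   or to global register variables, do not go through that saved
   state.  */
static void
toy_handler (int signo, siginfo_t *info, void *ctx)
{
  (void) signo;
  (void) info;
  toy_saw_context = (ctx != NULL);
}

/* Install the handler for SIGUSR1, raise the signal, restore the old
   disposition.  Returns 1 if the handler saw a context, 0 if not, and
   -1 if the handler could not be installed.  */
int
toy_raise_and_check (void)
{
  struct sigaction sa, old;

  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = toy_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);

  if (sigaction (SIGUSR1, &sa, &old) != 0)
    return -1;

  toy_saw_context = 0;
  raise (SIGUSR1);
  sigaction (SIGUSR1, &old, NULL);
  return toy_saw_context;
}
```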

Re: PPC libgcc IEEE128 soft-fp exception/rounding fixes

2016-02-29 Thread Bill Schmidt

Hi David,

Please ignore this request.  I had asked Paul to do this, but was
confused that it relied on other patches that are not in GCC 5.  My
fault.

Thanks,
Bill

On Mon, 2016-02-29 at 11:33 -0600, Paul E. Murphy wrote:
> Hi David,
> 
> Bill merged this into trunk last week.
> 
> Is it okay to backport this to GCC 5?
> 
> Thanks,
> Paul
> 
> On 02/22/2016 05:51 PM, David Edelsohn wrote:
> > libgcc
> > * config/rs6000/sfp-machine.h:
> > (_FP_DECL_EX): Declare _fpsr as a union of u64 and double.
> > (FP_TRAPPING_EXCEPTIONS): Return a bitmask of trapping
> > exceptions.
> > (FP_INIT_ROUNDMODE): Read the fpscr instead of writing
> > a mystery value.
> > (FP_ROUNDMODE): Update the usage of _fpscr.
> > 
> > Okay.
> > 
> > Thanks, David
> >

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-29 Thread Richard Henderson

On 02/26/2016 01:52 PM, Segher Boessenkool wrote:
> On Fri, Feb 26, 2016 at 01:35:10PM -0800, Richard Henderson wrote:
>> On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
>>> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
 +  /* Perform rematerialization if only all operands are registers and
 + all operations are PLUS.  */
 +  for (i = 0; i < n_ops; i++)
 +  if (ops[i].neg || !REG_P (ops[i].op))
 +return NULL_RTX;
 +  goto gen_result;
 +}
>>>
>>> If you check for fixed registers as well here, does that work for you?
>>
>> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
>> occur via arithmetic on locally allocated arrays.
> 
> Where are these canonicalization rules described?

swap_commutative_operands_p?


> It is stage 4.  This rs6000 change has almost 100% chance of introducing
> regressions.

Really?  Nearly 100%?

Ignoring the change to subf<>3_carry_in_m1 for a moment, how do any of the
other changes result in the non-recognition of rtl that was previously valid?
As far as I can see we only accept more rtl.


r~

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Michael Matz

Hi,

On Mon, 29 Feb 2016, Bernd Schmidt wrote:

> On 02/29/2016 06:07 PM, Michael Matz wrote:
> 
> > %rbx would have to be implicitly used/clobbered by the asm.  In addition
> > it would have to be used by all function entries and exits (so that a
> > function body where the global reg var is merely visible but not used
> > doesn't accidentally clobber that register) and of course by calls.
> 
> Nearly all this exists as of today. From df-scan.c:

Okay, that looks good, I agree (modulo the asms).

> > FWIW: signal handlers need no consideration (if they were allowed to
> > inspect/alter global reg vars we would have lost and no improvement on
> > fixed_regs[] could be done).  They are explicitly documented to not be
> > able to access global reg vars.  (They already can't accidentally clobber
> > the register in question because signal handlers don't do that)
> 
> Oh, they can't modify the register in question because the OS would 
> restore it?

Yep.

> Ok so maybe reopen and apply my patch for gcc-7, with a tweak for asms.

That seems workable.  At least I can't imagine other implicit uses of such 
registers.


Ciao,
Michael.

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-02-29 Thread Wilco Dijkstra

Evandro Menezes  wrote:
> 
> Please, verify the new "simd" and "fp" attributes for SF and DF.

Both movsf and movdf should be:

(set_attr "simd" "*,yes,*,*,*,*,*,*,*,*")
(set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")

Did you check that with -mcpu=generic+nosimd you get fmov s0, wzr?
In my version I kept the Y on the fmov and placed the neon_mov first.

Wilco

[PATCH][ARM] Error out of arm_neon.h if compiling for soft-float ABI


Hi all,

Now that we've moved to pragmas guarding the various intrinsics in arm_neon.h I 
think we should still
throw a #error if someone tries to include the header while compiling for 
-mfloat-abi=soft.

This gives a more helpful error message when someone has a compiler configured 
--with-float=soft
and forgets to add an appropriate -mfloat-abi option on the command line.
Currently we'll just give tons of error messages whereas with this patch we just
show a simple clean message.

Tested on arm. This could be argued to be a user experience regression fix.
Ok for trunk?

Thanks,
Kyrill

2016-02-29  Kyrylo Tkachov  

* config/arm/arm_neon.h: Show error if using with soft-float ABI.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 47816d52187b979b92d7592991d29e4cbe8f9357..6a880235d24759e9938fb08365eaddff77d60f0e 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -27,6 +27,10 @@
 #ifndef _GCC_ARM_NEON_H
 #define _GCC_ARM_NEON_H 1
 
+#ifndef __ARM_FP
+#error "NEON intrinsics not available with the soft-float ABI.  Please use -mfloat-abi=softfp or -mfloat-abi=hard"
+#else
+
 #pragma GCC push_options
 #pragma GCC target ("fpu=neon")
 
@@ -14833,3 +14837,4 @@ vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
 #pragma GCC pop_options
 
 #endif
+#endif

Re: [PATCH] Fix PR69951

2016-02-29 Thread James Greenhalgh

On Fri, Feb 26, 2016 at 09:32:53AM +0100, Richard Biener wrote:
> 
> The following fixes PR69951, hopefully the last case of decl alias
> issues with alias analysis.  This time it's points-to and the DECL_UIDs
> used in points-to sets not being canonicalized.
> 
> The simplest (and cheapest) fix is to make aliases refer to the
> ultimate alias target via their DECL_PT_UID which we conveniently
> have available.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> Richard.
> 
> 2016-02-26  Richard Biener  
> 
>   PR tree-optimization/69951
>   * tree-ssa-structalias.c (get_constraint_for_ssa_var): When
>   looking through aliases adjust DECL_PT_UID to refer to the
>   ultimate alias target.
> 
>   * gcc.dg/torture/pr69951.c: New testcase.

I see this new testcase failing on an ARM target as so:

/tmp/ccChjoFc.s: Assembler messages:
/tmp/ccChjoFc.s:21: Warning: [-mwarn-syms]: Assignment makes a symbol match 
an ARM instruction: b

FAIL: gcc.dg/torture/pr69951.c   -O0  (test for excess errors)

But I haven't managed to reproduce it outside of the test environment.

The fix looks trivial, rename b to anything else you fancy (well... stay
clear of add and ldr). I'll put a fix in myself if I can manage to get
this to reproduce - though if anyone else wants to do it I won't be
offended :-).

Thanks,
James

Re: PPC libgcc IEEE128 soft-fp exception/rounding fixes

2016-02-29 Thread Paul E. Murphy

Hi David,

Bill merged this into trunk last week.

Is it okay to backport this to GCC 5?

Thanks,
Paul

On 02/22/2016 05:51 PM, David Edelsohn wrote:
> libgcc
> * config/rs6000/sfp-machine.h:
> (_FP_DECL_EX): Declare _fpsr as a union of u64 and double.
> (FP_TRAPPING_EXCEPTIONS): Return a bitmask of trapping
> exceptions.
> (FP_INIT_ROUNDMODE): Read the fpscr instead of writing
> a mystery value.
> (FP_ROUNDMODE): Update the usage of _fpscr.
> 
> Okay.
> 
> Thanks, David
>

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Bernd Schmidt


On 02/29/2016 06:07 PM, Michael Matz wrote:


%rbx would have to be implicitly used/clobbered by the asm.  In addition
it would have to be used by all function entries and exits (so that a
function body where the global reg var is merely visible but not used
doesn't accidentally clobber that register) and of course by calls.


Nearly all this exists as of today. From df-scan.c:

static void
df_get_call_refs (struct df_collection_rec *collection_rec,
[...]
  else if (global_regs[i])
{
  /* Calls to const functions cannot access any global registers and
 calls to pure functions cannot set them.  All other calls may
 reference any of the global registers, so they are recorded as
 used. */
  if (!RTL_CONST_CALL_P (insn_info->insn))

and

static void
df_get_entry_block_def_set (bitmap entry_block_defs)
{
[...]
  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
{
  if (global_regs[i])
bitmap_set_bit (entry_block_defs, i);

and

  /* Mark all global registers, and all registers used by the
 epilogue as being live at the end of the function since they
 may be referenced by our caller.  */
  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
if (global_regs[i] || EPILOGUE_USES (i))
  bitmap_set_bit (exit_block_uses, i);


FWIW: signal handlers need no consideration (if they were allowed to
inspect/alter global reg vars we would have lost and no improvement on
fixed_regs[] could be done).  They are explicitly documented to not be
able to access global reg vars.  (They already can't accidentally clobber
the register in question because signal handlers don't do that)


Oh, they can't modify the register in question because the OS would 
restore it? Hadn't thought of that.


Ok so maybe reopen and apply my patch for gcc-7, with a tweak for asms.


Bernd

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Michael Matz

Hi,

On Fri, 26 Feb 2016, Bernd Schmidt wrote:

> Calls do, asms currently don't AFAICT. Not sure whether it's allowed to 
> use them, but I think it should be straightforward to adjust df-scan.
> 
> > Some jit-like code uses global reg vars to reserve registers for the 
> > generated code.  It would have to add -ffixed- compilation 
> > options instead of only the global vars after the patch.  I think the 
> > advantages of having "good RA" with global reg vars are dubious enough 
> > to not warrant breaking backward compatibility.
> 
> I don't see the need for new options - if we have proper live info for 
> calls and asms, what other reason could there be for incomplete liveness 
> information?

[now moot, since wontfix, but still:]

Well, but we don't have proper live info, as you already said yourself 
above.  What I mean to say is, the following is currently proper use of 
global reg vars:

register uint64_t ugh __asm__("rbx"); //r11, whatever
void write_into_ugh (void)
{
  asm volatile ("mov 1, %%rbx" :::);
  assert (ugh == 1);
}

%rbx would have to be implicitly used/clobbered by the asm.  In addition 
it would have to be used by all function entries and exits (so that a 
function body where the global reg var is merely visible but not used 
doesn't accidentally clobber that register) and of course by calls.

AFAICS this would fix the code in push_flag_into_global_reg_var, because 
then indeed you'd have proper live info, including proper deaths.  But for 
everything a little more complicated than this it'd basically be the same 
as putting the reg into fixed_regs[].

FWIW: signal handlers need no consideration (if they were allowed to 
inspect/alter global reg vars we would have lost and no improvement on 
fixed_regs[] could be done).  They are explicitly documented to not be 
able to access global reg vars.  (They already can't accidentally clobber 
the register in question because signal handlers don't do that)

So, I think it can be made to work, but not something for GCC 6, and if 
the additional bit for all calls/asms/function-entry-exits really is worth 
it ... I just don't know.

Ciao,
Michael.

[PATCH] Fix PR70011 (backlevel test case)

2016-02-29 Thread Bill Schmidt

Hi,

PR70011 identifies an old vectorization test that recently started
failing on GCC 6 with POWER8 hardware.  This "failure" is that we now
find vectorization of the test case to be profitable, where it didn't
used to be.  A combination of two factors allowed this to become
profitable here:  First, the POWER8 feature that unaligned vector
accesses are supported by hardware; and second, some improvement in the
vectorizer itself (vect_recog_mult_pattern now kicks in).

The proposed fix herein is to XFAIL the test for vectorization failure
for POWER subtargets that support efficient unaligned vector accesses.
Since this also requires the vectorization improvement that only occurs
in GCC 6, it makes sense to only make this change on trunk.

I've verified the modified test on powerpc64le-unknown-linux-gnu
(POWER8) and on powerpc64-unknown-linux-gnu (both POWER7 and POWER8) and
everything works as expected.  Is this ok for trunk?

Thanks,
Bill


2016-02-29  Bill Schmidt  

PR target/70011
* gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
XFAIL when hardware supports efficient unaligned storage access.


Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c
===
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c  (revision 233813)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c  (working copy)
@@ -35,5 +35,5 @@ int main()
return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { xfail { vect_hw_misalign } } } } */

[PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-02-29 Thread Renlin Li


Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep failing on
arm/aarch64 bare-metal targets.

This is because a statically built newlib library is linked into a shared
object, and the linker complains about relocations which cannot be used in a
shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against
external symbol `_impure_ptr' can not be used when making a shared object;
recompile with -fPIC

Presumably, bare-metal toolchains for other architectures have those test case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the
standard system startup files and libraries are not used when linking.


arm-none-eabi, aarch64-none-elf regression test OK, OK for trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-02-29  Renlin Li

* gcc.dg/lto/pr54709_0.c: Replace -shared with -r -nostdlib.
* gcc.dg/lto/pr61526_0.c: Ditto.
* gcc.dg/lto/pr64415_0.c: Ditto.

diff --git a/gcc/testsuite/gcc.dg/lto/pr54709_0.c b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
index f3db5dc..12a10e0 100644
--- a/gcc/testsuite/gcc.dg/lto/pr54709_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
@@ -1,7 +1,7 @@
 /* { dg-lto-do link } */
 /* { dg-require-visibility "hidden" } */
 /* { dg-require-effective-target fpic } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 /* { dg-lto-options { { -fPIC -fvisibility=hidden -flto } } } */
 
 void foo (void *p, void *q, unsigned s)
diff --git a/gcc/testsuite/gcc.dg/lto/pr61526_0.c b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
index 8a631f0..5e2f7acf 100644
--- a/gcc/testsuite/gcc.dg/lto/pr61526_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
@@ -1,7 +1,7 @@
 /* { dg-require-effective-target fpic } */
 /* { dg-lto-do link } */
 /* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 
 static void *master;
 void *foo () { return master; }
diff --git a/gcc/testsuite/gcc.dg/lto/pr64415_0.c b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
index 4faab2b..0f583a5 100644
--- a/gcc/testsuite/gcc.dg/lto/pr64415_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
@@ -1,7 +1,7 @@
 /* { dg-lto-do link } */
 /* { dg-require-effective-target fpic } */
 /* { dg-lto-options { { -O -flto -fpic } } } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 /* { dg-extra-ld-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } } */
 
 extern void bar(char *, int);

Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-29 Thread Renlin Li


Hi Vladimir,

Thank you for explaining it.
I have a few comments inlined.

On 26/02/16 23:54, Vladimir Makarov wrote:


Thanks for working on this and providing a good description of the 
problem.  Could you file a PR and provide a test even if you can not 
reduce it.


I will file a PR and try to reduce a test case. As it's triggered by my 
local change to gcc, I cannot guarantee it.

Anyway, I am quite happy to test your fix when you have one.

As for the scratch.  As I understand the scratch was introduced for 
operands which will not require any resources (memory or a new 
register) for some insn alternatives.  If we use pseudo for this, it 
will always need memory or a register.  The typical constraint for 
scratch is "r,X" or "0r".  So I guess using just "" for scratch is a 
bad practice.  Still for compatibility I think we should implement the 
same reload behaviour for this case too.


Actually (clobber (match_scratch:MODE x "=r")) also triggers this ICE. 
The early clobber modifier here doesn't really matter.
The purpose of this pattern is to reserve a pseudo register for use as a 
temporary.
The "=" modifier is required for a MATCH_SCRATCH expression. Otherwise, it 
will fail with the error "missing output reload".

That is why (set scratch, RXX) is generated.



I believe we should use the same technique -- changing scratches to 
pseudo and back at the end of LRA if they don't need a register.  It 
will solve also a possible problem for correct scratch generation 
during LRA.


I am going to work on this problem on the next week.  A test case 
would be a help for me.


gcc/ChangeLog:

2016-02-26  Renlin Li

* lra-constraints.c (curr_insn_transform): Don't generate reload for
output scratch operand.

Sorry, I can not accept the patch as I'd like to provide a better 
solution I described above.  The patch is also wrong for unused 
non-scratch operands.  They still should be reloaded if they do not 
satisfy their constraints even if they are not used later.




I think it will still be reloaded, according to the code logic here:

if (get_reload_reg (type, mode, old, goal_alt[i],
  loc != curr_id->operand_loc[i], "", &new_reg)
  && type != OP_OUT)
{
  push_to_sequence (before);
  lra_emit_move (new_reg, old);
  before = get_insns ();
  end_sequence ();
}
  *loc = new_reg;  /* <- the original operand is replaced by a reload reg.  */

  if (type != OP_IN
  /* Don't generate reload for output scratch operand.  */
  && GET_CODE (old) != SCRATCH
  && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)


A reload register will be generated to replace the old operand in the 
original rtx pattern to satisfy its constraints.
Later, it will check whether this operand is an output operand which will be 
used later; if so, another insn will be generated to move the newly generated 
pseudo into the old operand.

The patch adds one more condition to this final insn generation.

Regards,
Renlin

[PATCH][ARM] Reduce size of arm1020e automaton


Hi all,

I've had this one sitting in my tree for some time.
The arm1020e automaton has no business being as large as it is (3185 states).
Most of the bloat is due to overly large reservation durations for calls and FP 
division.

This patch reduces the durations to something more sensible.
This brings down the number of states from 3185 states to 320 states.
There are bigger fish to fry on that front, but every little bit helps as we're
already approaching a gigabyte of memory required for genautomata processing.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk or GCC 7?

Thanks,
Kyrill

2016-02-29  Kyrylo Tkachov  

* config/arm/arm1020e.md (1020call_op): Reduce reservation
duration.
(v10_fdivs): Likewise.
(v10_fdivd): Likewise.
diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md
index 7cdab57ddb34346fa21f2935d2bc29c4f0b827d8..84a300d804541d63e82c08f517f4af136df2d642 100644
--- a/gcc/config/arm/arm1020e.md
+++ b/gcc/config/arm/arm1020e.md
@@ -246,13 +246,14 @@ (define_insn_reservation "1020branch_op" 0
   (eq_attr "type" "branch"))
  "1020a_e")
 
-;; The latency for a call is not predictable.  Therefore, we use 32 as
-;; roughly equivalent to positive infinity.
+;; The latency for a call is not predictable.  Therefore, we model as blocking
+;; execution for a number of cycles but we can't do anything more accurate
+;; than that.
 
 (define_insn_reservation "1020call_op" 32
  (and (eq_attr "tune" "arm1020e,arm1022e")
   (eq_attr "type" "call"))
- "1020a_e*32")
+ "1020a_e*4")
 
 
 ;; VFP
@@ -300,12 +301,12 @@ (define_insn_reservation "v10_fmul" 6
 (define_insn_reservation "v10_fdivs" 18
  (and (eq_attr "vfp10" "yes")
   (eq_attr "type" "fdivs, fsqrts"))
- "1020a_e+v10_ds*14")
+ "1020a_e+v10_ds*4")
 
 (define_insn_reservation "v10_fdivd" 32
  (and (eq_attr "vfp10" "yes")
   (eq_attr "type" "fdivd, fsqrtd"))
- "1020a_e+v10_fmac+v10_ds*28")
+ "1020a_e+v10_fmac+v10_ds*4")
 
 (define_insn_reservation "v10_floads" 4
  (and (eq_attr "vfp10" "yes")

Re: [PATCH] Fix PR69983

2016-02-29 Thread Jakub Jelinek

On Mon, Feb 29, 2016 at 04:26:12PM +0100, Richard Biener wrote:
> *** get_unary_op (tree name, enum tree_code
> *** 621,626 
> --- 641,680 
> return NULL_TREE;
>   }
>   
> + /* Return true if OP1 and OP2 have the same value if casted to either type. 
>  */
> + 
> + static bool
> + ops_equal_values_p (tree op1, tree op2)
> + {
> +   if (op1 == op2)
> + return true;
> + 
> +   if (TREE_CODE (op1) == SSA_NAME)
> + {
> +   gimple *stmt = SSA_NAME_DEF_STMT (op1);
> +   if (gimple_nop_conversion_p (stmt))
> + {
> +   op1 = gimple_assign_rhs1 (stmt);
> +   if (op1 == op2)
> + return true;
> + }
> + }
> + 
> +   if (TREE_CODE (op2) == SSA_NAME)
> + {
> +   gimple *stmt = SSA_NAME_DEF_STMT (op2);
> +   if (gimple_nop_conversion_p (stmt))
> + {
> +   op2 = gimple_assign_rhs1 (stmt);
> +   if (op1 == op2)
> + return true;
> + }
> + }

This will not work if you have:
  x_1 = (nop) x_0;
  x_2 = (nop) x_1;
and op1 is x_1 and op2 is x_2 (but will work
if op1 is x_2 and op2 is x_1).
Wouldn't it be better to also remember the original
tree orig_op1 = op1;
at the beginning and in the last comparison do
if (op1 == op2 || orig_op1 == op2)
?
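
To make the asymmetry concrete, here is a minimal self-contained sketch.  It
does not use GCC's tree or gimple types: struct name, equal_posted and
equal_suggested are hypothetical stand-ins, with nop_src modelling "is defined
by a nop conversion of".  Assuming that toy model, the posted logic misses the
x_1/x_2 pair in one argument order, while remembering orig_op1 catches it:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SSA name (not GCC's tree).  nop_src is non-NULL
   when the name is defined by a nop conversion of another name.  */
struct name
{
  const char *id;
  const struct name *nop_src;
};

/* Mirrors the posted ops_equal_values_p: strip one nop conversion from
   op1, then one from op2, comparing after each step.  */
static bool
equal_posted (const struct name *op1, const struct name *op2)
{
  if (op1 == op2)
    return true;

  if (op1->nop_src)
    {
      op1 = op1->nop_src;
      if (op1 == op2)
        return true;
    }

  if (op2->nop_src)
    {
      op2 = op2->nop_src;
      if (op1 == op2)
        return true;
    }

  return false;
}

/* Same as above, but the last check also compares the unstripped op1.  */
static bool
equal_suggested (const struct name *op1, const struct name *op2)
{
  const struct name *orig_op1 = op1;

  if (op1 == op2)
    return true;

  if (op1->nop_src)
    {
      op1 = op1->nop_src;
      if (op1 == op2)
        return true;
    }

  if (op2->nop_src)
    {
      op2 = op2->nop_src;
      if (op1 == op2 || orig_op1 == op2)
        return true;
    }

  return false;
}

/* x_1 = (nop) x_0;  x_2 = (nop) x_1;  */
static const struct name x_0 = { "x_0", NULL };
static const struct name x_1 = { "x_1", &x_0 };
static const struct name x_2 = { "x_2", &x_1 };
```

In this model equal_posted (&x_1, &x_2) is false while equal_posted (&x_2, &x_1)
is true; equal_suggested is true for both orders.  Neither variant looks
through two conversions at once, so x_0 versus x_2 is still not matched.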

Jakub

Re: [PATCH][SPARC] sparc: switch -fasynchronous-unwind-tables on by default.

2016-02-29 Thread Jose E. Marchesi


Hi Eric.

> In sparc systems glibc uses libgcc's unwinder to implement the
> backtrace(3) function, defaulting to a simple non-dwarf unwinder if
> libgcc_s doesn't provide a working _Unwind_Backtrace.
> 
> However, libgcc's unwinder uses .eh_frame instead of .debug_frame, and
> .eh_frame is fully populated only if applications are built with
> -fexceptions or -fasynchronous-unwind-tables.
> 
> This patch changes GCC to assume -fasynchronous-unwind-tables by default
> in sparcv9 and sparc64, like other ports (notably x86) do.

eric@polaris:~/svn/gcc/gcc/common/config> grep -r 
x_flag_asynchronous_unwind_tables .
./tilegx/tilegx-common.c:  opts->x_flag_asynchronous_unwind_tables = 1;
./tilepro/tilepro-common.c:  opts->x_flag_asynchronous_unwind_tables = 1;
./i386/i386-common.c:  opts->x_flag_asynchronous_unwind_tables = 2;
./s390/s390-common.c:  opts->x_flag_asynchronous_unwind_tables = 1;

In particular, the 2 means that it's overridden by USE_IX86_FRAME_POINTER, 
i.e. the frame pointer is always enabled instead (e.g on Solaris).

Ah, so I guess the right value to set in sparc-*-* is 1.

What's the problem exactly here?  Simple non-DWARF unwinders usually work 
fine 
with the SPARC architecture thanks to the calling conventions.

Consider the attached test program.  When built with -g in sparc64-*-*
the resulting binary contains:

- A .eh_frame segment containing CFA information for __libc_csu_init and
  __libc_csu_fini.

- A .debug_frame segment containing CFA information for func2, func1 and
  main.

The backtrace(3) implementation for sparc contains a simple unwinder
that works well in most cases, but that unwinder is not used if
libgcc_s.so can be dlopened and it provides _Unwind_Backtrace.  Now,
_Unwind_Backtrace uses .eh_frame but not .debug_frame.  Thus,
backtrace(3) is only useful in programs built with
-fasynchronous-unwind-tables even if -g provides CFA info in
.debug_frame.

I see three solutions to this:
- To change glibc in order to not use libgcc's DWARF unwinder.
- To expand the libgcc unwinder to use the CFA in .debug_frame.
- To change GCC in sparc-*-* to generate fully populated .eh_frame
  sections by default. (The patch I attempted.)

#include <execinfo.h>
#include <stdio.h>

int func2()
{
  const int MAXFRAME = 10;
  void *buffer[MAXFRAME];
  int nframes = backtrace(buffer, MAXFRAME);
  printf("backtrace returned %d frames\n", nframes);
  return nframes;
}

void func1()
{
  int num = func2();
  printf("func2 returned %d\n", num);
  return;
}

int main()
{
  func1();
  return 0;
}

Re: [WWWDocs] Deprecate support for non-thumb ARM devices

2016-02-29 Thread Joel Sherrill

On 2/29/2016 5:37 AM, Kyrill Tkachov wrote:

On 28/02/16 21:34, Joel Sherrill wrote:

On February 28, 2016 3:20:24 PM CST, Gerald Pfeifer wrote:

On Wed, 24 Feb 2016, Richard Earnshaw (lists) wrote:

I propose to commit this patch later this week.

+ Support for revisions of the ARM architecture prior to ARMv4t
has
+ been deprecated and will be removed in a future GCC release.
+ This affects ARM6, ARM7 (but not ARM7TDMI), ARM8, StrongARM,
and
+ Faraday fa526 and fa626 devices, which do not have support for
+ the Thumb execution state.

I am wondering whether this may be confusing for those not
intricately familiar with the older history of ARM platforms.

ARMv8 is pretty new, googling for it has
http://www.arm.com/products/processors/armv8-architecture.php
as first hit, for example, and the only difference versus ARM8
is that little lower-case "v".

I assume this means a number of values for the various -mXXX arguments will be
removed. Would it be more helpful to list those values?

I have to agree with Gerald. I think this will obsolete a few older RTEMS BSPs
but based on that wording, I don't know which.

ARM8 is a processor, whereas ARMv8-A is an architecture.
I think Richard's link earlier in the thread:

https://community.arm.com/groups/processors/blog/2011/11/02/arm-fundamentals-introduction-to-understanding-arm-processors

That you referred to code to know the impact seems to confirm my concern that
this is not something most users would realize.

arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,
arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,
strongarm1100,strongarm1110,fa526,fa626.

The arguments to -march that would be deprecated are:
armv2,armv2a,armv3,armv3m,armv4.

I personally think that list is a bit too long for changes.html.

It didn't seem that long and makes a nice checklist.

FWIW I am one of the original RTEMS developers, 25+ years of embedded work,
gcc, etc. and I couldn't have evaluated the impact of the original
statement easily. Those with more knowledge of ARM GCC specifics (like you) gave a
precise detailed answer in what sounds like just a few minutes.

Do you think it would add more clarity for people who are not familiar with the
situation?

Absolutely. That's an authoritative list. From that list, anyone can grep their
build system to see which boards and configurations would be impacted.

And honestly, when I saw the initial statement, I was concerned about how many
older ARM RTEMS BSPs would be obsoleted. But seeing the specific list, I don't
think we have any that are impacted.

The extra information just makes it very precise and clear what's going away.

FWIW I am on a standards group and one of the things I repeatedly say is that
if we leave room for someone to ask a question, then they have a chance to get
an answer we didn't intend. So try to avoid letting someone with less knowledge
ask the question. :)

--joel

Thanks,
Kyrill

Gerald

--joel

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Richard Biener

On Mon, Feb 29, 2016 at 2:38 PM, Bernd Schmidt  wrote:
> On 02/27/2016 08:12 PM, Richard Biener wrote:
>>
>>
>>
>> On Friday, 26 February 2016, Jeff Law wrote:
>>
>> The other case that came to mind was signal handlers.  What happens
>> if we're using the global register as a scratch, we hit a memory
>> reference that faults and inside the signal handler the original
>> code expects to be able to use the global register the user set up?
>>
>> If that's a valid use scenario, then there's probably all kinds of
>> DF stuff that we'd need to expose.
>>
>> I'd say that's a valid assumption.  Though maybe we want to be able to
>> change semantics with a flag here.
>
>
> A flag seems like overkill, I don't think people are likely to enable that
> ever.
>
> So what's the consensus here, closed wontfix?

I think so, and maybe update documentation to reflect the discussion.

Richard.

>
> Bernd
>
>

Re: [PATCH][ARM] PR target/70008

2016-02-29 Thread Richard Earnshaw (lists)

On 29/02/16 11:21, Michael Collison wrote:
> 
> 
> On 2/29/2016 4:06 AM, Kyrill Tkachov wrote:
>> Hi Michael,
>>
>> On 29/02/16 04:47, Michael Collison wrote:
>>> This patch addresses PR 70008, where a reverse subtract with carry
>>> instruction can be generated in thumb2 mode. It was tested with no
>>> regressions in arm and thumb modes on the following targets:
>>>
>>> arm-none-linux-gnueabi
>>> arm-none-linux-gnueabihf
>>> armeb-none-linux-gnueabihf
>>> arm-none-eabi
>>>
>>> Okay for trunk?
>>>
>>> 2016-02-28  Michael Collison 
>>>
>>> PR target/70008
>>> * config/arm/arm.md (*subsi3_carryin): Only match pattern if
>>> TARGET_ARM due to 'rsc' instruction alternative.
>>> * config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.
>>>
>>>
>>
>> The *subsi3_carryin pattern has the arch attribute:
>>(set_attr "arch" "*,a")
>>
>> That means that the second alternative that generates the RSC
>> instruction is only enabled
>> for ARM mode. Do you have a testcase where this doesn't happen and
>> this pattern generates
>> the second alternative for Thumb2?
> 
> No, I don't have a test case; I noticed the pattern when working on the
> overflow project. I did not realize
> that an attribute could affect the matching of an alternative. I will
> close the bug.
> 
>>
>>
>> Thanks,
>> Kyrill
> 

This is all true, but there is a potential performance issue with this
pattern though, that could lead to sub-optimal code.

The predicate accepts reg-or-int, but in ARM state only simple
'const-ok-for-arm' immediates are permitted by the predicates, and in
thumb code, no immediates are permitted at all.  This could potentially
result in sub-optimal code due to late splitting of the pattern.  It
would be better if the predicate understood these limitations and
restricted immediates accordingly.

R.

[PATCH] Fix PR69983

2016-02-29 Thread Richard Biener


This fixes fallout of my SCEV correctness change where reassoc no longer
sees the ~A + A simplification opportunity due to casts that are in the 
way.
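
As a rough illustration (plain C rather than the GIMPLE that reassoc actually
sees; both function names are made up for this sketch), the identity in
question is ~A + A == -1, and it still holds when sign-changing nop
conversions sit between the two operands:

```c
#include <limits.h>

/* ~a + a is all ones.  On two's complement targets ~a == -a - 1, so the
   signed addition cannot overflow and the result is -1.  */
static int
not_plus_self (int a)
{
  return ~a + a;
}

/* The same computation with sign-changing (nop) casts in the way.  The
   unsigned result is all ones as well, i.e. UINT_MAX.  */
static unsigned int
not_plus_self_via_casts (int a)
{
  unsigned int t = (unsigned int) ~a;
  return t + (unsigned int) a;
}
```

The second form is the kind of expression where the casts hide the operand
pair unless the pass looks through nop conversions.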

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-29  Richard Biener  

PR tree-optimization/69994
* tree-ssa-reassoc.c (gimple_nop_conversion_p): New function.
(get_unary_op): Look through nop conversions.
(ops_equal_values_p): New function, look for equality disregarding
nop conversions.
(eliminate_plus_minus_pair): Use ops_equal_values_p.
(repropagate_negates): Do not use get_unary_op here.

Index: gcc/tree-ssa-reassoc.c
===
*** gcc/tree-ssa-reassoc.c  (revision 233803)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** is_reassociable_op (gimple *stmt, enum t
*** 605,610 
--- 605,625 
  }
  
  
+ /* Return true if STMT is a nop-conversion.  */
+ 
+ static bool
+ gimple_nop_conversion_p (gimple *stmt)
+ {
+   if (gassign *ass = dyn_cast <gassign *> (stmt))
+ {
+   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (ass))
+ && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (ass)),
+   TREE_TYPE (gimple_assign_rhs1 (ass))))
+   return true;
+ }
+   return false;
+ }
+ 
  /* Given NAME, if NAME is defined by a unary operation OPCODE, return the
 operand of the negate operation.  Otherwise, return NULL.  */
  
*** get_unary_op (tree name, enum tree_code
*** 613,618 
--- 628,638 
  {
gimple *stmt = SSA_NAME_DEF_STMT (name);
  
+   /* Look through nop conversions (sign changes).  */
+   if (gimple_nop_conversion_p (stmt)
+   && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
+ stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
+ 
if (!is_gimple_assign (stmt))
  return NULL_TREE;
  
*** get_unary_op (tree name, enum tree_code
*** 621,626 
--- 641,680 
return NULL_TREE;
  }
  
+ /* Return true if OP1 and OP2 have the same value if casted to either type.  
*/
+ 
+ static bool
+ ops_equal_values_p (tree op1, tree op2)
+ {
+   if (op1 == op2)
+ return true;
+ 
+   if (TREE_CODE (op1) == SSA_NAME)
+ {
+   gimple *stmt = SSA_NAME_DEF_STMT (op1);
+   if (gimple_nop_conversion_p (stmt))
+   {
+ op1 = gimple_assign_rhs1 (stmt);
+ if (op1 == op2)
+   return true;
+   }
+ }
+ 
+   if (TREE_CODE (op2) == SSA_NAME)
+ {
+   gimple *stmt = SSA_NAME_DEF_STMT (op2);
+   if (gimple_nop_conversion_p (stmt))
+   {
+ op2 = gimple_assign_rhs1 (stmt);
+ if (op1 == op2)
+   return true;
+   }
+ }
+ 
+   return false;
+ }
+ 
+ 
  /* If CURR and LAST are a pair of ops that OPCODE allows us to
 eliminate through equivalences, do so, remove them from OPS, and
 return true.  Otherwise, return false.  */
*** eliminate_plus_minus_pair (enum tree_cod
*** 731,739 
 && oe->rank >= curr->rank - 1 ;
 i++)
  {
!   if (oe->op == negateop)
{
- 
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Equivalence: ");
--- 785,793 
 && oe->rank >= curr->rank - 1 ;
 i++)
  {
!   if (negateop
! && ops_equal_values_p (oe->op, negateop))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Equivalence: ");
*** eliminate_plus_minus_pair (enum tree_cod
*** 750,756 
  
  return true;
}
!   else if (oe->op == notop)
{
  tree op_type = TREE_TYPE (oe->op);
  
--- 804,811 
  
  return true;
}
!   else if (notop
!  && ops_equal_values_p (oe->op, notop))
{
  tree op_type = TREE_TYPE (oe->op);
  
*** eliminate_plus_minus_pair (enum tree_cod
*** 772,780 
}
  }
  
!   /* CURR->OP is a negate expr in a plus expr: save it for later
!  inspection in repropagate_negates().  */
!   if (negateop != NULL_TREE)
  plus_negates.safe_push (curr->op);
  
return false;
--- 827,836 
}
  }
  
!   /* If CURR->OP is a negate expr without nop conversion in a plus expr:
!  save it for later inspection in repropagate_negates().  */
!   if (negateop != NULL_TREE
!   && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (curr->op)) == NEGATE_EXPR)
  plus_negates.safe_push (curr->op);
  
return false;
*** repropagate_negates (void)
*** 4211,4217 
  if (gimple_assign_rhs2 (user) == negate)
{
  tree rhs1 = gimple_assign_rhs1 (user);
! tree rhs2 = get_unary_op (negate, NEGATE_EXPR);
  gimple_stmt_iterator gsi = gsi_for_stmt (user);
  gimple_assign_set_rhs_with_ops (&gsi, MINUS_EXPR, rhs1, rhs2);

[HSA,PATCH] reduce dump output w/o -details flag

2016-02-29 Thread Martin Liška

Hello.

The following patch limits the amount of dump information which is printed
to the *.hsagen dump file. The patch has been pre-approved by Martin Jambor
and survives regbootstrap on x86_64-linux-gnu.
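
The gating pattern itself can be sketched outside of GCC as follows.  This is
only an illustration: SKETCH_TDF_DETAILS and dump_if_detailed are invented for
the example and the flag value is arbitrary; in the patch the real names are
dump_file, dump_flags and TDF_DETAILS.

```c
#include <stdio.h>

/* Arbitrary bit chosen for this sketch; not GCC's TDF_DETAILS value.  */
#define SKETCH_TDF_DETAILS (1 << 3)

/* Print MSG only when a dump file is open and details were requested.
   Returns 1 if something was printed, 0 otherwise.  */
static int
dump_if_detailed (FILE *dump_file, int dump_flags, const char *msg)
{
  if (dump_file && (dump_flags & SKETCH_TDF_DETAILS))
    {
      fputs (msg, dump_file);
      return 1;
    }
  return 0;
}
```

With only "dump_file" as the condition, the message would be printed whenever
a dump file is open; the extra flag test restricts it to -details dumps.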

Installed as r233814.

Thanks,
Martin
From 30e91f90196fcf4c2180bd907ad0f775611f7135 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 26 Feb 2016 11:16:45 +0100
Subject: [PATCH] HSA: reduce dump output w/o -details flag

gcc/ChangeLog:

2016-02-26  Martin Liska  

	* hsa-gen.c (gen_body_from_gimple): Dump only if TDF_DETAILS
	is present in dump flags.
	* hsa-regalloc.c (linear_scan_regalloc): Likewise.
	(hsa_regalloc): Likewise.
---
 gcc/hsa-gen.c  | 2 +-
 gcc/hsa-regalloc.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 28e8b6f..370a699 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5488,7 +5488,7 @@ gen_body_from_gimple ()
 	  gen_hsa_phi_from_gimple_phi (gsi_stmt (gsi), hbb);
 }
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "--- Generated SSA form ---\n");
   dump_hsa_cfun (dump_file);
diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index f8e83ecf..037c269 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -606,7 +606,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
 	spill_at_interval (reg, active);
 
   /* Some interesting dumping as we go.  */
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "  reg%d: [%5d, %5d)->",
 		   reg->m_order, reg->m_lr_begin, reg->m_lr_end);
@@ -638,7 +638,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
   BITMAP_FREE (work);
   free (bbs);
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "--- After liveness: ---\n");
   dump_hsa_cfun_regalloc (dump_file);
@@ -703,7 +703,7 @@ hsa_regalloc (void)
 {
   naive_outof_ssa ();
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "--- After out-of-SSA: ---\n");
   dump_hsa_cfun (dump_file);
@@ -711,7 +711,7 @@ hsa_regalloc (void)
 
   regalloc ();
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "--- After register allocation: ---\n");
   dump_hsa_cfun (dump_file);
-- 
2.7.1
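
The condition change in the patch above can be sketched outside of GCC as
follows. This is only an illustration under stated assumptions:
SKETCH_TDF_DETAILS, want_detailed_dump and dump_generated_ssa are
hypothetical stand-ins, and the bit value is made up (the real TDF_DETAILS
is defined in GCC's dumpfile.h).

```c
#include <stdio.h>

/* Hypothetical stand-in for GCC's TDF_DETAILS bit; the value is
   illustrative only.  */
#define SKETCH_TDF_DETAILS (1u << 3)

/* Mirrors the patched condition: a dump stream must exist AND the
   details bit must be set in the dump flags.  */
int want_detailed_dump(FILE *dump_file, unsigned dump_flags)
{
    return dump_file != NULL && (dump_flags & SKETCH_TDF_DETAILS) != 0;
}

/* Hypothetical caller: prints the banner only for detailed dumps and
   returns 1 if something was written, 0 otherwise.  */
int dump_generated_ssa(FILE *dump_file, unsigned dump_flags)
{
    if (!want_detailed_dump(dump_file, dump_flags))
        return 0;
    fprintf(dump_file, "--- Generated SSA form ---\n");
    return 1;
}
```

With the old condition (dump_file alone) the banner would be printed for
every dump; with the new one it is printed only when the details bit is
also requested.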

Re: Fix PR44281 (bad RA with global regs)


On 02/29/2016 08:18 AM, Richard Biener wrote:
> On Mon, Feb 29, 2016 at 2:38 PM, Bernd Schmidt  wrote:
>> On 02/27/2016 08:12 PM, Richard Biener wrote:
>>> On Friday, 26 February 2016, Jeff Law wrote:
>>>>
>>>> The other case that came to mind was signal handlers.  What happens
>>>> if we're using the global register as a scratch, we hit a memory
>>>> reference that faults and inside the signal handler the original
>>>> code expects to be able to use the global register the user set up?
>>>>
>>>> If that's a valid use scenario, then there's probably all kinds of
>>>> DF stuff that we'd need to expose.
>>>
>>> I'd say that's a valid assumption.  Though maybe we want to be able to
>>> change semantics with a flag here.
>>
>> A flag seems like overkill, I don't think people are likely to enable that
>> ever.
>>
>> So what's the consensus here, closed wontfix?
>
> I think so, and maybe update documentation to reflect the discussion.

Agreed (closed/wontfix).  I think the signal handler case essentially
kills the ability to use global regs as scratches.


We could create another style declaration for a global-like register 
which has different semantics, but I don't think that's wise.


I'd either add a comment summarizing the discussion or a pointer back to 
the discussion in the archives.


jeff

Re: [PATCH 0/9] S/390 rework shift count handling - v3

2016-02-29 Thread Ulrich Weigand

Andreas Krebbel wrote:

>   S/390: Use enabled attribute overrides to disable alternatives.
>   S/390: Get rid of Y constraint in rotate patterns.
>   S/390: Get rid of Y constraint in left and logical right shift
> patterns.
>   S/390: Get rid of Y constraint in arithmetic right shift patterns.
>   S/390: Get rid of Y constraint in tabort.
>   S/390: Get rid of Y constraint in vector.md.
>   S/390: Use define_subst for the setmem patterns.
>   S/390: Disallow SImode in s390_decompose_address

Except for a few minor comments for the vector.md patch (separate mail),
this all looks now very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-29 Thread Ulrich Weigand

Andreas Krebbel wrote:


> +; vec_set is supposed to *modify* an existing vector so operand 0 is
> +; duplicated as input operand.
> +(define_expand "vec_set"
> +  [(set (match_operand:V 0 "register_operand" "")
> + (unspec:V [(match_operand: 1 "general_operand" "")
> +(match_operand:SI 2 "shift_count_or_setmem_operand" "")

This is probably only cosmetic, but should we use nonmemory_operand here
instead of shift_count_or_setmem_operand (just like everywhere else now)?

> +(define_expand "vec_extract"
> +  [(set (match_operand: 0 "nonimmediate_operand" "")
> + (unspec: [(match_operand:V  1 "register_operand" "")
> +(match_operand:SI 2 "shift_count_or_setmem_operand" "")]

Likewise.


> +(define_insn "*vec_set_plus"
> +  [(set (match_operand:V  0 "register_operand" "=v")
> + (unspec:V [(match_operand:   1 "general_operand"   "d")
> +(plus:SI (match_operand:SI 2 "register_operand"  "a")
> + (match_operand:SI 4 "const_int_operand" "n"))
> +(match_operand:V   3 "register_operand"  "0")]
> +   UNSPEC_VEC_SET))]
> +  "TARGET_VX"
> +  "vlvg\t%v0,%1,%4(%2)"
> +  [(set_attr "op_type" "VRS")])

Wouldn't it be better to use %Y4 instead of %4 here?  Or does the middle-end
guarantee that this is never out of range?

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand: 0 "nonimmediate_operand" "=d,QR")
> + (unspec: [(match_operand:V 1 "register_operand" "v, v")
> +(plus:SI (match_operand:SI 2 "nonmemory_operand" "a, I")
> + (match_operand:SI 3 "const_int_operand" "n, I"))]
> +UNSPEC_VEC_EXTRACT))]
> +  "TARGET_VX"
> +  "@
> +   vlgv\t%0,%v1,%3(%2)
> +   vste\t%v1,%0,%2"
> +  [(set_attr "op_type" "VRS,VRX")])

Likewise for %3.  Also, the second alternative seems odd, it matches solely a
PLUS of two CONST_INTs, which is not canonical RTL ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH][AArch64] Set TREE_TARGET_GLOBALS in aarch64_set_current_function when new tree is the default node to recalculate optab availability

2016-02-29 Thread Christophe Lyon

On 29 February 2016 at 15:28, Kyrill Tkachov
 wrote:
> Hi Christophe,
>
>
> On 29/02/16 14:10, Christophe Lyon wrote:
>>
>> On 26 February 2016 at 16:51, James Greenhalgh 
>> wrote:
>>>
>>> On Thu, Feb 25, 2016 at 11:04:21AM +, Kyrill Tkachov wrote:

>>>> Hi all,
>>>>
>>>> Seems like aarch64 is suffering from something similar to PR 69245 as well.
>>>> If a target pragma sets the target state to the same as the
>>>> target_option_default_node the node is just a pointer to
>>>> target_option_default_node rather than a distinct identical node. So we must
>>>> still restore the target globals even when setting to
>>>> target_option_default_node in order to force the midend to recompute the
>>>> availability of various optabs.
>>>>
>>>> If we don't do it, we can get in a problem like in the testcase where the
>>>> isa_flags are all set correctly, but the optab HAVE_* predicates have not
>>>> been recomputed.
>>>>
>>>> There is also a related issue present when popping/resetting target pragmas
>>>> for which I'll send out a patch separately.
>>>>
>>>> Bootstrapped and tested on aarch64.
>>>>
>>>> Ok for trunk?
>>>
>>> OK.
>>>
>> Hi Kyrill,
>>
>> Since this patch, I'm seeing:
>>gcc.dg/torture/pr52429.c   -O2 -flto -fno-use-linker-plugin
>> -flto-partition=none  (internal compiler error)
>> on target aarch64-none-linux-gnu
>>
>> The log has:
>> spawn -ignore SIGHUP
>>
>> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/gcc/xgcc
>> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc
>> 3/gcc/
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c
>> -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -flto
>> -fno-use-linker-pl
>> ugin -flto-partition=none -g -ftree-parallelize-loops=4
>> -DSTACK_SIZE=16384 -S -o pr52429.s
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:
>> In function 'foo':
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:24:1:
>> internal compiler error: Segmentation fault
>> 0xade075 crash_signal
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:335
>> 0x91f88e record_operand_costs
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1293
>> 0x91fdba scan_one_insn
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1471
>> 0x91fdba process_bb_for_costs
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1592
>> 0x9214e7 find_costs_and_classes
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1699
>> 0x922552 ira_set_pseudo_classes(bool, _IO_FILE*)
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:2239
>> 0x1061ecd alloc_global_sched_pressure_data
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7244
>> 0x1061ecd sched_init()
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7394
>> 0x10679ed haifa_sched_init()
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7406
>> 0xa84fae schedule_insns()
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3504
>> 0xa85864 rest_of_handle_sched
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3717
>> 0xa85864 execute
>>  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3825
>>
>> Don't you see this regression on your side?
>
>
> I've reproduced it just now.
> I had not seen it initially because the test requires a pthread target,
> so it was marked UNSUPPORTED when I tested aarch64-none-elf :(

That's what I noticed too

> But this looks like a latent bug elsewhere.
> I'll try to investigate, can you please open a PR?
Sure, this is PR 70016:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70016

Christophe

> Thanks,
> Kyrill
>
>
>> Thanks,
>>
>> Christophe.
>>
>>
>>
>>> Thanks,
>>> James
>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>
>>>> 2016-02-25  Kyrylo Tkachov  
>>>>
>>>>     PR target/69245
>>>>     * config/aarch64/aarch64.c (aarch64_set_current_function): Save/restore
>>>>     target globals when switching to target_option_default_node.
>>>>
>>>> 2016-02-25  Kyrylo Tkachov  
>>>>
>>>>     PR target/69245
>>>>     * gcc.target/aarch64/pr69245_1.c: New test.
>>>
>>>
>

Re: [PATCH][AArch64] Set TREE_TARGET_GLOBALS in aarch64_set_current_function when new tree is the default node to recalculate optab availability


Hi Christophe,

On 29/02/16 14:10, Christophe Lyon wrote:

On 26 February 2016 at 16:51, James Greenhalgh  wrote:

On Thu, Feb 25, 2016 at 11:04:21AM +, Kyrill Tkachov wrote:

Hi all,

Seems like aarch64 is suffering from something similar to PR 69245 as well.
If a target pragma sets the target state to the same as the
target_option_default_node the node is just a pointer to
target_option_default_node rather than a distinct identical node. So we must
still restore the target globals even when setting to
target_option_default_node in order to force the midend to recompute the
availability of various optabs.

If we don't do it, we can get in a problem like in the testcase where the
isa_flags are all set correctly, but the optab HAVE_* predicates have not
been recomputed.

There is also a related issue present when popping/resetting target pragmas
for which I'll send out a patch separately.

Bootstrapped and tested on aarch64.

Ok for trunk?

OK.


Hi Kyrill,

Since this patch, I'm seeing:
   gcc.dg/torture/pr52429.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (internal compiler error)
on target aarch64-none-linux-gnu

The log has:
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc
3/gcc/ 
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -flto
-fno-use-linker-pl
ugin -flto-partition=none -g -ftree-parallelize-loops=4
-DSTACK_SIZE=16384 -S -o pr52429.s
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:
In function 'foo':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:24:1:
internal compiler error: Segmentation fault
0xade075 crash_signal
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:335
0x91f88e record_operand_costs
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1293
0x91fdba scan_one_insn
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1471
0x91fdba process_bb_for_costs
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1592
0x9214e7 find_costs_and_classes
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1699
0x922552 ira_set_pseudo_classes(bool, _IO_FILE*)
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:2239
0x1061ecd alloc_global_sched_pressure_data
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7244
0x1061ecd sched_init()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7394
0x10679ed haifa_sched_init()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7406
0xa84fae schedule_insns()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3504
0xa85864 rest_of_handle_sched
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3717
0xa85864 execute
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3825

Don't you see this regression on your side?


I've reproduced it just now.
I had not seen it initially because the test requires a pthread target,
so it was marked UNSUPPORTED when I tested aarch64-none-elf :(
But this looks like a latent bug elsewhere.
I'll try to investigate, can you please open a PR?

Thanks,
Kyrill


Thanks,

Christophe.




Thanks,
James


Thanks,
Kyrill


2016-02-25  Kyrylo Tkachov  

 PR target/69245
 * config/aarch64/aarch64.c (aarch64_set_current_function): Save/restore
 target globals when switching to target_option_default_node.

2016-02-25  Kyrylo Tkachov  

 PR target/69245
 * gcc.target/aarch64/pr69245_1.c: New test.

C++ PATCH for c++/69995 (wrong value with C++14 constexpr)

2016-02-29 Thread Jason Merrill

The bug here was that we were sharing the CONSTRUCTOR between the value 
of 'a' and the elements of 'result', so changing 'a' also changed the 
value of result[0].  Oops.
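
The sharing problem described above can be illustrated with a plain C
analogy. This is only a sketch and not GCC's tree representation:
pair_node, unshare_pair and first_elt_after_swap are hypothetical names,
with pair_node standing in for a CONSTRUCTOR and unshare_pair playing the
role that unshare_expr plays in the fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CONSTRUCTOR node holding two elements.  */
typedef struct { int elts[2]; } pair_node;

/* Toy analogue of unshare_expr: return a fresh heap copy of the node,
   or NULL if allocation fails.  */
static pair_node *unshare_pair(const pair_node *p)
{
    pair_node *copy = malloc(sizeof *copy);
    if (copy)
        memcpy(copy, p, sizeof *copy);
    return copy;
}

/* Mimics the start of rotate2(): store 'a' into slot 0, then swap the
   elements of 'a'.  Returns the first element seen through slot 0, or
   -1 on allocation failure.  Without unsharing, slot 0 aliases 'a' and
   observes the swap.  */
int first_elt_after_swap(int unshare)
{
    pair_node a = { {0, 1} };
    pair_node *slot0 = unshare ? unshare_pair(&a) : &a;
    if (!slot0)
        return -1;

    int tmp = a.elts[0];
    a.elts[0] = a.elts[1];
    a.elts[1] = tmp;

    int seen = slot0->elts[0];
    if (unshare)
        free(slot0);
    return seen;
}
```

Calling first_elt_after_swap(0) reproduces the wrong value (the stored
copy changes along with 'a'), while first_elt_after_swap(1) keeps the
value that was stored before the swap.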


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 20c203ae124fd4fb2975eb3fb0c50ce7ade35e69
Author: Jason Merrill 
Date:   Sun Feb 28 23:50:41 2016 -0500

	PR c++/69995
	* constexpr.c (cxx_eval_store_expression): Unshare init.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 8d9168c..5e35940 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2925,6 +2925,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
 
   init = cxx_eval_constant_expression (&new_ctx, init, false,
    non_constant_p, overflow_p);
+  /* Don't share a CONSTRUCTOR that might be changed later.  */
+  init = unshare_expr (init);
   if (target == object)
 {
   /* The hash table might have moved since the get earlier.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-array3.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-array3.C
new file mode 100644
index 000..8cea41a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-array3.C
@@ -0,0 +1,43 @@
+// PR c++/69995
+// { dg-do compile { target c++14 } }
+
+#define assert(X) static_assert((X),#X)
+
+#define CONSTEXPR constexpr
+
+template <typename T, unsigned long Size>
+struct array {
+T elems_[Size];
+
+constexpr T const& operator[](unsigned long n) const
+{ return elems_[n]; }
+
+constexpr T& operator[](unsigned long n)
+{ return elems_[n]; }
+};
+
+template <typename T>
+CONSTEXPR void my_swap(T& a, T& b) {
+T tmp = a;
+a = b;
+b = tmp;
+}
+
+CONSTEXPR auto rotate2() {
+array<array<int, 2>, 2> result{};
+array<int, 2> a{{0, 1}};
+
+result[0] = a;
+my_swap(a[0], a[1]);
+result[1] = a;
+
+return result;
+}
+
+int main() {
+CONSTEXPR auto indices = rotate2();
+assert(indices[0][0] == 0);
+assert(indices[0][1] == 1);
+assert(indices[1][0] == 1);
+assert(indices[1][1] == 0);
+}

Re: [PATCH][AArch64] Set TREE_TARGET_GLOBALS in aarch64_set_current_function when new tree is the default node to recalculate optab availability

2016-02-29 Thread Christophe Lyon

On 26 February 2016 at 16:51, James Greenhalgh  wrote:
> On Thu, Feb 25, 2016 at 11:04:21AM +, Kyrill Tkachov wrote:
>> Hi all,
>>
>> Seems like aarch64 is suffering from something similar to PR 69245 as well.
>> If a target pragma sets the target state to the same as the
>> target_option_default_node the node is just a pointer to
>> target_option_default_node rather than a distinct identical node. So we must
>> still restore the target globals even when setting to
>> target_option_default_node in order to force the midend to recompute the
>> availability of various optabs.
>>
>> If we don't do it, we can get in a problem like in the testcase where the
>> isa_flags are all set correctly, but the optab HAVE_* predicates have not
>> been recomputed.
>>
>> There is also a related issue present when popping/resetting target pragmas
>> for which I'll send out a patch separately.
>>
>> Bootstrapped and tested on aarch64.
>>
>> Ok for trunk?
>
> OK.
>

Hi Kyrill,

Since this patch, I'm seeing:
  gcc.dg/torture/pr52429.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (internal compiler error)
on target aarch64-none-linux-gnu

The log has:
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc
3/gcc/ 
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -flto
-fno-use-linker-pl
ugin -flto-partition=none -g -ftree-parallelize-loops=4
-DSTACK_SIZE=16384 -S -o pr52429.s
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:
In function 'foo':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/torture/pr52429.c:24:1:
internal compiler error: Segmentation fault
0xade075 crash_signal
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:335
0x91f88e record_operand_costs
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1293
0x91fdba scan_one_insn
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1471
0x91fdba process_bb_for_costs
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1592
0x9214e7 find_costs_and_classes
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:1699
0x922552 ira_set_pseudo_classes(bool, _IO_FILE*)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/ira-costs.c:2239
0x1061ecd alloc_global_sched_pressure_data
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7244
0x1061ecd sched_init()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7394
0x10679ed haifa_sched_init()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:7406
0xa84fae schedule_insns()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3504
0xa85864 rest_of_handle_sched
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3717
0xa85864 execute
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3825

Don't you see this regression on your side?

Thanks,

Christophe.



> Thanks,
> James
>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-02-25  Kyrylo Tkachov  
>>
>> PR target/69245
>> * config/aarch64/aarch64.c (aarch64_set_current_function): Save/restore
>> target globals when switching to target_option_default_node.
>>
>> 2016-02-25  Kyrylo Tkachov  
>>
>> PR target/69245
>> * gcc.target/aarch64/pr69245_1.c: New test.
>
>

Re: [PATCH PR69652, Regression]

2016-02-29 Thread Jakub Jelinek

On Mon, Feb 29, 2016 at 05:03:38PM +0300, Yuri Rumyantsev wrote:
> 2016-02-29  Yuri Rumyantsev  
> 
> PR tree-optimization/69652
> gcc/testsuite/ChangeLog:
> * gcc.dg/torture/pr69652.c: Delete test.
> * gcc.dg/vect/pr69652.c: New test.

Ok, with:
/* { dg-additional-options "-mavx -ffast-math" { target { i?86-*-* x86_64-*-* } } } */
changed to:
/* { dg-additional-options "-ffast-math" } */
/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
(no reason not to use it in all targets), and if you verify the
test fails if you revert the compiler fix and passes otherwise.

Jakub

Re: [PATCH PR69652, Regression]

2016-02-29 Thread Yuri Rumyantsev

Jakub!

Here is the patch and ChangeLog to move pr69652.c to the /vect directory.

Is it OK for trunk?

Thanks.
Yuri.

ChangeLog:
2016-02-29  Yuri Rumyantsev  

PR tree-optimization/69652
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr69652.c: Delete test.
* gcc.dg/vect/pr69652.c: New test.




2016-02-29 16:05 GMT+03:00 Jakub Jelinek :
> On Mon, Feb 29, 2016 at 05:01:52AM -0800, H.J. Lu wrote:
>> On Mon, Feb 29, 2016 at 3:53 AM, Yuri Rumyantsev  wrote:
>> > This test simply checks that ICE is not occurred but not any
>> > vectorization issues.
>>
>> Can we remove
>>
>>  /* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
>>
>> then?
>
> Well, I bet -ffast-math -ftree-vectorize are needed to reproduce the ICE
> with broken compiler.  But, e.g. -O0 -ftree-vectorize doesn't make much
> sense to test.
> So, either put it into gcc.dg/pr69652.c with the above mentioned options,
> or into gcc.dg/vect/ with dg-additional-options "-ffast-math".
>
> Jakub


patch.2
Description: Binary data

Re: Fix PR44281 (bad RA with global regs)

2016-02-29 Thread Bernd Schmidt


On 02/27/2016 08:12 PM, Richard Biener wrote:
> On Friday, 26 February 2016, Jeff Law wrote:
>>
>> The other case that came to mind was signal handlers.  What happens
>> if we're using the global register as a scratch, we hit a memory
>> reference that faults and inside the signal handler the original
>> code expects to be able to use the global register the user set up?
>>
>> If that's a valid use scenario, then there's probably all kinds of
>> DF stuff that we'd need to expose.
>
> I'd say that's a valid assumption.  Though maybe we want to be able to
> change semantics with a flag here.

A flag seems like overkill, I don't think people are likely to enable
that ever.

So what's the consensus here, closed wontfix?


Bernd

Re: [PATCH 1/9] gensupport: Fix define_subst operand renumbering.

2016-02-29 Thread Bernd Schmidt


On 02/29/2016 09:46 AM, Andreas Krebbel wrote:

Ok for mainline?

* gensupport.c (process_substs_on_one_elem): Split loop to
complete mark_operands_used_in_match_dup on all expressions in the
vector first.
(adjust_operands_numbers): Inline into process_substs_on_one_elem
and remove function.


Didn't I approve this a while ago? Not sure it's appropriate for stage4 
though; is this series fixing an important regression?



Bernd

Re: [PATCH] Fix PR69760

2016-02-29 Thread Dominik Vogt

On Wed, Feb 24, 2016 at 03:49:09PM +0100, Richard Biener wrote:
> 2016-02-24  Richard Biener  
>   Jakub Jelinek  
> 
>   PR middle-end/69760
>   * tree-scalar-evolution.c (interpret_rhs_expr): Re-write
>   conditionally executed ops to well-defined overflow behavior.
> 
>   * gcc.dg/torture/pr69760.c: New testcase.

This causes a regression on s390x w/ reassoc_6.f.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69760 for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Re: [PATCH PR69652, Regression]

2016-02-29 Thread Jakub Jelinek

On Mon, Feb 29, 2016 at 05:01:52AM -0800, H.J. Lu wrote:
> On Mon, Feb 29, 2016 at 3:53 AM, Yuri Rumyantsev  wrote:
> > This test simply checks that ICE is not occurred but not any
> > vectorization issues.
> 
> Can we remove
> 
>  /* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
> 
> then?

Well, I bet -ffast-math -ftree-vectorize are needed to reproduce the ICE
with broken compiler.  But, e.g. -O0 -ftree-vectorize doesn't make much
sense to test.
So, either put it into gcc.dg/pr69652.c with the above mentioned options,
or into gcc.dg/vect/ with dg-additional-options "-ffast-math".

Jakub

Re: [PATCH PR69652, Regression]

2016-02-29 Thread H.J. Lu

On Mon, Feb 29, 2016 at 3:53 AM, Yuri Rumyantsev  wrote:
> This test simply checks that ICE is not occurred but not any
> vectorization issues.

Can we remove

 /* { dg-options "-O2 -ffast-math -ftree-vectorize " } */

then?

H.J.
> Best regards.
> Yuri.
>
> 2016-02-28 20:29 GMT+03:00 H.J. Lu :
>> On Wed, Feb 10, 2016 at 2:26 AM, Yuri Rumyantsev  wrote:
>>> Thanks Richard for your comments.
>>> I changes algorithm to remove dead scalar statements as you proposed.
>>>
>>> Bootstrap and regression testing did not show any new failures on x86-64.
>>> Is it OK for trunk?
>>>
>>> Changelog:
>>> 2016-02-10  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/69652
>>> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
>>> to nested loop, did source re-formatting, skip debug statements,
>>> add check on statement with volatile operand, remove dead scalar
>>> statements.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/torture/pr69652.c: New test.
>>>
>>>
>>
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
>> ^^^
>>
>> Any particular reason why it should be in gcc.dg/torture, not in
>> gcc.dg/vect? -O2 here is overridden by -Ox.
>>
>> /* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
>>
>>
>>
>> --
>> H.J.



-- 
H.J.

Re: [PATCH][ARM][2/4] Fix operand costing logic for SMUL[TB][TB]



On 04/02/16 09:00, Ramana Radhakrishnan wrote:

On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
 wrote:

Hi all,

As part of investigating the codegen effects of a fix for PR 65932 I found
that we assign too high a cost to the sign-extending multiply instruction
SMULBB. This is because we add the cost of a multiply-extend but then also
recurse into the SIGN_EXTEND sub-expressions rather than the registers
(or subregs) being sign-extended.

This patch is a simple fix. The fix is right by itself, but in combination
with patch 3 it fixes the gcc.target/arm/wmul-2.c testcase.
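
For readers unfamiliar with the instruction, the value SMULBB computes can
be modelled in C as below. This is only a sketch of the arithmetic, not of
the costing code in arm.c; sext_low16 and smulbb_model are hypothetical
helper names.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit word, avoiding
   implementation-defined narrowing conversions.  */
static int32_t sext_low16(int32_t w)
{
    int32_t lo = (int32_t)((uint32_t)w & 0xFFFFu);
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Model of SMULBB: the product of the sign-extended bottom halfwords
   of both operands.  The result always fits in 32 bits because each
   factor lies in [-32768, 32767].  */
int32_t smulbb_model(int32_t a, int32_t b)
{
    return sext_low16(a) * sext_low16(b);
}
```

In RTL terms the two sext_low16 calls correspond to the SIGN_EXTEND
sub-expressions of the MULT; the instruction performs them as part of the
multiply, which is why costing them again on top of the multiply-extend
cost overestimates.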

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?


OK.


Is it ok to backport this to the GCC 5 branch?
It fixes a testcase with cortex-a5 tuning and was tested by Christophe:
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01238.html

Thanks,
Kyrill


Thanks,
Ramana

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  

 * config/arm/arm.c (arm_new_rtx_costs, MULT case): Properly extract
 the operands of the SIGN_EXTENDs from a SMUL[TB][TB] rtx.

Re: [PATCH PR69652, Regression]

2016-02-29 Thread Yuri Rumyantsev

This test simply checks that no ICE occurs; it does not check for any
vectorization issues.

Best regards.
Yuri.

2016-02-28 20:29 GMT+03:00 H.J. Lu :
> On Wed, Feb 10, 2016 at 2:26 AM, Yuri Rumyantsev  wrote:
>> Thanks Richard for your comments.
>> I changes algorithm to remove dead scalar statements as you proposed.
>>
>> Bootstrap and regression testing did not show any new failures on x86-64.
>> Is it OK for trunk?
>>
>> Changelog:
>> 2016-02-10  Yuri Rumyantsev  
>>
>> PR tree-optimization/69652
>> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
>> to nested loop, did source re-formatting, skip debug statements,
>> add check on statement with volatile operand, remove dead scalar
>> statements.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/torture/pr69652.c: New test.
>>
>>
>
> /* { dg-do compile } */
> /* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
> ^^^
>
> Any particular reason why it should be in gcc.dg/torture, not in
> gcc.dg/vect? -O2 here is overridden by -Ox.
>
> /* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
>
>
>
> --
> H.J.

[PATCH PR69942] Fix test problem

2016-02-29 Thread Yuri Rumyantsev

Hi All,

Here is a simple patch for the gcc.dg/ifcvt5.c test: also detect the
"6 basic blocks" string in the rtl dump, to accept speculative motion of
the else-part of an if-stmt before the test-part, aka IF-CASE-2.

Is it OK for trunk?

ChangeLog:
2016-02-29  Yuri Rumyantsev  

PR rtl-optimization/69942
gcc/testsuite/ChangeLog:
* gcc.dg/ifcvt5.c: Detect '6 basic blocks' in rtl dump also.


patch
Description: Binary data

Re: [WWWDocs] Deprecate support for non-thumb ARM devices

On 28/02/16 21:34, Joel Sherrill wrote:

On February 28, 2016 3:20:24 PM CST, Gerald Pfeifer wrote:

On Wed, 24 Feb 2016, Richard Earnshaw (lists) wrote:

I propose to commit this patch later this week.

I am wondering whether this may be confusing for those not
intricately familiar with the older history of ARM platforms.

ARMv8 is pretty new, googling for it has
http://www.arm.com/products/processors/armv8-architecture.php
as first hit, for example, and the only difference versus ARM8
is that little lower-case "v".

I assume this means a number of values for the various -mXXX arguments will be
removed. Would it be more helpful to list those values?

I have to agree with Gerald. I think this will obsolete a few older RTEMS BSPs
but based on that wording, I don't know which.

ARM8 is a processor, whereas ARMv8-A is an architecture.
I think Richard's link earlier in the thread:

https://community.arm.com/groups/processors/blog/2011/11/02/arm-fundamentals-introduction-to-understanding-arm-processors

gives a good explanation of the naming schemes.
The -mcpu/-mtune arguments that would be deprecated can be found by looking at
the file config/arm/arm-cores.def and finding all the ARM_CORE entries that
have '4' or lower in their 4th field. These would be:
arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,
arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,
strongarm1100,strongarm1110,fa526,fa626.

The arguments to -march that would be deprecated are:
armv2,armv2a,armv3,armv3m,armv4.

I personally think that list is a bit too long for changes.html.
Do you think it would add more clarity for people who are not familiar with the
situation?

Thanks,
Kyrill

Gerald

--joel

Re: [C PATCH] Fix C error-recovery (PR c/69796, PR c/69974)

2016-02-29 Thread Marek Polacek

On Fri, Feb 26, 2016 at 02:45:38PM -0700, Jeff Law wrote:
> This one leaves the type incomplete, right?  So ISTM it's somewhat more
> likely than the second to expose other errors later with code that doesn't
> expect the type to be incomplete (much like other code doesn't expect to
> find error_mark_node in here).
> 
> The second patch at least puts a real type in there.  I suspect that's less
> likely to cause problems downstream, except perhaps with diagnostics.
> 
> I could argue for either.  I almost asked for the latter to be tested, but
> the more I think about it, I don't like slamming in another type like that.
> 
> I'll conditionally approve -- if nobody objects in 72hrs, consider the first
> patch OK for the trunk.

I'm leaning towards the first patch, i.e. the one without setting the type to
char_type_node.  If it causes some issues (I hope not), we'll probably have to
add some COMPLETE_TYPE_P checks.

Marek

Re: [ARM] Add support for overflow add, sub, and neg operations

2016-02-29 Thread Michael Collison




On 2/29/2016 4:13 AM, Kyrill Tkachov wrote:


On 26/02/16 10:32, Michael Collison wrote:



On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:
This patch adds support for builtin overflow of add, subtract and 
negate. This patch is targeted for gcc 7 stage 1. It was tested 
with no regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations.
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compare, negsi2_carryin_compare): New patterns
to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these
operations, and how they are better than the default fallback sequences.


Sure for a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   adds    r0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr



Thanks! That looks much better.

Also, we'd need tests for each of these overflow operations, since
these are pretty complex patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic 
functions are definitely being exercised.


Great, that gives me more confidence on the correctness aspects but...


Not so fast. I went back and changed the TARGET_ARM conditions to
TARGET_32BIT. When I did this some of the test cases failed in thumb2
mode. I was a little surprised by this result since I generate the same
rtl in both modes in almost all cases. I am investigating.




Also, you may want to consider splitting this into a patch series, 
each adding a single
overflow operation, together with its tests. That way it will be 
easier to keep track of
which pattern applies to which use case and they can go in 
independently of each other.


Let me know if you still feel the same way given the existing test
cases.




... I'd like us to still have scan-assembler tests. The torture tests
exercise the correctness, but we'd want tests to catch regressions where
we stop generating the new patterns due to other optimisation changes,
which would lead to code quality regressions.
So I'd like us to have scan-assembler tests for these sequences to
make sure we generate the right instructions.

I will definitely write some scan-assembler tests. Thanks for the feedback.
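
A scan-assembler test along these lines might look like the sketch below. It is only an illustration: the function name, the -O2 option and the "adds" pattern are assumptions rather than anything taken from the patch, and the exact sequence may differ between ARM and Thumb-2 code generation.

```c
/* Hypothetical test sketch; the directives and the expected mnemonic
   are assumptions, not part of the posted patch.  */
/* { dg-do compile } */
/* { dg-options "-O2" } */

/* Signed add with overflow check: stores the wrapped sum in *RES and
   returns nonzero if the addition overflowed.  */
int
sadd_ovf (int x, int y, int *res)
{
  return __builtin_sadd_overflow (x, y, res);
}

/* With the new patterns a flag-setting add would be expected.  */
/* { dg-final { scan-assembler "adds" } } */
```

A test like this only checks that the instruction is emitted; the torture tests remain the check for correctness.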



Thanks,
Kyrill



+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+

I notice this and many other patterns in this patch are guarded on
TARGET_ARM. Is there any reason why they should be restricted to ARM
state and not be TARGET_32BIT?

I thought about this as well. I will test with TARGET_32BIT and get
back to you.
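
For reference, at the source level the expander quoted above serves calls such as __builtin_uadd_overflow: the addition sets the flags and the jump to operands[3] is taken on unsigned overflow (carry out). A minimal C sketch of that behaviour, using a hypothetical wrapper name that is not part of the patch:

```c
/* Hypothetical wrapper: adds X and Y, stores the wrapped result in *SUM
   and returns 1 if the unsigned addition carried out, 0 otherwise.  */
int
uadd_carry (unsigned int x, unsigned int y, unsigned int *sum)
{
  if (__builtin_uadd_overflow (x, y, sum))
    return 1;   /* Corresponds to the jump to the overflow label.  */
  return 0;
}
```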



Thanks,
Kyrill

Re: [PATCH][ARM] PR target/70008

2016-02-29 Thread Michael Collison




On 2/29/2016 4:06 AM, Kyrill Tkachov wrote:

Hi Michael,

On 29/02/16 04:47, Michael Collison wrote:
This patch addresses PR 70008, where a reverse subtract with carry
instruction can be generated in thumb2 mode. It was tested with no
regressions in arm and thumb modes on the following targets:


arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison 

PR target/70008
* config/arm/arm.md (*subsi3_carryin): Only match pattern if
TARGET_ARM due to 'rsc' instruction alternative.
* config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.




The *subsi3_carryin pattern has the arch attribute:
   (set_attr "arch" "*,a")

That means that the second alternative that generates the RSC
instruction is only enabled for ARM mode. Do you have a testcase where
this doesn't happen and this pattern generates the second alternative
for Thumb2?


No, I don't have a test case; I noticed the pattern when working on the
overflow project. I did not realize that an attribute could affect the
matching of an alternative. I will close the bug.





Thanks,
Kyrill

Re: [SPARC] Fix PR target/69706

2016-02-29 Thread Uros Bizjak

Hello!

> This is both an ICE and an ABI bug dating back to the implementation of the
> 64-bit calling conventions in 1998: for structures larger than 8 bytes and not
> larger than 16 bytes containing a FP field in the second half and passed in
> slot #15 of the parameter array, the compiler passes the FP field in the %f32
> register instead of on the stack; this results in an ICE for a 'float' and an
> ABI bug for a 'double'.

...

+/* Number of words (partially) occupied for a given size in units.  */
+#define NWORDS_UP(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

-#define ROUND_ADVANCE(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

You can use CEIL macro from system.h here.

Uros.
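
For illustration, the equivalence can be sketched as follows. CEIL is reproduced here as a local stand-in (in GCC it comes from system.h), and UNITS_PER_WORD is fixed at 8 as an assumption for a 64-bit target; NWORDS_UP_CEIL and nwords_forms_agree are hypothetical names that do not appear in the patch.

```c
/* Stand-ins for the GCC definitions; assumed values for illustration.  */
#define UNITS_PER_WORD 8
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* The macro as posted in the patch.  */
#define NWORDS_UP(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

/* The same computation expressed with CEIL (hypothetical name).  */
#define NWORDS_UP_CEIL(SIZE) CEIL ((SIZE), UNITS_PER_WORD)

/* Returns nonzero if both forms agree for every size in [0, LIMIT].  */
int
nwords_forms_agree (int limit)
{
  for (int size = 0; size <= limit; size++)
    if (NWORDS_UP (size) != NWORDS_UP_CEIL (size))
      return 0;
  return 1;
}
```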

Re: [ARM] Add support for overflow add, sub, and neg operations



On 26/02/16 10:32, Michael Collison wrote:



On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:

This patch adds support for builtin overflow of add, subtract and negate. This 
patch is targeted for gcc 7 stage 1. It was tested with no regressions in arm 
and thumb modes on the following targets:

arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations.
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compare, negsi2_carryin_compare): New patterns
to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these operations, and
how they are better than the default fallback sequences?


Sure for a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   adds    r0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr



Thanks! That looks much better.


Also, we'd need tests for each of these overflow operations, since these are 
pretty complex
patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic functions 
are definitely being exercised.


Great, that gives me more confidence on the correctness aspects but...



Also, you may want to consider splitting this into a patch series, each adding 
a single
overflow operation, together with its tests. That way it will be easier to keep 
track of
which pattern applies to which use case and they can go in independently of 
each other.


Let me know if you still feel the same way given the existing test cases.



... I'd like us to still have scan-assembler tests. The torture tests exercise 
the correctness,
but we'd want tests to catch regressions where we stop generating the new 
patterns due to other
optimisation changes, which would lead to code quality regressions.
So I'd like us to have scan-assembler tests for these sequences to make sure we 
generate the right
instructions.

Thanks,
Kyrill



+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+

I notice this and many other patterns in this patch are guarded on TARGET_ARM.
Is there any reason why they should be restricted to ARM state and not be
TARGET_32BIT?

I thought about this as well. I will test with TARGET_32BIT and get back to you.



Thanks,
Kyrill

Re: [PATCH][ARM] PR target/70008


Hi Michael,

On 29/02/16 04:47, Michael Collison wrote:

This patch addresses PR 70008, where a reverse subtract with carry instruction
can be generated in thumb2 mode. It was tested with no regressions in arm and
thumb modes on the following targets:

arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi

Okay for trunk?

2016-02-28  Michael Collison  

PR target/70008
* config/arm/arm.md (*subsi3_carryin): Only match pattern if
TARGET_ARM due to 'rsc' instruction alternative.
* config/arm/thumb2.md (*thumb2_subsi3_carryin): New pattern.




The *subsi3_carryin pattern has the arch attribute:
   (set_attr "arch" "*,a")

That means that the second alternative that generates the RSC instruction is 
only enabled
for ARM mode. Do you have a testcase where this doesn't happen and this pattern 
generates
the second alternative for Thumb2?

Thanks,
Kyrill

Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-02-29 Thread kugan



Err.  I think the way you implement that in reassoc is ad-hoc and not
related to reassoc at all.

In fact what reassoc is missing is to handle

  -y * z * (-w) * x -> y * z * w * x

thus optimize negates as if they were additional * -1 entries in a
multiplication chain.  And
then optimize a single remaining * -1 in the result chain to a negate.

Then match.pd handles x + (-y) -> x - y (independent of -frounding-math btw).

So no, this isn't ok as-is, IMHO you want to expand the multiplication ops chain
pulling in the * -1 ops (if single-use, of course).



I agree. Here is the updated patch along the lines you suggested. Does this
look better?
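
To make the idea concrete, here is a small C sketch of the bookkeeping only (it is not the GIMPLE implementation in the patch): each operand carries a flag saying whether it was defined by a negate, the negates are stripped from the chain, and a single negate is applied to the product only if their count is odd. The struct and function names are hypothetical.

```c
/* Hypothetical model of one entry in a multiplication chain.  */
struct mul_operand
{
  double value;   /* The operand with any negate already stripped.  */
  int negated;    /* Nonzero if the original operand was -value.  */
};

/* Multiplies the stripped operands and applies one negate at the end
   when an odd number of operands were negated.  */
double
multiply_factoring_negates (const struct mul_operand *ops, int n)
{
  double product = 1.0;
  int neg_count = 0;

  for (int i = 0; i < n; i++)
    {
      if (ops[i].negated)
        neg_count++;
      product *= ops[i].value;
    }

  return (neg_count % 2) ? -product : product;
}
```

With an odd count the remaining negate is what lets x + (-y * z * z) be rewritten as x - y * z * z afterwards.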


Thanks,
Kugan
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 17eb64f..bbb5ffb 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4674,6 +4674,41 @@ attempt_builtin_powi (gimple *stmt, vec<operand_entry *> *ops)
   return result;
 }
 
+/* Factor out NEGATE_EXPR from the multiplication operands.  */
+static void
+factor_out_negate_expr (gimple_stmt_iterator *gsi,
+   gimple *stmt, vec<operand_entry *> *ops)
+{
+  operand_entry *oe;
+  unsigned int i;
+  int neg_count = 0;
+
+  FOR_EACH_VEC_ELT (*ops, i, oe)
+{
+  if (TREE_CODE (oe->op) != SSA_NAME
+ || !has_single_use (oe->op))
+   continue;
+  gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
+  if (!is_gimple_assign (def_stmt)
+ || gimple_assign_rhs_code (def_stmt) != NEGATE_EXPR)
+   continue;
+  oe->op = gimple_assign_rhs1 (def_stmt);
+  neg_count ++;
+}
+
+  if (neg_count % 2)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree tmp = make_temp_ssa_name (TREE_TYPE (lhs), NULL, "reassocneg");
+  gimple_set_lhs (stmt, tmp);
+  gassign *neg_stmt = gimple_build_assign (lhs, NEGATE_EXPR,
+  tmp);
+  gimple_set_location (neg_stmt, gimple_location (stmt));
+  gimple_set_uid (neg_stmt, gimple_uid (stmt));
+  gsi_insert_after (gsi, neg_stmt, GSI_SAME_STMT);
+}
+}
+
 /* Attempt to optimize
CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
@@ -4917,6 +4952,12 @@ reassociate_bb (basic_block bb)
  if (rhs_code == MULT_EXPR)
    attempt_builtin_copysign (&ops);
 
+ if (rhs_code == MULT_EXPR)
+   {
+ factor_out_negate_expr (&gsi, stmt, &ops);
+ ops.qsort (sort_by_operand_rank);
+   }
+
  if (reassoc_insert_powi_p
  && rhs_code == MULT_EXPR
  && flag_unsafe_math_optimizations)

Re: [Ping^2][PATCH][GCC-5] Fix "#pragma GCC pop_options" warning.

2016-02-29 Thread Andre Vieira (lists)


On 15/02/16 10:33, Andre Vieira (lists) wrote:

On 18/01/16 11:04, Andre Vieira (lists) wrote:

Hi there,

Can we have the "#pragma GCC pop_options" fix backported to GCC-5?

Patch found in https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01261.html
and was committed in r228794.

The same patch applies cleanly to gcc-5, which would otherwise not be
able to use this pragma even though the support is there.

Cheers,
Andre



Ping.


Ping.

Re: [PATCH][AArch64] Remove an unused reload hook.

2016-02-29 Thread Matthew Wahab


On 25/02/16 11:00, Yvan Roux wrote:

Hi,

On 26 January 2015 at 18:01, Matthew Wahab  wrote:

Hello,

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
AArch64 backend no longer supports reload, this macro is not needed and this
patch removes it.

Tested aarch64-none-linux-gnu with gcc-check. No new failures.

Ok for trunk?
Matthew

gcc/
2015-01-26  Matthew Wahab  

 * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
 * config/aarch64/aarch64-protos.h
 (aarch64_legitimize_reload_address): Remove.
 * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
 Remove.


It seems that this old patch was forgotten. I guess that it'll have to
wait for GCC 7 now, but I think it's a good thing to clean up the
reload-specific hooks and constructions (I have another patch for
that in progress).



Thanks for spotting this. I'll take care of it when stage 1 opens.
Matthew

[wwwdocs] Document 3 changes in GCC 6

Hi,

this documents the following changes:
 - new scalar_storage_order type attribute in C,
 - ABI change for SPARC 64-bit,
 - automatic enabling of -mstackrealign with SSE for Windows 32-bit.

OK to apply?

-- 
Eric Botcazou
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.64
diff -u -r1.64 changes.html
--- htdocs/gcc-6/changes.html	26 Feb 2016 14:51:16 -	1.64
+++ htdocs/gcc-6/changes.html	29 Feb 2016 10:34:04 -
@@ -78,7 +78,7 @@
 	  with char in all cases because it is an array while
 	  char is scalar.
 	  INTEGER(KIND=C_SIGNED_CHAR) should be used instead.
-	  In general, this inter-operability can not be implemented, for
+	  In general, this inter-operability cannot be implemented, for
 	  example, on targets where function passing conventions of arrays
 	  differs from scalars.
   More type information is now preserved at link time reducing
@@ -223,6 +223,10 @@
 	a structure or a union with side effects is being overridden when
 	using designated initializers via a new warning option
 	-Woverride-init-side-effects.
+   A new type attribute scalar_storage_order applying to
+   structures and unions has been introduced.  It makes it possible
+   to specify the storage order (aka endianness) in memory of scalar
+   fields in the structures or unions.
   
 
 C++
@@ -605,6 +609,18 @@
 configure option.
   
 
+SPARC
+  
+An ABI bug has been fixed in 64-bit mode. Unfortunately, this change
+will break binary compatibility with earlier releases for code it affects,
+but this should be pretty rare in practice.  The conditions are: a 16-byte
+structure containing a double or a 8-byte vector in the second
+half is passed in slot #15 to a subprogram, for example as 16th parameter
+if the first 15 ones have at most 8 bytes.  The double or
+vector was wrongly passed in floating-point register %d32
+in lieu of on the stack as per the SPARC calling conventions.
+  
+
 
 Operating Systems
 
@@ -637,6 +653,11 @@
 capabilities.
   
 
+Windows
+  
+   The option -mstackrealign is now automatically activated
+   in 32-bit mode whenever the use of SSE instructions is requested.
+

[SPARC] Fix PR target/69706

This is both an ICE and an ABI bug dating back to the implementation of the 
64-bit calling conventions in 1998: for structures larger than 8 bytes and not 
larger than 16 bytes containing a FP field in the second half and passed in 
slot #15 of the parameter array, the compiler passes the FP field in the %f32 
register instead of on the stack; this results in an ICE for a 'float' and an 
ABI bug for a 'double'.

The attached fix implements the detection and the handling of this case and 
shouldn't change anything in the other cases; however, since the existing code 
was suffering from excessive duplication (structure traversal was implemented 
3 times, handling of FP fields twice and handling of integer fields 3 times), 
the code refactors the whole thing and in particular uses a function template 
for the traversal, which is then instantiated 3 times.

Bootstrapped/regtested on SPARC64/Solaris and also compat-regtested (both bugs 
are rare enough as to go unnoticed by the compat testsuite).  I also verified 
manually that GCC is now compatible with Sun CC in this case too.

I'll document the ABI change in changes.html separately.


2016-02-29  Eric Botcazou  <ebotca...@adacore.com>

PR target/69706
* config/sparc/sparc.c (ROUND_ADVANCE): Rename to...
(NWORDS_UP): ...this.
(init_cumulative_args): Minor tweaks.
(sparc_promote_function_mode): Likewise.
(scan_record_type): Delete.
(traverse_record_type): New function template.
(classify_data_t): New structure type.
(classify_registers): New inline function.
(function_arg_slotno): In 64-bit mode, bail out early if FP slots are
exhausted.  Instantiate traverse_record_type on classify_registers and
deal with the case of a structure passed in slot #15 with no FP field
in the first word.
(assign_data_t): New structure type.
(compute_int_layout): New static function.
(compute_fp_layout): Likewise.
(count_registers): New inline function.
(assign_int_registers): New static function.
(assign_fp_registers): Likewise.
(assign_registers): New inline function.
(function_arg_record_value_1): Delete.
(function_arg_record_value_2): Likewise.
(function_arg_record_value_3): Likewise.
(function_arg_record_value): Adjust to above changes.  Instantiate
traverse_record_type on count_registers to first count the number of
registers to be used and then on assign_registers to assign them.
(function_arg_union_value): Adjust to above renaming.
(sparc_function_arg_1): Minor tweaks.  Remove commented out code.
(sparc_arg_partial_bytes): Adjust to above renaming.  Deal with the
case of a structure passed in slot #15.
(sparc_function_arg_advance): Likewise.
(function_arg_padding): Minor tweak.


2016-02-29  Eric Botcazou  <ebotca...@adacore.com>

* gcc.target/sparc/20160229-1.c: New test.

-- 
Eric Botcazou
/* PR target/69706 */
/* Reported by John Paul Adrian Glaubitz <glaub...@physik.fu-berlin.de> */

/* { dg-do run } */
/* { dg-options "-std=gnu99" } */
/* { dg-require-effective-target lp64 } */

extern void abort (void);


/* Pass a 12-byte structure partially in slot #15 and on the stack.  */

struct t_rgb { float r, g, b; };

void write_xpm (void *out, unsigned int flags, const char *title, 
	const char *legend, const char *label_x, const char *label_y,
	int n_x, int n_y, float axis_x[], float axis_y[], float *mat[],
	float lo, float hi, struct t_rgb rlo, struct t_rgb rhi)
{
  register float f30 asm ("f30");
  register float f31 asm ("f31");

  if (f30 != 1.0f)
    abort ();

  if (f31 != 2.0f)
    abort ();

  if (rhi.r != 1.0f)
    abort ();

  if (rhi.g != 2.0f)
    abort ();

  if (rhi.b != 3.0f)
    abort ();
}


/* Pass a 16-byte structure partially in slot #15 and on the stack.  */

struct S1 { _Complex float f1; _Complex float f2; };

void f1 (int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8,
	 int p9, int p10, int p11, int p12, int p13, int p14, int p15,
	 struct S1 s1)
{
  register float f30 asm ("f30");
  register float f31 asm ("f31");

  if (f30 != 4.0f)
    abort ();

  if (f31 != 5.0f)
    abort ();

  if (__real__ s1.f1 != 4.0f)
    abort ();

  if (__imag__ s1.f1 != 5.0f)
    abort ();

  if (__real__ s1.f2 != 6.0f)
    abort ();

  if (__imag__ s1.f2 != 7.0f)
    abort ();
}


/* Pass a 16-byte structure partially in slot #15 and on the stack.  */

struct S2 { double d1; double d2; };

void f2 (int p1, int p2, int p3, int p4, int p5, int p6, int p7, int p8,
	 int p9, int p10, int p11, int p12, int p13, int p14, int p15,
	 struct S2 s2)
{
  register double d30 asm ("f30");

  if (d30 != 1.0)
    abort ();

  if (s2.d1 != 1.0)
    abort ();

  if (s2.d2 != 2.0)
    abort ();
}


/* Pass a 16-by

[Ada] Fix infinite recursion on circular type in with'ed unit

This one is not a regression but the fix is trivial and also enables a nice 
cleanup in the processing of access types: for a circular array type (an array 
type whose component type is a pointer to itself), gigi enters an infinite 
recursion and then eventually crashes.

Tested on x86_64-suse-linux, applied on the mainline.


2016-02-29  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Retrofit
handling of unconstrained array types as designated types into common
processing.  Also handle array types as incomplete designated types.


2016-02-29  Eric Botcazou  

* gnat.dg/incomplete4.adb: New test.
* gnat.dg/incomplete4_pkg.ads: New helper.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 233806)
+++ gcc-interface/decl.c	(working copy)
@@ -3855,8 +3855,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* The type actually used to represent the designated type, either
 	   gnat_desig_full or gnat_desig_equiv.  */
 	Entity_Id gnat_desig_rep;
-	/* True if this is a pointer to an unconstrained array.  */
-	bool is_unconstrained_array;
 	/* We want to know if we'll be seeing the freeze node for any
 	   incomplete type we may be pointing to.  */
 	bool in_main_unit
@@ -3890,62 +3888,26 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		&& Ekind (Etype (gnat_desig_full)) == E_Record_Type)))
 	  gnat_desig_full = Etype (gnat_desig_full);
 
-	/* Set the type that's actually the representation of the designated
-	   type and also flag whether we have a unconstrained array.  */
+	/* Set the type that's the representation of the designated type.  */
 	gnat_desig_rep
 	  = Present (gnat_desig_full) ? gnat_desig_full : gnat_desig_equiv;
-	is_unconstrained_array
-	  = Is_Array_Type (gnat_desig_rep) && !Is_Constrained (gnat_desig_rep);
-
-	/* If we are pointing to an incomplete type whose completion is an
-	   unconstrained array, make dummy fat and thin pointer types to it.
-	   Likewise if the type itself is dummy or an unconstrained array.  */
-	if (is_unconstrained_array
-	&& (Present (gnat_desig_full)
-		|| (present_gnu_tree (gnat_desig_equiv)
-		&& TYPE_IS_DUMMY_P
-		   (TREE_TYPE (get_gnu_tree (gnat_desig_equiv
-		|| (!in_main_unit
-		&& defer_incomplete_level != 0
-		&& !present_gnu_tree (gnat_desig_equiv))
-		|| (in_main_unit
-		&& is_from_limited_with
-		&& Present (Freeze_Node (gnat_desig_equiv)
-	  {
-	if (present_gnu_tree (gnat_desig_rep))
-	  gnu_desig_type = TREE_TYPE (get_gnu_tree (gnat_desig_rep));
-	else
-	  {
-		gnu_desig_type = make_dummy_type (gnat_desig_rep);
-		made_dummy = true;
-	  }
-
-	/* If the call above got something that has a pointer, the pointer
-	   is our type.  This could have happened either because the type
-	   was elaborated or because somebody else executed the code.  */
-	if (!TYPE_POINTER_TO (gnu_desig_type))
-	  build_dummy_unc_pointer_types (gnat_desig_equiv, gnu_desig_type);
-	gnu_type = TYPE_POINTER_TO (gnu_desig_type);
-	  }
 
 	/* If we already know what the full type is, use it.  */
-	else if (Present (gnat_desig_full)
-		 && present_gnu_tree (gnat_desig_full))
+	if (Present (gnat_desig_full) && present_gnu_tree (gnat_desig_full))
 	  gnu_desig_type = TREE_TYPE (get_gnu_tree (gnat_desig_full));
 
 	/* Get the type of the thing we are to point to and build a pointer to
 	   it.  If it is a reference to an incomplete or private type with a
-	   full view that is a record, make a dummy type node and get the
-	   actual type later when we have verified it is safe.  */
+	   full view that is a record or an array, make a dummy type node and
+	   get the actual type later when we have verified it is safe.  */
 	else if ((!in_main_unit
 		  && !present_gnu_tree (gnat_desig_equiv)
 		  && Present (gnat_desig_full)
-		  && !present_gnu_tree (gnat_desig_full)
-		  && Is_Record_Type (gnat_desig_full))
+		  && (Is_Record_Type (gnat_desig_full)
+		  || Is_Array_Type (gnat_desig_full)))
 		 /* Likewise if we are pointing to a record or array and we are
 		to defer elaborating incomplete types.  We do this as this
-		access type may be the full view of a private type.  Note
-		that the unconstrained array case is handled above.  */
+		access type may be the full view of a private type.  */
 		 || ((!in_main_unit || imported_p)
 		 && defer_incomplete_level != 0
 		 && !present_gnu_tree (gnat_desig_equiv)
@@ -3958,11 +3920,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		in which case we make the dummy type and it will be reused
 		when the declaration is finally processed.  In both cases,
 		the pointer eventually created below will be automatically
-		adjusted when the freeze node is processed.  Note that the
-		unconstrained array case is handled above.  */
-		 ||

[Ada] Fix crash in ASIS mode on concurrent types

This one is a very recent regression introduced on the mainline in ASIS mode.

Tested on x86_64-suse-linux, applied on the mainline.


2016-02-29  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : In
ASIS mode, fully lay out the minimal record type.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 233804)
+++ gcc-interface/decl.c	(working copy)
@@ -4926,7 +4926,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		  gnu_field_list = gnu_field;
 		}
 
-	  TYPE_FIELDS (gnu_type) = nreverse (gnu_field_list);
+	  finish_record_type (gnu_type, nreverse (gnu_field_list), 0,
+  false);
 	}
 	  else
 	gnu_type = void_type_node;

[Ada] Fix unexpectedly large frame with calls manipulating strings

Another long-standing regression present in the compiler (dating back to the 
Tree-SSA merge): the compiler generates code that has an unexpectedly large 
stack usage for nested calls on strings, because the gimplifier creates 
temporaries in the outermost scope that have overlapping live ranges.

Tested on x86_64-suse-linux, applied on the mainline.


2016-02-29  Eric Botcazou  

* gcc-interface/trans.c (finalize_nrv_r): Remove obsolete code.
(build_return_expr): Likewise.
(Call_to_gnu): If this is a function call and there is no target,
create a temporary for the return value for all aggregate types,
but never create it for a return statement.  Push a binding level
around the call in more cases.  Remove obsolete code.


2016-02-29  Eric Botcazou  

* gnat.dg/stack_usage3.adb: New test.
* gnat.dg/stack_usage3_pkg.ads: New helper.

-- 
Eric Botcazou
-- { dg-do compile }
-- { dg-options "-O -fstack-usage" }

with Ada.Text_IO; use Ada.Text_IO;
with Stack_Usage3_Pkg; use Stack_Usage3_Pkg;

procedure Stack_Usage3 is

begin
   Put_Line (Diag ("Diag line 0"));
   Put_Line (Diag ("Diag line 1"));
   Put_Line (Diag ("Diag line 2"));
   Put_Line (Diag ("Diag line 3"));
   Put_Line (Diag ("Diag line 4"));
   Put_Line (Diag ("Diag line 5"));
   Put_Line (Diag ("Diag line 6"));
   Put_Line (Diag ("Diag line 7"));
   Put_Line (Diag ("Diag line 8"));
   Put_Line (Diag ("Diag line 9"));
   Put_Line (Diag ("Diag line 10"));
   Put_Line (Diag ("Diag line 11"));
   Put_Line (Diag ("Diag line 12"));
   Put_Line (Diag ("Diag line 13"));
   Put_Line (Diag ("Diag line 14"));
end;

-- { dg-final { scan-stack-usage "\t\[0-9\]\[0-9\]\t" { target i?86-*-* x86_64-*-* } } }
-- { dg-final { cleanup-stack-usage } }
package Stack_Usage3_Pkg is

   subtype Small_String is String (1..80);

   function Diag (S : String) return Small_String;

end Stack_Usage3_Pkg;
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 233804)
+++ gcc-interface/trans.c	(working copy)
@@ -3330,32 +3330,14 @@ finalize_nrv_r (tree *tp, int *walk_subt
   else if (TREE_CODE (t) == RETURN_EXPR
 	   && TREE_CODE (TREE_OPERAND (t, 0)) == INIT_EXPR)
 {
-  tree ret_val = TREE_OPERAND (TREE_OPERAND (t, 0), 1), init_expr;
-
-  /* If this is the temporary created for a return value with variable
-	 size in Call_to_gnu, we replace the RHS with the init expression.  */
-  if (TREE_CODE (ret_val) == COMPOUND_EXPR
-	  && TREE_CODE (TREE_OPERAND (ret_val, 0)) == INIT_EXPR
-	  && TREE_OPERAND (TREE_OPERAND (ret_val, 0), 0)
-	 == TREE_OPERAND (ret_val, 1))
-	{
-	  init_expr = TREE_OPERAND (TREE_OPERAND (ret_val, 0), 1);
-	  ret_val = TREE_OPERAND (ret_val, 1);
-	}
-  else
-	init_expr = NULL_TREE;
+  tree ret_val = TREE_OPERAND (TREE_OPERAND (t, 0), 1);
 
   /* Strip useless conversions around the return value.  */
   if (gnat_useless_type_conversion (ret_val))
 	ret_val = TREE_OPERAND (ret_val, 0);
 
   if (is_nrv_p (dp->nrv, ret_val))
-	{
-	  if (init_expr)
-	TREE_OPERAND (TREE_OPERAND (t, 0), 1) = init_expr;
-	  else
-	TREE_OPERAND (t, 0) = dp->result;
-	}
+	TREE_OPERAND (t, 0) = dp->result;
 }
 
   /* Replace the DECL_EXPR of NRVs with an initialization of the RESULT_DECL,
@@ -3659,14 +3641,6 @@ build_return_expr (tree ret_obj, tree re
 	  && TYPE_MODE (operation_type) == BLKmode
 	  && aggregate_value_p (operation_type, current_function_decl))
 	{
-	  /* Recognize the temporary created for a return value with variable
-	 size in Call_to_gnu.  We want to eliminate it if possible.  */
-	  if (TREE_CODE (ret_val) == COMPOUND_EXPR
-	  && TREE_CODE (TREE_OPERAND (ret_val, 0)) == INIT_EXPR
-	  && TREE_OPERAND (TREE_OPERAND (ret_val, 0), 0)
-		 == TREE_OPERAND (ret_val, 1))
-	ret_val = TREE_OPERAND (ret_val, 1);
-
 	  /* Strip useless conversions around the return value.  */
 	  if (gnat_useless_type_conversion (ret_val))
 	ret_val = TREE_OPERAND (ret_val, 0);
@@ -4314,14 +4288,22 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
 	  because we need to preserve the return value before copying back the
 	  parameters.
 
-   2. There is no target and this is neither an object nor a renaming
-	  declaration, and the return type has variable size, because in
-	  these cases the gimplifier cannot create the temporary.
+   2. There is no target and the call is made for neither an object nor a
+	  renaming declaration, nor a return statement, and the return type has
+	  variable size, because in this case the gimplifier cannot create the
+	  temporary, or more generally is simply an aggregate type, because the
+	  gimplifier would create the temporary in the outermost scope instead
+	  of locally.
 
3. There is a target and it is a slice or an array with fixed size,
 	  and the return type has variable size, because the

[Ada] Fix spurious error on renaming of component of return value

This is a long-standing regression present in the compiler: it issues an 
unexpected error on the renaming of a component of the return value of a 
function call, when the return type has dynamic size and the renaming is 
declared at library level.

Tested on x86_64-suse-linux, applied on the mainline.


2016-02-29  Eric Botcazou  

* gcc-interface/ada-tree.h (DECL_RETURN_VALUE_P): New macro.
* gcc-interface/gigi.h (gigi): Remove useless attribute.
(gnat_gimplify_expr): Likewise.
(gnat_to_gnu_external): Declare.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Factor out
code dealing with the expression of external constants into...
Invoke gnat_to_gnu_external instead.
: Invoke gnat_to_gnu_external to translate renamed objects
when not for a definition.  Deal with COMPOUND_EXPR and variables with
DECL_RETURN_VALUE_P set for renamings and with the case of a dangling
'reference to a function call in a renaming.  Remove obsolete test and
adjust associated comment.
* gcc-interface/trans.c (Call_to_gnu): Set DECL_RETURN_VALUE_P on the
temporaries created to hold the return value, if any.
(gnat_to_gnu_external): ...this.  New function.
* gcc-interface/utils.c (create_var_decl): Detect a constant created
to hold 'reference to function call.
* gcc-interface/utils2.c (build_unary_op) : Add folding
for COMPOUND_EXPR in the DECL_RETURN_VALUE_P case.


2016-02-29  Eric Botcazou  

* gnat.dg/renaming8.adb: New test.
* gnat.dg/renaming8_pkg1.ads: New helper.
* gnat.dg/renaming8_pkg2.ad[sb]: Likewise.
* gnat.dg/renaming8_pkg3.ad[sb]: Likewise.


-- 
Eric Botcazou

Index: gcc-interface/ada-tree.h
===
--- gcc-interface/ada-tree.h	(revision 233738)
+++ gcc-interface/ada-tree.h	(working copy)
@@ -457,6 +457,10 @@ do {		   \
a discriminant of a discriminated type without default expression.  */
 #define DECL_INVARIANT_P(NODE) DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE))
 
+/* Nonzero in a VAR_DECL if it is a temporary created to hold the return
+   value of a function call or 'reference to a function call.  */
+#define DECL_RETURN_VALUE_P(NODE) DECL_LANG_FLAG_5 (VAR_DECL_CHECK (NODE))
+
 /* In a FIELD_DECL corresponding to a discriminant, contains the
discriminant number.  */
 #define DECL_DISCRIMINANT_NUMBER(NODE) DECL_INITIAL (FIELD_DECL_CHECK (NODE))
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 233738)
+++ gcc-interface/decl.c	(working copy)
@@ -552,31 +552,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  && Present (Expression (Declaration_Node (gnat_entity)))
 	  && Nkind (Expression (Declaration_Node (gnat_entity)))
 	 != N_Allocator)
-	{
-	  bool went_into_elab_proc = false;
-	  int save_force_global = force_global;
-
 	  /* The expression may contain N_Expression_With_Actions nodes and
-	 thus object declarations from other units.  In this case, even
-	 though the expression will eventually be discarded since not a
-	 constant, the declarations would be stuck either in the global
-	 varpool or in the current scope.  Therefore we force the local
-	 context and create a fake scope that we'll zap at the end.  */
-	  if (!current_function_decl)
-	{
-	  current_function_decl = get_elaboration_procedure ();
-	  went_into_elab_proc = true;
-	}
-	  force_global = 0;
-	  gnat_pushlevel ();
-
-	  gnu_expr = gnat_to_gnu (Expression (Declaration_Node (gnat_entity)));
-
-	  gnat_zaplevel ();
-	  force_global = save_force_global;
-	  if (went_into_elab_proc)
-	current_function_decl = NULL_TREE;
-	}
+	 thus object declarations from other units.  Discard them.  */
+	gnu_expr
+	  = gnat_to_gnu_external (Expression (Declaration_Node (gnat_entity)));
 
   /* ... fall through ... */
 
@@ -611,13 +590,17 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	tree renamed_obj = NULL_TREE;
 	tree gnu_object_size;
 
+	/* We need to translate the renamed object even though we are only
+	   referencing the renaming.  But it may contain a call for which
+	   we'll generate a temporary to hold the return value and which
+	   is part of the definition of the renaming, so discard it.  */
 	if (Present (Renamed_Object (gnat_entity)) && !definition)
 	  {
 	if (kind == E_Exception)
 	  gnu_expr = gnat_to_gnu_entity (Renamed_Entity (gnat_entity),
 	 NULL_TREE, 0);
 	else
-	  gnu_expr = gnat_to_gnu (Renamed_Object (gnat_entity));
+	  gnu_expr = gnat_to_gnu_external (Renamed_Object (gnat_entity));
 	  }
 
 	/* Get the type after elaborating the renamed object.  */
@@ -976,14 +959,39 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  inner = TREE_OPERAND (inner, 0);
 	/* Expand_Dispatching_Call can

[PATCH 9/9] S/390: Disallow SImode in s390_decompose_address

Now that Y is no longer used with SImode operands, we can finally
disallow SImode (if != Pmode) in s390_decompose_address.  In fact, that
was the whole point of the patch series.
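
As a rough illustration of what the check accepts before and after this
change, here is a toy model in C.  It is only a sketch: the enum and the
function names are made up for this example, the REG_P test of the real
code is left out, and nothing here is taken from s390.c itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two machine modes that matter here.  */
enum toy_mode { TOY_SImode, TOY_DImode };

/* Before the patch: a base or index register was accepted if its mode
   was SImode or Pmode.  */
bool
reg_mode_ok_before (enum toy_mode reg_mode, enum toy_mode pmode)
{
  return reg_mode == TOY_SImode || reg_mode == pmode;
}

/* After the patch: only Pmode is accepted, so SImode passes only when
   Pmode happens to be SImode.  */
bool
reg_mode_ok_after (enum toy_mode reg_mode, enum toy_mode pmode)
{
  return reg_mode == pmode;
}
```

With Pmode modelled as TOY_DImode, an SImode register passes the first
function but not the second; with Pmode modelled as TOY_SImode both
functions agree.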

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* config/s390/s390.c (s390_decompose_address): Don't accept SImode
anymore.
---
 gcc/config/s390/s390.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 43219dd..8924367 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2817,9 +2817,7 @@ s390_decompose_address (rtx addr, struct s390_address *out)
return false;
  }
 
-  if (!REG_P (base)
- || (GET_MODE (base) != SImode
- && GET_MODE (base) != Pmode))
+  if (!REG_P (base) || GET_MODE (base) != Pmode)
return false;
 
   if (REGNO (base) == STACK_POINTER_REGNUM
@@ -2865,9 +2863,7 @@ s390_decompose_address (rtx addr, struct s390_address *out)
return false;
  }
 
-  if (!REG_P (indx)
- || (GET_MODE (indx) != SImode
- && GET_MODE (indx) != Pmode))
+  if (!REG_P (indx) || GET_MODE (indx) != Pmode)
return false;
 
   if (REGNO (indx) == STACK_POINTER_REGNUM
-- 
1.9.1

[PATCH 5/9] S/390: Get rid of Y constraint in arithmetic right shift patterns.

The arithmetic shift patterns also set the condition code.  This adds
more substitution potential.  Depending on whether the actual result
or the CC output will be used, 3 different variants of each of these
patterns are needed.  Multiplied with the PLUS and the AND operands
from the earlier substitutions, this enables a lot of folding.

2016-02-29  Andreas Krebbel  

* config/s390/s390.md ("*ashrdi3_cc_31")
("*ashrdi3_cconly_31", "*ashrdi3_cc_31_and")
("*ashrdi3_cconly_31_and", "*ashrdi3_31_and", "*ashrdi3_31"):
Merge insn definitions into ...
("*ashrdi3_31"):
New pattern definition.
("*ashr3_cc", "*ashr3_cconly", "ashr3", )
("*ashr3_cc_and", "*ashr3_cconly_and")
("*ashr3_and"): Merge insn definitions into ...
("*ashr3"):
New pattern definition.
* config/s390/subst.md ("addr_style_op_cc_subst")
("masked_op_cc_subst", "setcc_subst", "cconly_subst"): New
substitution patterns plus attributes.
Add ashiftrt to SUBST iterator.
---
 gcc/config/s390/s390.md  | 181 ++-
 gcc/config/s390/subst.md |  62 +++-
 2 files changed, 81 insertions(+), 162 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 771d1e9..dd91383 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -8448,181 +8448,40 @@
   [(parallel
 [(set (match_operand:DSI 0 "register_operand" "")
   (ashiftrt:DSI (match_operand:DSI 1 "register_operand" "")
-(match_operand:SI 2 "shift_count_or_setmem_operand" "")))
+(match_operand:SI 2 "nonmemory_operand" "")))
  (clobber (reg:CC CC_REGNUM))])]
   ""
   "")
 
-(define_insn "*ashrdi3_cc_31"
-  [(set (reg CC_REGNUM)
-(compare (ashiftrt:DI (match_operand:DI 1 "register_operand" "0")
-  (match_operand:SI 2 "shift_count_or_setmem_operand" "Y"))
- (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=d")
-(ashiftrt:DI (match_dup 1) (match_dup 2)))]
-  "!TARGET_ZARCH && s390_match_ccmode(insn, CCSmode)"
-  "srda\t%0,%Y2"
-  [(set_attr "op_type"  "RS")
-   (set_attr "atype""reg")])
-
-(define_insn "*ashrdi3_cconly_31"
-  [(set (reg CC_REGNUM)
-(compare (ashiftrt:DI (match_operand:DI 1 "register_operand" "0")
-  (match_operand:SI 2 "shift_count_or_setmem_operand" "Y"))
- (const_int 0)))
-   (clobber (match_scratch:DI 0 "=d"))]
-  "!TARGET_ZARCH && s390_match_ccmode(insn, CCSmode)"
-  "srda\t%0,%Y2"
-  [(set_attr "op_type"  "RS")
-   (set_attr "atype""reg")])
-
-(define_insn "*ashrdi3_31"
-  [(set (match_operand:DI 0 "register_operand" "=d")
-(ashiftrt:DI (match_operand:DI 1 "register_operand" "0")
- (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))
+; FIXME: The number of alternatives is doubled here to match the fix
+; number of 2 in the subst pattern for the (clobber (match_scratch...
+; The right fix should be to support match_scratch in the output
+; pattern of a define_subst.
+(define_insn "*ashrdi3_31"
+  [(set (match_operand:DI 0 "register_operand"   "=d, d")
+(ashiftrt:DI (match_operand:DI 1 "register_operand"   "0, 0")
+ (match_operand:SI 2 "nonmemory_operand" "an,an")))
(clobber (reg:CC CC_REGNUM))]
   "!TARGET_ZARCH"
-  "srda\t%0,%Y2"
-  [(set_attr "op_type"  "RS")
-   (set_attr "atype""reg")])
-
-; sra, srag, srak
-(define_insn "*ashr3_cc"
-  [(set (reg CC_REGNUM)
-(compare (ashiftrt:GPR (match_operand:GPR 1 "register_operand" 
 ",d")
-   (match_operand:SI 2 
"shift_count_or_setmem_operand" "Y,Y"))
- (const_int 0)))
-   (set (match_operand:GPR 0 "register_operand"
   "=d,d")
-(ashiftrt:GPR (match_dup 1) (match_dup 2)))]
-  "s390_match_ccmode(insn, CCSmode)"
   "@
-   sra\t%0,<1>%Y2
-   sra\t%0,%1,%Y2"
-  [(set_attr "op_type"  "RS,RSY")
-   (set_attr "atype""reg,reg")
-   (set_attr "cpu_facility" "*,z196")
-   (set_attr "z10prop" "z10_super_E1,*")])
+   srda\t%0,
+   srda\t%0,"
+  [(set_attr "op_type" "RS")
+   (set_attr "atype"   "reg")])
 
-; sra, srag, srak
-(define_insn "*ashr3_cconly"
-  [(set (reg CC_REGNUM)
-(compare (ashiftrt:GPR (match_operand:GPR 1 "register_operand" 
 ",d")
-   (match_operand:SI 2 
"shift_count_or_setmem_operand" "Y,Y"))
- (const_int 0)))
-   (clobber (match_scratch:GPR 0   
   "=d,d"))]
-  "s390_match_ccmode(insn, CCSmode)"
-  "@
-   sra\t%0,<1>%Y2
-   sra\t%0,%1,%Y2"
-  [(set_attr "op_type"  "RS,RSY")
-   (set_attr "atype""reg,reg")
-   (set_attr "cpu_facility" "*,z196")
-   (set_attr "z10prop" "z10_super_E1,*")])
 
 ; sra, srag
-(define_insn

[PATCH 8/9] S/390: Use define_subst for the setmem patterns.

While trying to get rid of the Y constraint in the setmem patterns I
noticed that for these patterns it isn't even a problem, since they
only ever use the constraint with a Pmode match_operand.  But while
at it I've tried to fold some of the patterns a bit.

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* config/s390/constraints.md ("jm8"): New constraint.
* config/s390/predicates.md ("const_int_8bitset_operand"): New 
predicate.
* config/s390/s390.md ("*setmem_long", "*setmem_long_and"): Merge
into ...
("*setmem_long"): New pattern.
("*setmem_long_31z", "*setmem_long_and_31z"): Merge
into ...
("*setmem_long_31z"): New pattern.
* config/s390/subst.md ("setmem_31z_subst", "setmem_and_subst"):
New substitution rules with the required attributes.

---
 gcc/config/s390/constraints.md |  5 +
 gcc/config/s390/predicates.md  |  6 ++
 gcc/config/s390/s390.md| 35 ++-
 gcc/config/s390/subst.md   | 25 +
 4 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md
index 60a7edf..6eeaa98 100644
--- a/gcc/config/s390/constraints.md
+++ b/gcc/config/s390/constraints.md
@@ -37,6 +37,7 @@
 ;; jKK: constant vector with all elements having the same value and
 ;;  matching K constraint
 ;; jm6: An integer operand with the lowest order 6 bits all ones.
+;; jm8: An integer operand with the lowest order 8 bits all ones.
 ;;t -- Access registers 36 and 37.
 ;;v -- Vector registers v0-v31.
 ;;C -- A signed 8-bit constant (-128..127)
@@ -420,6 +421,10 @@
   "@internal An integer operand with the lowest order 6 bits all ones."
   (match_operand 0 "const_int_6bitset_operand"))
 
+(define_constraint "jm8"
+  "@internal An integer operand with the lowest order 8 bits all ones."
+  (match_operand 0 "const_int_8bitset_operand"))
+
 ;;
 ;; Memory constraints follow.
 ;;
diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index fefefb3..fbff24d 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -119,6 +119,12 @@
 (define_predicate "const_int_6bitset_operand"
  (and (match_code "const_int")
   (match_test "(INTVAL (op) & 63) == 63")))
+
+; An integer operand with the lowest order 8 bits all ones.
+(define_predicate "const_int_8bitset_operand"
+ (and (match_code "const_int")
+  (match_test "(INTVAL (op) & 255) == 255")))
+
 (define_predicate "nonzero_shift_count_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, GET_MODE_BITSIZE (mode) - 1)")))
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index ca58c42..d085fa1 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3323,7 +3323,7 @@
 
 ; Patterns for 31 bit + Esa and 64 bit + Zarch.
 
-(define_insn "*setmem_long"
+(define_insn "*setmem_long"
   [(clobber (match_operand: 0 "register_operand" "=d"))
(set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
 (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
@@ -3336,26 +3336,10 @@
   [(set_attr "length" "8")
(set_attr "type" "vs")])
 
-(define_insn "*setmem_long_and"
-  [(clobber (match_operand: 0 "register_operand" "=d"))
-   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
-(unspec:BLK [(and:P
- (match_operand:P 2 "shift_count_or_setmem_operand" "Y")
- (match_operand:P 4 "const_int_operand" "n"))
-   (subreg:P (match_dup 3) )]
-   UNSPEC_REPLICATE_BYTE))
-   (use (match_operand: 1 "register_operand" "d"))
-   (clobber (reg:CC CC_REGNUM))]
-  "(TARGET_64BIT || !TARGET_ZARCH) &&
-   (INTVAL (operands[4]) & 255) == 255"
-  "mvcle\t%0,%1,%Y2\;jo\t.-4"
-  [(set_attr "length" "8")
-   (set_attr "type" "vs")])
-
 ; Variants for 31 bit + Zarch, necessary because of the odd in-register offsets
 ; of the SImode subregs.
 
-(define_insn "*setmem_long_31z"
+(define_insn "*setmem_long_31z"
   [(clobber (match_operand:TI 0 "register_operand" "=d"))
(set (mem:BLK (subreg:SI (match_operand:TI 3 "register_operand" "0") 4))
 (unspec:BLK [(match_operand:SI 2 "shift_count_or_setmem_operand" "Y")
@@ -3367,21 +3351,6 @@
   [(set_attr "length" "8")
(set_attr "type" "vs")])
 
-(define_insn "*setmem_long_and_31z"
-  [(clobber (match_operand:TI 0 "register_operand" "=d"))
-   (set (mem:BLK (subreg:SI (match_operand:TI 3 "register_operand" "0") 4))
-(unspec:BLK [(and:SI
- (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")
- (match_operand:SI 4 "const_int_operand" "n"))
-   (subreg:SI (match_dup 3) 12)] UNSPEC_REPLICATE_BYTE))
-   (use (match_operand:TI 1 "register_operand" "d"))

[PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

This finally removes the Y constraint from the vector patterns while
folding some of them using a code iterator.

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* config/s390/subst.md (DSI_VI): New mode iterator.
("addr_style_op_subst"): Use DSI_VI instead of DSI.
* config/s390/vector.md ("vec_set"): Move expander before
the insn definition.
("*vec_set"): Change predicate and add alternative to
support only either register or const_int operands as element
selector.
("*vec_set_plus"): New pattern to support reg + const_int
operands.
("vec_extract"): New expander.
("*vec_extract"): New insn definition supporting reg and
const_int element selectors.
("*vec_extract_plus"): New insn definition supporting
reg+const_int element selectors.
("rotl3", "ashl3", "ashr3"): Merge into the
following expander+insn definition.
("3"): New expander.
("*3"): New insn definition.
---
 gcc/config/s390/subst.md  |  13 ++---
 gcc/config/s390/vector.md | 127 +++---
 2 files changed, 81 insertions(+), 59 deletions(-)

diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 3becf20..8a1b814 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -20,19 +20,20 @@
 ;; .
 
 (define_code_iterator SUBST [rotate ashift lshiftrt ashiftrt])
+(define_mode_iterator DSI_VI [SI DI V2QI V4QI V8QI V16QI V2HI V4HI V8HI V2SI V4SI V2DI])
 
 ; This expands an register/immediate operand to a register+immediate
 ; operand to draw advantage of the address style operand format
 ; providing a addition for free.
 (define_subst "addr_style_op_subst"
-  [(set (match_operand:DSI 0 "" "")
-(SUBST:DSI (match_operand:DSI 1 "" "")
-  (match_operand:SI 2 "" "")))]
+  [(set (match_operand:DSI_VI 0 "" "")
+(SUBST:DSI_VI (match_operand:DSI_VI 1 "" "")
+ (match_operand:SI 2 "" "")))]
   ""
   [(set (match_dup 0)
-(SUBST:DSI (match_dup 1)
-  (plus:SI (match_operand:SI 2 "register_operand" "a")
-   (match_operand 3 "const_int_operand"   "n"])
+(SUBST:DSI_VI (match_dup 1)
+ (plus:SI (match_operand:SI 2 "register_operand" "a")
+  (match_operand 3 "const_int_operand"   "n"])
 
 ; Use this in the insn name.
 (define_subst_attr "addr_style_op" "addr_style_op_subst" "" "_plus")
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index cc3287c..2b8e9bf 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -307,47 +307,82 @@
 
 ; vec_store_lanes?
 
+; vec_set is supposed to *modify* an existing vector so operand 0 is
+; duplicated as input operand.
+(define_expand "vec_set"
+  [(set (match_operand:V0 "register_operand"  
"")
+   (unspec:V [(match_operand: 1 "general_operand"   
"")
+  (match_operand:SI2 "shift_count_or_setmem_operand" 
"")
+  (match_dup 0)]
+  UNSPEC_VEC_SET))]
+  "TARGET_VX")
+
 ; FIXME: Support also vector mode operands for 1
 ; FIXME: A target memory operand seems to be useful otherwise we end
 ; up with vl vlvgg vst.  Shouldn't the middle-end be able to handle
 ; that itself?
 (define_insn "*vec_set"
-  [(set (match_operand:V0 "register_operand" 
"=v, v,v")
-   (unspec:V [(match_operand: 1 "general_operand"   
"d,QR,K")
-  (match_operand:SI2 "shift_count_or_setmem_operand" 
"Y, I,I")
-  (match_operand:V 3 "register_operand"  
"0, 0,0")]
+  [(set (match_operand:V0 "register_operand"  "=v, v,v")
+   (unspec:V [(match_operand: 1 "general_operand""d,QR,K")
+  (match_operand:SI2 "nonmemory_operand" "an, I,I")
+  (match_operand:V 3 "register_operand"   "0, 0,0")]
  UNSPEC_VEC_SET))]
-  "TARGET_VX"
+  "TARGET_VX
+   && (!CONST_INT_P (operands[2])
+   || UINTVAL (operands[2]) < GET_MODE_NUNITS (mode))"
   "@
vlvg\t%v0,%1,%Y2
vle\t%v0,%1,%2
vlei\t%v0,%1,%2"
   [(set_attr "op_type" "VRS,VRX,VRI")])
 
-; vec_set is supposed to *modify* an existing vector so operand 0 is
-; duplicated as input operand.
-(define_expand "vec_set"
-  [(set (match_operand:V0 "register_operand"  
"")
-   (unspec:V [(match_operand: 1 "general_operand"   
"")
-  (match_operand:SI2 "shift_count_or_setmem_operand" 
"")
-  (match_dup 0)]
-  UNSPEC_VEC_SET))]
-  "TARGET_VX")
+(define_insn "*vec_set_plus"
+  [(set (match_operand:V  0 "register_operand" "=v")
+   (unspec:V [(match_operand:

[PATCH 2/9] S/390: Use enabled attribute overrides to disable alternatives.

So far, whenever we wanted to disable an alternative, we have used mode
attributes emitting constraints that match an earlier alternative,
assuming that the later alternative will then never be chosen.

With this patch the `enabled' attribute, which so far is only set from
`cpu_facility', is overridden to 0 to disable certain alternatives.
This comes in handy when defining the substitutions later, and while
adding it anyway I've used it for the existing cases as well.

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* config/s390/s390.md ("op_type", "atype", "length" attributes):
Remove RRR type.  It doesn't really exist.
("RRer", "f0", "v0", "vf", "vd", "op1", "Rf"): Remove mode
attributes.
("BFP", "DFP", "nDSF", "nDFDI"): Add mode attributes.
("*cmp_ccs", "floatdi2", "add3")
("*add3_cc", "*add3_cconly", "sub3")
("*sub3_cc", "*sub3_cconly", "mul3")
("fma4", "fms4", "div3", "*neg2")
("*abs2", "*negabs2", "sqrt2"): Override
`enabled' attribute.
---
 gcc/config/s390/s390.md | 215 +---
 1 file changed, 111 insertions(+), 104 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 8f92018..65b6ce9 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -366,7 +366,7 @@
 ;; Used to determine defaults for length and other attribute values.
 
 (define_attr "op_type"
-  "NN,E,RR,RRE,RX,RS,RSI,RI,SI,S,SS,SSE,RXE,RSE,RIL,RIE,RXY,RSY,SIY,RRF,RRR,SIL,RRS,RIS,VRI,VRR,VRS,VRV,VRX"
+  "NN,E,RR,RRE,RX,RS,RSI,RI,SI,S,SS,SSE,RXE,RSE,RIL,RIE,RXY,RSY,SIY,RRF,SIL,RRS,RIS,VRI,VRR,VRS,VRV,VRX"
   (const_string "NN"))
 
 ;; Instruction type attribute used for scheduling.
@@ -393,7 +393,7 @@
 ;;   reg: Instruction does not use the agen unit
 
 (define_attr "atype" "agen,reg"
-  (if_then_else (eq_attr "op_type" "E,RR,RI,RRE,RSI,RIL,RIE,RRF,RRR")
+  (if_then_else (eq_attr "op_type" "E,RR,RI,RRE,RSI,RIL,RIE,RRF")
(const_string "reg")
(const_string "agen")))
 
@@ -434,8 +434,8 @@
 ;; Length in bytes.
 
 (define_attr "length" ""
-  (cond [(eq_attr "op_type" "E,RR")  (const_int 2)
- (eq_attr "op_type" "RX,RI,RRE,RS,RSI,S,SI,RRF,RRR")  (const_int 4)]
+  (cond [(eq_attr "op_type" "E,RR")  (const_int 2)
+ (eq_attr "op_type" "RX,RI,RRE,RS,RSI,S,SI,RRF")  (const_int 4)]
 (const_int 6)))
 
 
@@ -618,27 +618,14 @@
 ;; fp register operands.  The following attributes allow to merge the bfp and
 ;; dfp variants in a single insn definition.
 
-;; This attribute is used to set op_type accordingly.
-(define_mode_attr RRer [(TF "RRE") (DF "RRE") (SF "RRE") (TD "RRR")
-(DD "RRR") (SD "RRR")])
-
-;; This attribute is used in the operand constraint list in order to have the
-;; first and the second operand match for bfp modes.
-(define_mode_attr f0 [(TF "0") (DF "0") (SF "0") (TD "f") (DD "f") (DD "f")])
-
-;; This attribute is used to merge the scalar vector instructions into
-;; the FP patterns.  For non-supported modes (all but DF) it expands
-;; to constraints which are supposed to be matched by an earlier
-;; variant.
-(define_mode_attr v0  [(TF "0") (DF "v") (SF "0") (TD "0") (DD "0") (DD 
"0") (TI "0") (DI "v") (SI "0")])
-(define_mode_attr vf  [(TF "f") (DF "v") (SF "f") (TD "f") (DD "f") (DD 
"f") (TI "f") (DI "v") (SI "f")])
-(define_mode_attr vd  [(TF "d") (DF "v") (SF "d") (TD "d") (DD "d") (DD 
"d") (TI "d") (DI "v") (SI "d")])
-
-;; This attribute is used in the operand list of the instruction to have an
-;; additional operand for the dfp instructions.
-(define_mode_attr op1 [(TF "") (DF "") (SF "")
-   (TD "%1,") (DD "%1,") (SD "%1,")])
-
+;; These mode attributes are supposed to be used in the `enabled' insn
+;; attribute to disable certain alternatives for certain modes.
+(define_mode_attr nBFP [(TF "0") (DF "0") (SF "0") (TD "*") (DD "*") (DD "*")])
+(define_mode_attr nDFP [(TF "*") (DF "*") (SF "*") (TD "0") (DD "0") (DD "0")])
+(define_mode_attr DSF [(TF "0") (DF "*") (SF "*") (TD "0") (DD "0") (SD "0")])
+(define_mode_attr DFDI [(TF "0") (DF "*") (SF "0")
+   (TD "0") (DD "0") (DD "0")
+   (TI "0") (DI "*") (SI "0")])
 
 ;; This attribute is used in the operand constraint list
 ;; for instructions dealing with the sign bit of 32 or 64bit fp values.
@@ -648,10 +635,6 @@
 ;; target operand uses the same fp register.
 (define_mode_attr fT0 [(TF "0") (DF "f") (SF "f")])
 
-;; In FP templates, "" will expand to "f" in TFmode and "R" otherwise.
-;; This is used to disable the memory alternative in TFmode patterns.
-(define_mode_attr Rf [(TF "f") (DF "R") (SF "R") (TD "f") (DD "f") (SD "f")])
-
 ;; This attribute adds b for bfp instructions and t for dfp instructions and is used
 ;; within instruction mnemonics.
 (define_mode_attr bt [(TF "b") (DF "b")

[PATCH 4/9] S/390: Get rid of Y constraint in left and logical right shift patterns.

With this patch the substitution patterns added earlier are used for
the logical right shift and all the left shift patterns.

2016-02-29  Andreas Krebbel  

* config/s390/s390.md ("3"): Change predicate of
op2 to nonmemory_operand.
("*di3_31", "*di3_31_and"):
Merge into single pattern definition ...
("*di3_31"): New pattern.
("*3", "*3_and"): Merge into single
pattern definition ...
("*3"): New pattern.
* config/s390/subst.md: Add ashift and lshiftrt to SUBST
iterator.
---
 gcc/config/s390/s390.md  | 55 ++--
 gcc/config/s390/subst.md |  2 +-
 2 files changed, 17 insertions(+), 40 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index b7c037a..771d1e9 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -8408,60 +8408,37 @@
 (define_expand "3"
   [(set (match_operand:DSI 0 "register_operand" "")
 (SHIFT:DSI (match_operand:DSI 1 "register_operand" "")
-   (match_operand:SI 2 "shift_count_or_setmem_operand" "")))]
+   (match_operand:SI 2 "nonmemory_operand" "")))]
   ""
   "")
 
+; ESA 64 bit register pair shift with reg or imm shift count
 ; sldl, srdl
-(define_insn "*di3_31"
-  [(set (match_operand:DI 0 "register_operand" "=d")
-(SHIFT:DI (match_operand:DI 1 "register_operand" "0")
-  (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
+(define_insn "*di3_31"
+  [(set (match_operand:DI 0 "register_operand""=d")
+(SHIFT:DI (match_operand:DI 1 "register_operand"   "0")
+  (match_operand:SI 2 "nonmemory_operand" "an")))]
   "!TARGET_ZARCH"
-  "sdl\t%0,%Y2"
+  "sdl\t%0,"
   [(set_attr "op_type"  "RS")
(set_attr "atype""reg")
(set_attr "z196prop" "z196_cracked")])
 
-; sll, srl, sllg, srlg, sllk, srlk
-(define_insn "*3"
-  [(set (match_operand:GPR 0 "register_operand"  
"=d,d")
-(SHIFT:GPR (match_operand:GPR 1 "register_operand" 
",d")
-   (match_operand:SI 2 "shift_count_or_setmem_operand"
"Y,Y")))]
-  ""
-  "@
-   sl\t%0,<1>%Y2
-   sl\t%0,%1,%Y2"
-  [(set_attr "op_type"  "RS,RSY")
-   (set_attr "atype""reg,reg")
-   (set_attr "cpu_facility" "*,z196")
-   (set_attr "z10prop" "z10_super_E1,*")])
-
-; sldl, srdl
-(define_insn "*di3_31_and"
-  [(set (match_operand:DI 0 "register_operand" "=d")
-(SHIFT:DI (match_operand:DI 1 "register_operand" "0")
-  (and:SI (match_operand:SI 2 "shift_count_or_setmem_operand" 
"Y")
- (match_operand:SI 3 "const_int_operand"   "n"]
-  "!TARGET_ZARCH && (INTVAL (operands[3]) & 63) == 63"
-  "sdl\t%0,%Y2"
-  [(set_attr "op_type"  "RS")
-   (set_attr "atype""reg")])
 
+; 64 bit register shift with reg or imm shift count
 ; sll, srl, sllg, srlg, sllk, srlk
-(define_insn "*3_and"
-  [(set (match_operand:GPR 0 "register_operand"
 "=d,d")
-(SHIFT:GPR (match_operand:GPR 1 "register_operand"
",d")
-   (and:SI (match_operand:SI 2 "shift_count_or_setmem_operand" 
  "Y,Y")
-  (match_operand:SI 3 "const_int_operand"  
 "n,n"]
-  "(INTVAL (operands[3]) & 63) == 63"
+(define_insn "*3"
+  [(set (match_operand:GPR 0 "register_operand"  "=d, d")
+(SHIFT:GPR (match_operand:GPR 1 "register_operand" ", d")
+   (match_operand:SI 2 "nonmemory_operand"   "an,an")))]
+  ""
   "@
-   sl\t%0,<1>%Y2
-   sl\t%0,%1,%Y2"
+   sl\t%0,<1>
+   sl\t%0,%1,"
   [(set_attr "op_type"  "RS,RSY")
(set_attr "atype""reg,reg")
(set_attr "cpu_facility" "*,z196")
-   (set_attr "z10prop" "z10_super_E1,*")])
+   (set_attr "z10prop"  "z10_super_E1,*")])
 
 ;
 ; ashr(di|si)3 instruction pattern(s).
diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index c3761a9..907676a 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -19,7 +19,7 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; .
 
-(define_code_iterator SUBST [rotate])
+(define_code_iterator SUBST [rotate ashift lshiftrt])
 
 ; This expands an register/immediate operand to a register+immediate
 ; operand to draw advantage of the address style operand format
-- 
1.9.1

[PATCH 0/9] S/390 rework shift count handling - v3

Here is an updated version of the shift count rework in the S/390
backend.

Bootstrapped and regtested on s390 and s390x --with-arch=z196,zEC12,z13

Changes:

- Merge the address reg and immediate alternatives as suggested in:
  https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01744.html

- Add constraints (jm6 and jm8) equivalent to the
  const_int_6bitset_operand and const_int_8bitset_operand predicates.
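
For reference, the two predicates named above boil down to simple mask
tests; the expressions `(INTVAL (op) & 63) == 63' and
`(INTVAL (op) & 255) == 255' come from predicates.md.  The sketch below
restates them in plain C.  The function names and the use of `long long'
in place of HOST_WIDE_INT are assumptions made for this example only.

```c
#include <stdbool.h>

/* True if the lowest order 6 bits of VALUE are all ones, mirroring
   the test used by const_int_6bitset_operand.  */
bool
low_6_bits_set (long long value)
{
  return (value & 63) == 63;
}

/* True if the lowest order 8 bits of VALUE are all ones, mirroring
   the test used by const_int_8bitset_operand.  */
bool
low_8_bits_set (long long value)
{
  return (value & 255) == 255;
}
```

An AND with a constant that passes such a test leaves the corresponding
low-order bits of the other operand unchanged.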

Andreas Krebbel (9):
  gensupport: Fix define_subst operand renumbering.
  S/390: Use enabled attribute overrides to disable alternatives.
  S/390: Get rid of Y constraint in rotate patterns.
  S/390: Get rid of Y constraint in left and logical right shift
patterns.
  S/390: Get rid of Y constraint in arithmetic right shift patterns.
  S/390: Get rid of Y constraint in tabort.
  S/390: Get rid of Y constraint in vector.md.
  S/390: Use define_subst for the setmem patterns.
  S/390: Disallow SImode in s390_decompose_address

 gcc/config/s390/constraints.md |   9 +
 gcc/config/s390/predicates.md  |  10 +
 gcc/config/s390/s390.c |  31 ++-
 gcc/config/s390/s390.md| 530 ++---
 gcc/config/s390/subst.md   | 147 
 gcc/config/s390/vector.md  | 127 +-
 gcc/gensupport.c   |  45 ++--
 7 files changed, 453 insertions(+), 446 deletions(-)
 create mode 100644 gcc/config/s390/subst.md

-- 
1.9.1

[PATCH 6/9] S/390: Get rid of Y constraint in tabort.

This removes the Y constraint from the tabort pattern definition.  In
this case it is easier without using substitutions.

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* config/s390/s390.md ("*tabort_1"): Change predicate to
nonmemory_operand.  Add a second alternative to cover
register as well as const int operands.
("*tabort_1_plus"): New pattern definition.
---
 gcc/config/s390/s390.md | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index dd91383..ca58c42 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10698,7 +10698,7 @@
 ; Transaction abort
 
 (define_expand "tabort"
-  [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" "")]
+  [(unspec_volatile [(match_operand:SI 0 "nonmemory_operand" "")]
UNSPECV_TABORT)]
   "TARGET_HTM && operands != NULL"
 {
@@ -10713,12 +10713,21 @@
 })
 
 (define_insn "*tabort_1"
-  [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" "Y")]
+  [(unspec_volatile [(match_operand:SI 0 "nonmemory_operand" "aJ")]
UNSPECV_TABORT)]
   "TARGET_HTM && operands != NULL"
   "tabort\t%Y0"
   [(set_attr "op_type" "S")])
 
+(define_insn "*tabort_1_plus"
+  [(unspec_volatile [(plus:SI (match_operand:SI 0 "register_operand"  "a")
+ (match_operand:SI 1 "const_int_operand" "J"))]
+   UNSPECV_TABORT)]
+  "TARGET_HTM && operands != NULL
+   && CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[1]), 'J', \"J\")"
+  "tabort\t%1(%0)"
+  [(set_attr "op_type" "S")])
+
 ; Transaction extract nesting depth
 
 (define_insn "etnd"
-- 
1.9.1

[PATCH 1/9] gensupport: Fix define_subst operand renumbering.

When processing substitutions the operands are renumbered.  To find a
free operand number the array used_operands_numbers is used.
Currently this array is used to assign new numbers before all the
RTXes in the vector have been processed.  I ran into problems with
this for insns where a match_dup in a later (use ...) operand
refers to an earlier operand (e.g. s390.md "setmem_long").

The patch splits the loop doing the processing into two in order to
have all the operand numbers collected already when assigning new
numbers.
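
The ordering problem can be shown with a toy model in C.  This is only a
sketch under simplifying assumptions: `struct toy_expr', the `used'
array and TOY_MAX_OPERANDS are hypothetical stand-ins and not the data
structures of gensupport.c.  Each element either introduces one new
operand, refers to one operand through a match_dup, or both.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_OPERANDS 8   /* hypothetical limit for this sketch */

/* Hypothetical stand-in for one expression in the pattern vector.  */
struct toy_expr
{
  int fresh_operand;  /* operand needing a new number, or -1 */
  int dup_operand;    /* operand referenced via match_dup, or -1 */
};

static bool used[TOY_MAX_OPERANDS];

/* Return the first operand number not yet marked as used.  */
static int
first_unused (void)
{
  for (int i = 0; i < TOY_MAX_OPERANDS; i++)
    if (!used[i])
      return i;
  return TOY_MAX_OPERANDS;
}

/* Mark the operand referenced by E's match_dup, if any.  */
static void
mark_dup (const struct toy_expr *e)
{
  if (e->dup_operand >= 0)
    used[e->dup_operand] = true;
}

/* Give E's fresh operand the first free number, if it has one.  */
static void
assign_fresh (struct toy_expr *e)
{
  int k = first_unused ();
  if (e->fresh_operand >= 0 && k < TOY_MAX_OPERANDS)
    {
      used[k] = true;
      e->fresh_operand = k;
    }
}

/* Single loop: mark and assign element by element.  A match_dup in a
   later element is seen too late.  */
void
renumber_single_pass (struct toy_expr *v, int n)
{
  memset (used, 0, sizeof used);
  for (int j = 0; j < n; j++)
    {
      mark_dup (&v[j]);
      assign_fresh (&v[j]);
    }
}

/* Split loop: collect all match_dup references first, then assign.  */
void
renumber_two_pass (struct toy_expr *v, int n)
{
  memset (used, 0, sizeof used);
  for (int j = 0; j < n; j++)
    mark_dup (&v[j]);
  for (int j = 0; j < n; j++)
    assign_fresh (&v[j]);
}
```

For a vector whose first element introduces a new operand and whose
second element refers to operand 0 through a match_dup, the single loop
hands out number 0 to the new operand and collides with the later
reference, while the split loop skips 0.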

Bootstrapped and regtested on s390, s390x, and x86_64.

Ok for mainline?

Bye,

-Andreas-

gcc/ChangeLog:

2016-02-29  Andreas Krebbel  

* gensupport.c (process_substs_on_one_elem): Split loop to
complete mark_operands_used_in_match_dup on all expressions in the
vector first.
(adjust_operands_numbers): Inline into process_substs_on_one_elem
and remove function.
---
 gcc/gensupport.c | 45 -
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 8c5a1ab..de29579 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -126,7 +126,10 @@ static const char * duplicate_each_alternative (const char * str, int n_dup);
 
 typedef const char * (*constraints_handler_t) (const char *, int);
 static rtx alter_constraints (rtx, int, constraints_handler_t);
-static rtx adjust_operands_numbers (rtx);
+
+static void mark_operands_used_in_match_dup (rtx);
+static void renumerate_operands_in_pattern (rtx);
+
 static rtx replace_duplicating_operands_in_pattern (rtx);
 
 /* Make a version of gen_rtx_CONST_INT so that GEN_INT can be used in
@@ -1844,7 +1847,18 @@ process_substs_on_one_elem (struct queue_elem *elem,
  subst_pattern = alter_constraints (subst_pattern, alternatives,
 duplicate_each_alternative);
 
- subst_pattern = adjust_operands_numbers (subst_pattern);
+ mark_operands_used_in_match_dup (subst_pattern);
+ RTVEC_ELT (subst_pattern_vec, j) = subst_pattern;
+   }
+
+  for (j = 0; j < XVECLEN (subst_elem->data, 3); j++)
+   {
+ subst_pattern = RTVEC_ELT (subst_pattern_vec, j);
+
+ /* The number of MATCH_OPERANDs in the output pattern might
+change.  This routine assigns new numbers to the
+MATCH_OPERAND expressions to avoid collisions.  */
+ renumerate_operands_in_pattern (subst_pattern);
 
  /* Substitute match_dup and match_op_dup in the new pattern and
 duplicate constraints.  */
@@ -1857,7 +1871,6 @@ process_substs_on_one_elem (struct queue_elem *elem,
  if (GET_CODE (elem->data) == DEFINE_EXPAND)
remove_constraints (subst_pattern);
 
- RTVEC_ELT (subst_pattern_vec, j) = subst_pattern;
}
   XVEC (elem->data, 1) = subst_pattern_vec;
 
@@ -1927,7 +1940,7 @@ mark_operands_from_match_dup (rtx pattern)
 }
 }
 
-/* This is a subroutine of adjust_operands_numbers.
+/* This is a subroutine of process_substs_on_one_elem.
It goes through all expressions in PATTERN and when MATCH_DUP is
met, all MATCH_OPERANDs inside it is marked as occupied.  The
process of marking is done by routin mark_operands_from_match_dup.  */
@@ -1973,10 +1986,9 @@ find_first_unused_number_of_operand ()
   return MAX_OPERANDS;
 }
 
-/* This is subroutine of adjust_operands_numbers.
-   It visits all expressions in PATTERN and assigns not-occupied
-   operand indexes to MATCH_OPERANDs and MATCH_OPERATORs of this
-   PATTERN.  */
+/* This is a subroutine of process_substs_on_one_elem.  It visits all
+   expressions in PATTERN and assigns not-occupied operand indexes to
+   MATCH_OPERANDs and MATCH_OPERATORs of this PATTERN.  */
 static void
 renumerate_operands_in_pattern (rtx pattern)
 {
@@ -2011,23 +2023,6 @@ renumerate_operands_in_pattern (rtx pattern)
 }
 }
 
-/* If output pattern of define_subst contains MATCH_DUP, then this
-   expression would be replaced with the pattern, matched with
-   MATCH_OPERAND from input pattern.  This pattern could contain any
-   number of MATCH_OPERANDs, MATCH_OPERATORs etc., so it's possible
-   that a MATCH_OPERAND from output_pattern (if any) would have the
-   same number, as MATCH_OPERAND from copied pattern.  To avoid such
-   indexes overlapping, we assign new indexes to MATCH_OPERANDs,
-   laying in the output pattern outside of MATCH_DUPs.  */
-static rtx
-adjust_operands_numbers (rtx pattern)
-{
-  mark_operands_used_in_match_dup (pattern);
-
-  renumerate_operands_in_pattern (pattern);
-
-  return pattern;
-}
 
 /* Generate RTL expression
(match_dup OPNO)
-- 
1.9.1

[PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.