Re: [PATCH] Fix fixincludes for canadian cross builds

2017-04-18 Thread Yvan Roux
On 18 April 2017 at 20:17, Bernd Edlinger  wrote:
> On 04/14/17 12:29, Bernd Edlinger wrote:
>> Hi RMs:
>>
>> I am sorry that this happened so close to the imminent gcc-7 release
>> date.
>>
>> To the best of my knowledge it would be fine to apply this update patch on the
>> trunk: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00649.html
>>
>> But if you decide otherwise, I am also ready to revert my original
>> commit: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=245613
>>
>>
>> Thanks
>> Bernd.
>
> Ahem, sorry.
>
> I just realized that the updated patch still did not work correctly
> in all possible configurations...
>
> I think this part of configure.ac needs some more rework,
> but now is probably not the right time for it.
>
> Therefore I reverted r245613 for now.
>
>
> I will post an updated patch at a later time.

Thanks Bernd, let me know if you want me to try your patch with our
configurations.

>
> Thanks
> Bernd.
>
>
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog   (Revision 246978)
> +++ gcc/ChangeLog   (Revision 246979)
> @@ -1,3 +1,11 @@
> +2017-04-18  Bernd Edlinger  
> +
> +   Revert:
> +   2017-02-20  Bernd Edlinger  
> +   * Makefile.in (BUILD_SYSTEM_HEADER_DIR): New make variable.
> +   (LIMITS_H_TEST, if_multiarch, stmp-fixinc): Use BUILD_SYSTEM_HEADER_DIR
> +   instead of SYSTEM_HEADER_DIR.
> +
>   2017-04-18  Jeff Law  
>
> PR middle-end/80422
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in (Revision 246978)
> +++ gcc/Makefile.in (Revision 246979)
> @@ -517,18 +517,11 @@
>   # macro is also used in a double-quoted context.
>   SYSTEM_HEADER_DIR = `echo @SYSTEM_HEADER_DIR@ | sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta`
>
> -# Path to the system headers on the build machine
> -ifeq ($(build),$(host))
> -BUILD_SYSTEM_HEADER_DIR = $(SYSTEM_HEADER_DIR)
> -else
> -BUILD_SYSTEM_HEADER_DIR = `echo $(CROSS_SYSTEM_HEADER_DIR) | sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta`
> -endif
> -
>   # Control whether to run fixincludes.
>   STMP_FIXINC = @STMP_FIXINC@
>
>   # Test to see whether  exists in the system header files.
> -LIMITS_H_TEST = [ -f $(BUILD_SYSTEM_HEADER_DIR)/limits.h ]
> +LIMITS_H_TEST = [ -f $(SYSTEM_HEADER_DIR)/limits.h ]
>
>   # Directory for prefix to system directories, for
>   # each of $(system_prefix)/usr/include, $(system_prefix)/usr/lib, etc.
> @@ -579,7 +572,7 @@
>   else
> ifeq ($(enable_multiarch),auto)
>   # SYSTEM_HEADER_DIR is makefile syntax, cannot be evaluated in
> configure.ac
> -if_multiarch = $(if $(wildcard $(shell echo $(BUILD_SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
> +if_multiarch = $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
> else
>   if_multiarch =
> endif
> @@ -2999,11 +2992,11 @@
> sysroot_headers_suffix=`echo $${ml} | sed -e 's/;.*$$//'`; \
> multi_dir=`echo $${ml} | sed -e 's/^[^;]*;//'`; \
> fix_dir=include-fixed$${multi_dir}; \
> -   if ! $(inhibit_libc) && test ! -d ${BUILD_SYSTEM_HEADER_DIR}; then \
> +   if ! $(inhibit_libc) && test ! -d ${SYSTEM_HEADER_DIR}; then \
>   echo The directory that should contain system headers does not exist: >&2 ; \
> - echo "  ${BUILD_SYSTEM_HEADER_DIR}" >&2 ; \
> + echo "  ${SYSTEM_HEADER_DIR}" >&2 ; \
>   tooldir_sysinc=`echo "${gcc_tooldir}/sys-include" | sed -e :a -e "s,[^/]*/\.\.\/,," -e ta`; \
> - if test "x${BUILD_SYSTEM_HEADER_DIR}" = "x$${tooldir_sysinc}"; \
> + if test "x${SYSTEM_HEADER_DIR}" = "x$${tooldir_sysinc}"; \
>   then sleep 1; else exit 1; fi; \
> fi; \
> $(mkinstalldirs) $${fix_dir}; \
> @@ -3014,7 +3007,7 @@
>   export TARGET_MACHINE srcdir SHELL MACRO_LIST && \
>   cd $(build_objdir)/fixincludes && \
>   $(SHELL) ./fixinc.sh "$${gcc_dir}/$${fix_dir}" \
> -   $(BUILD_SYSTEM_HEADER_DIR) $(OTHER_FIXINCLUDES_DIRS) ); \
> +   $(SYSTEM_HEADER_DIR) $(OTHER_FIXINCLUDES_DIRS) ); \
> rm -f $${fix_dir}/syslimits.h; \
> if [ -f $${fix_dir}/limits.h ]; then \
>   mv $${fix_dir}/limits.h $${fix_dir}/syslimits.h; \


Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-18 Thread Jakub Jelinek
On Wed, Apr 19, 2017 at 07:56:30AM +0200, Jakub Jelinek wrote:
> On Wed, Apr 19, 2017 at 07:45:36AM +0200, Richard Biener wrote:
> > >As mentioned in the PR, we now use TYPE_TYPELESS_STORAGE flag on
> > >ARRAY_TYPEs to denote types that need the special C++ alias handling.
> > >The problem is how is that created, we just use build_array_type and
> > >set TYPE_TYPELESS_STORAGE on the result, but build_array_type uses type
> > >caching, so we might modify that way some other array type.
> > >If all the array type creation goes through build_cplus_array_type,
> > >that
> > >wouldn't be a problem, as that flag is dependent just on the element
> > >type, but that is not the case, c-family as well as the middle-end has
> > >lots of spots that also create array types.  So in the end whether
> > >one gets TYPE_TYPELESS_STORAGE flag or not is quite random, depends on
> > >GC etc.
> > >
> > >The following patch attempts to resolve this, by making the type
> > >hashing
> > >take that flag into account.  Bootstrapped/regtested on x86_64-linux
> > >and
> > >i686-linux, ok for trunk?
> > 
> > When changing the C++ function I thought that calling build_array_type was
> > wrong and it should instead do the same it does in the other places, use
> > its raw creation routine and then the canonical type register stuff.  But
> > I was hesitant to change this at this point.
> 
> The problem is that, as the patch shows, we don't need it in just a single
> place (the C++ FE) but in at least two places (C++ FE and c-family), and it
> wouldn't surprise me if we needed it in further places later on
> (e.g. in middle-end, if we have a TYPE_TYPELESS_STORAGE array and say DSE
> wants to create a smaller one with the same property).
> 
> Using a default argument to build_array_type is likely cleaner indeed,
> I'd just then also swap the arguments to build_array_type_1 (the shared
> vs. typeless_storage).

Here in (so far untested) patch form:

2017-04-19  Jakub Jelinek  

PR middle-end/80423
* tree.h (build_array_type): Add typeless_storage default argument.
* tree.c (type_cache_hasher::equal): Also compare
TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
(build_array_type_1): Add typeless_storage argument, set
TYPE_TYPELESS_STORAGE to it, if shared also hash it, and pass to
recursive call.
(build_nonshared_array_type): Adjust build_array_type_1 caller.
(build_array_type): Likewise.  Add typeless_storage argument.
c-family/
* c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
cp/
* tree.c (build_cplus_array_type): Call build_array_type
with the intended TYPE_TYPELESS_STORAGE flag value, instead
of calling build_array_type and later modifying TYPE_TYPELESS_STORAGE
on the shared type.
testsuite/
* g++.dg/other/pr80423.C: New test.

--- gcc/tree.h.jj   2017-04-18 15:13:25.180398014 +0200
+++ gcc/tree.h  2017-04-19 08:16:12.859844328 +0200
@@ -4068,7 +4068,7 @@ extern tree build_truth_vector_type (uns
 extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
-extern tree build_array_type (tree, tree);
+extern tree build_array_type (tree, tree, bool = false);
 extern tree build_nonshared_array_type (tree, tree);
 extern tree build_array_type_nelts (tree, unsigned HOST_WIDE_INT);
 extern tree build_function_type (tree, tree);
--- gcc/tree.c.jj   2017-04-18 15:13:25.158398308 +0200
+++ gcc/tree.c  2017-04-19 08:17:50.938542960 +0200
@@ -7073,7 +7073,9 @@ type_cache_hasher::equal (type_hash *a,
 break;
   return 0;
 case ARRAY_TYPE:
-  return TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type);
+  return (TYPE_TYPELESS_STORAGE (a->type)
+ == TYPE_TYPELESS_STORAGE (b->type)
+ && TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type));
 
 case RECORD_TYPE:
 case UNION_TYPE:
@@ -8350,10 +8352,12 @@ subrange_type_for_debug_p (const_tree ty
 
 /* Construct, lay out and return the type of arrays of elements with ELT_TYPE
and number of elements specified by the range of values of INDEX_TYPE.
+   If TYPELESS_STORAGE is true, TYPE_TYPELESS_STORAGE flag is set on the type.
If SHARED is true, reuse such a type that has already been constructed.  */
 
 static tree
-build_array_type_1 (tree elt_type, tree index_type, bool shared)
+build_array_type_1 (tree elt_type, tree index_type, bool typeless_storage,
+   bool shared)
 {
   tree t;
 
@@ -8367,6 +8371,7 @@ build_array_type_1 (tree elt_type, tree
   TREE_TYPE (t) = elt_type;
   TYPE_DOMAIN (t) = index_type;
   TYPE_ADDR_SPACE (t) = TYPE_ADDR_SPACE (elt_type);
+  TYPE_TYPELESS_STORAGE (t) = typeless_storage;
   layout_type (t);
 
   /* If the element type is incomplete at this point we get marked for
@@ -8381,6 +8386,7 @@ build_array_type_1 (tree elt_type, tree
   hstate.add_object (TYPE_HASH (elt_type));

[patch] Fix PR tree-optimization/80426

2017-04-18 Thread Eric Botcazou
Hi,

this is a regression on the mainline, but the underlying issue is present on 
the branches too, and similar to PR tree-optimization/79666, but this time for 
the invariant part instead of the symbolic one.

In extract_range_from_binary_expr_1, when the invariant part of the symbolic 
computation overflows, we drop to varying:

  /* If we have overflow for the constant part and the resulting
 range will be symbolic, drop to VR_VARYING.  */
  if ((min_ovf && sym_min_op0 != sym_min_op1)
  || (max_ovf && sym_max_op0 != sym_max_op1))
{
  set_value_range_to_varying (vr);
  return;
}

But we fail to compute the overflow when the operation is MINUS_EXPR and the 
first operand is 0, i.e. we can silently negate INT_MIN and fail to bail out.
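
To make this concrete, here is a standalone illustration (not part of the
patch) of why negating INT_MIN must be flagged as an overflow:

#include <limits.h>
#include <stdio.h>

int main (void)
{
  /* The true value of -INT_MIN needs 33 bits: */
  long long wide = -(long long) INT_MIN;   /* 2147483648 */
  /* Truncating back to 32 bits wraps (implementation-defined) right
     back to INT_MIN, i.e. the negation silently "succeeds".  */
  int wrapped = (int) wide;
  printf ("%lld %d\n", wide, wrapped);     /* 2147483648 -2147483648 */
  return 0;
}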

Fixed by computing overflow in this case too.  Tested on x86_64-suse-linux, OK 
for mainline and 6 branch?


2017-04-19  Eric Botcazou  

PR tree-optimization/80426
* tree-vrp.c (extract_range_from_binary_expr_1): For an additive
operation on symbolic operands, also compute the overflow for the
invariant part when the operation degenerates into a negation.


2017-04-19  Eric Botcazou  

* gcc.c-torture/execute/20170419-1.c: New test.


-- 
Eric Botcazou

Index: tree-vrp.c
===================================================================
--- tree-vrp.c	(revision 246960)
+++ tree-vrp.c	(working copy)
@@ -2461,7 +2461,19 @@ extract_range_from_binary_expr_1 (value_
 	  else if (min_op0)
 	wmin = min_op0;
 	  else if (min_op1)
-	wmin = minus_p ? wi::neg (min_op1) : min_op1;
+	{
+	  if (minus_p)
+		{
+		  wmin = wi::neg (min_op1);
+
+		  /* Check for overflow.  */
+		  if (wi::cmp (0, min_op1, sgn)
+		  != wi::cmp (wmin, 0, sgn))
+		min_ovf = wi::cmp (0, min_op1, sgn);
+		}
+	  else
+		wmin = min_op1;
+	}
 	  else
 	wmin = wi::shwi (0, prec);
 
@@ -2489,7 +2501,19 @@ extract_range_from_binary_expr_1 (value_
 	  else if (max_op0)
 	wmax = max_op0;
 	  else if (max_op1)
-	wmax = minus_p ? wi::neg (max_op1) : max_op1;
+	{
+	  if (minus_p)
+		{
+		  wmax = wi::neg (max_op1);
+
+		  /* Check for overflow.  */
+		  if (wi::cmp (0, max_op1, sgn)
+		  != wi::cmp (wmax, 0, sgn))
+		max_ovf = wi::cmp (0, max_op1, sgn);
+		}
+	  else
+		wmax = max_op1;
+	}
 	  else
 	wmax = wi::shwi (0, prec);
 
/* PR tree-optimization/80426 */
/* Testcase by  */

#define INT_MAX 0x7fff
#define INT_MIN (-INT_MAX-1)

int x;

int main (void)
{
  volatile int a = 0;
  volatile int b = -INT_MAX;
  int j;

  for(j = 0; j < 18; j += 1) {
x = ( (a == 0) != (b - (int)(INT_MIN) ) );
  }

  if (x != 0)
__builtin_abort ();

  return 0;
}


Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-18 Thread Jakub Jelinek
On Wed, Apr 19, 2017 at 07:45:36AM +0200, Richard Biener wrote:
> >As mentioned in the PR, we now use TYPE_TYPELESS_STORAGE flag on
> >ARRAY_TYPEs to denote types that need the special C++ alias handling.
> >The problem is how is that created, we just use build_array_type and
> >set TYPE_TYPELESS_STORAGE on the result, but build_array_type uses type
> >caching, so we might modify that way some other array type.
> >If all the array type creation goes through build_cplus_array_type,
> >that
> >wouldn't be a problem, as that flag is dependent just on the element
> >type, but that is not the case, c-family as well as the middle-end has
> >lots of spots that also create array types.  So in the end whether
> >one gets TYPE_TYPELESS_STORAGE flag or not is quite random, depends on
> >GC etc.
> >
> >The following patch attempts to resolve this, by making the type
> >hashing
> >take that flag into account.  Bootstrapped/regtested on x86_64-linux
> >and
> >i686-linux, ok for trunk?
> 
> When changing the C++ function I thought that calling build_array_type was
> wrong and it should instead do the same it does in the other places, use
> its raw creation routine and then the canonical type register stuff.  But
> I was hesitant to change this at this point.

The problem is that, as the patch shows, we don't need it in just a single
place (the C++ FE) but in at least two places (C++ FE and c-family), and it
wouldn't surprise me if we needed it in further places later on
(e.g. in middle-end, if we have a TYPE_TYPELESS_STORAGE array and say DSE
wants to create a smaller one with the same property).

Using a default argument to build_array_type is likely cleaner indeed,
I'd just then also swap the arguments to build_array_type_1 (the shared
vs. typeless_storage).

Jakub


Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-18 Thread Richard Biener
On April 18, 2017 5:14:30 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>As mentioned in the PR, we now use TYPE_TYPELESS_STORAGE flag on
>ARRAY_TYPEs to denote types that need the special C++ alias handling.
>The problem is how is that created, we just use build_array_type and
>set TYPE_TYPELESS_STORAGE on the result, but build_array_type uses type
>caching, so we might modify that way some other array type.
>If all the array type creation goes through build_cplus_array_type,
>that
>wouldn't be a problem, as that flag is dependent just on the element
>type, but that is not the case, c-family as well as the middle-end has
>lots of spots that also create array types.  So in the end whether
>one gets TYPE_TYPELESS_STORAGE flag or not is quite random, depends on
>GC etc.
>
>The following patch attempts to resolve this, by making the type
>hashing
>take that flag into account.  Bootstrapped/regtested on x86_64-linux
>and
>i686-linux, ok for trunk?

When changing the C++ function I thought that calling build_array_type was 
wrong and it should instead do the same it does in the other places, use its 
raw creation routine and then the canonical type register stuff.  But I was 
hesitant to change this at this point.

I'm on FTO today so can't experiment with that before tomorrow.

Richard.

>2017-04-18  Jakub Jelinek  
>
>   PR middle-end/80423
>   * tree.h (build_array_type_1): New prototype.
>   * tree.c (type_cache_hasher::equal): Also compare
>   TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
>   (build_array_type_1): No longer static.  Add typeless_storage
>   argument, set TYPE_TYPELESS_STORAGE to it, if shared also
>   hash it, and pass to recursive call.
>   (build_array_type, build_nonshared_array_type): Adjust
>   build_array_type_1 callers.
>c-family/
>   * c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
>cp/
>   * tree.c (build_cplus_array_type): Call build_array_type_1
>   with the intended TYPE_TYPELESS_STORAGE flag value, instead
>   of calling build_array_type and modifying TYPE_TYPELESS_STORAGE
>   on the shared type.
>testsuite/
>   * g++.dg/other/pr80423.C: New test.
>
>--- gcc/tree.h.jj  2017-04-12 13:22:23.0 +0200
>+++ gcc/tree.h 2017-04-18 12:38:02.981708334 +0200
>@@ -4068,6 +4068,7 @@ extern tree build_truth_vector_type (uns
> extern tree build_same_sized_truth_vector_type (tree vectype);
> extern tree build_opaque_vector_type (tree innertype, int nunits);
> extern tree build_index_type (tree);
>+extern tree build_array_type_1 (tree, tree, bool, bool);
> extern tree build_array_type (tree, tree);
> extern tree build_nonshared_array_type (tree, tree);
> extern tree build_array_type_nelts (tree, unsigned HOST_WIDE_INT);
>--- gcc/tree.c.jj  2017-04-07 11:46:46.0 +0200
>+++ gcc/tree.c 2017-04-18 12:37:38.765024350 +0200
>@@ -7073,7 +7073,9 @@ type_cache_hasher::equal (type_hash *a,
> break;
>   return 0;
> case ARRAY_TYPE:
>-  return TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type);
>+  return (TYPE_TYPELESS_STORAGE (a->type)
>+== TYPE_TYPELESS_STORAGE (b->type)
>+&& TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type));
> 
> case RECORD_TYPE:
> case UNION_TYPE:
>@@ -8352,8 +8354,9 @@ subrange_type_for_debug_p (const_tree ty
> and number of elements specified by the range of values of INDEX_TYPE.
>If SHARED is true, reuse such a type that has already been constructed.
> */
> 
>-static tree
>-build_array_type_1 (tree elt_type, tree index_type, bool shared)
>+tree
>+build_array_type_1 (tree elt_type, tree index_type, bool shared,
>+  bool typeless_storage)
> {
>   tree t;
> 
>@@ -8367,6 +8370,7 @@ build_array_type_1 (tree elt_type, tree
>   TREE_TYPE (t) = elt_type;
>   TYPE_DOMAIN (t) = index_type;
>   TYPE_ADDR_SPACE (t) = TYPE_ADDR_SPACE (elt_type);
>+  TYPE_TYPELESS_STORAGE (t) = typeless_storage;
>   layout_type (t);
> 
>   /* If the element type is incomplete at this point we get marked for
>@@ -8381,6 +8385,7 @@ build_array_type_1 (tree elt_type, tree
>   hstate.add_object (TYPE_HASH (elt_type));
>   if (index_type)
>   hstate.add_object (TYPE_HASH (index_type));
>+  hstate.add_flag (typeless_storage);
>   t = type_hash_canon (hstate.end (), t);
> }
> 
>@@ -8396,7 +8401,7 @@ build_array_type_1 (tree elt_type, tree
> = build_array_type_1 (TYPE_CANONICAL (elt_type),
>   index_type
>   ? TYPE_CANONICAL (index_type) : NULL_TREE,
>-  shared);
>+  shared, typeless_storage);
> }
> 
>   return t;
>@@ -8407,7 +8412,7 @@ build_array_type_1 (tree elt_type, tree
> tree
> build_array_type (tree elt_type, tree index_type)
> {
>-  return build_array_type_1 (elt_type, index_type, true);
>+  return build_array_type_1 (elt_type, index_type, true, false);
> }
> 
> /* Wrapper around build_array_type_1 with S

Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-18 Thread Jeff Law

On 04/18/2017 09:14 AM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, we now use TYPE_TYPELESS_STORAGE flag on
ARRAY_TYPEs to denote types that need the special C++ alias handling.
The problem is how is that created, we just use build_array_type and
set TYPE_TYPELESS_STORAGE on the result, but build_array_type uses type
caching, so we might modify that way some other array type.
If all the array type creation goes through build_cplus_array_type, that
wouldn't be a problem, as that flag is dependent just on the element
type, but that is not the case, c-family as well as the middle-end has
lots of spots that also create array types.  So in the end whether
one gets TYPE_TYPELESS_STORAGE flag or not is quite random, depends on
GC etc.
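
To make the hazard concrete, here is a toy model (illustrative C only, not
GCC's actual data structures): because the cache keys only on the element
type and domain, a flag set through one caller's node leaks into every
other caller sharing that node.

#include <stdio.h>

/* Toy interning cache keyed on (element, domain) only.  */
struct type { int element, domain, typeless_storage; };

static struct type cache[16];
static int n_cached;

static struct type *
toy_build_array_type (int element, int domain)
{
  for (int i = 0; i < n_cached; i++)
    if (cache[i].element == element && cache[i].domain == domain)
      return &cache[i];                       /* cache hit: shared node */
  cache[n_cached] = (struct type) { element, domain, 0 };
  return &cache[n_cached++];
}

int main (void)
{
  struct type *a = toy_build_array_type (1, 10); /* plain array type   */
  struct type *b = toy_build_array_type (1, 10); /* same cached node!  */
  b->typeless_storage = 1;                       /* meant only for 'b' */
  printf ("%d\n", a->typeless_storage);          /* prints 1: 'a' too  */
  return 0;
}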

The following patch attempts to resolve this, by making the type hashing
take that flag into account.  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2017-04-18  Jakub Jelinek  

PR middle-end/80423
* tree.h (build_array_type_1): New prototype.
* tree.c (type_cache_hasher::equal): Also compare
TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
(build_array_type_1): No longer static.  Add typeless_storage
argument, set TYPE_TYPELESS_STORAGE to it, if shared also
hash it, and pass to recursive call.
(build_array_type, build_nonshared_array_type): Adjust
build_array_type_1 callers.
c-family/
* c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
cp/
* tree.c (build_cplus_array_type): Call build_array_type_1
with the intended TYPE_TYPELESS_STORAGE flag value, instead
of calling build_array_type and modifying TYPE_TYPELESS_STORAGE
on the shared type.
testsuite/
* g++.dg/other/pr80423.C: New test.

Rather than exporting build_array_type_1, which seems rather gross, why
not add a default argument for typeless storage to the build_array_type
prototype in tree.h?  That way existing users "just work", and in the
cases where we may want to specify typeless storage, we can pass it into
the build_array_type wrapper.


That seems marginally cleaner to me.

jeff


[PATCH][PR target/74563] Fix return patterns for MIPS16

2017-04-18 Thread Jeff Law
As Maciej reports in PR 74563, we currently generate incorrect code for 
mips16 (classic) non-leaf function returns.


Specifically we sometimes generate jr $31, when we should be generating 
jr $7 (and there are times when we want to generate jr $31).


The problem, as Maciej and I independently concluded, was the bogus 
assignment to operands[0] in the {return,simple_return}_internal 
pattern.  That pattern accepts the return pointer register as an 
argument and AFAICT mips.c passes the right return pointer register 
consistently.  Overwriting that operand is just pointless and wrong.


This patch removes the bogus assignment and adds a suitable testcase. 
Tested by verifying my MIPS configurations can build newlib/glibc as 
appropriate.  Also tested the testcase using mips64vr-elf cross compiler.


Installing on the trunk.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ea44ddb553e..7aa8c03c45b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-04-18  Jeff Law  
+
+   PR target/74563
+   * mips.md ({return,simple_return}_internal): Do not overwrite
+   operands[0].
+
 2017-04-18  Jakub Jelinek  
 
PR tree-optimization/80443
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 7acf00d0451..28e0a444ba9 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6585,7 +6585,6 @@
(use (match_operand 0 "pmode_register_operand" ""))]
   ""
   {
-operands[0] = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
 return mips_output_jump (operands, 0, -1, false);
   }
  [(set_attr "type" "jump")
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 11410bb7045..c21e3733ed4 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-04-18  Jeff Law  
+
+   PR target/74563
+   * gcc.target/mips/pr74563.c: New test.
+
 2017-04-18  Jakub Jelinek  
 
PR tree-optimization/80443
diff --git a/gcc/testsuite/gcc.target/mips/pr74563.c 
b/gcc/testsuite/gcc.target/mips/pr74563.c
new file mode 100644
index 000..09545fcb5bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr74563.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mips3 -mips16 -msoft-float" } */
+
+void f2(void);
+
+void f1(void)
+{
+f2();
+}
+
+/* { dg-final { scan-assembler-not "\tjr\t\\\$31" } } */
+/* { dg-final { scan-assembler "\tjr\t\\\$7" } } */
+
+


Re: [PATCH] libiberty: Limit demangler maximum d_print_comp recursion call depth.

2017-04-18 Thread Ian Lance Taylor via gcc-patches
On Tue, Apr 18, 2017 at 4:03 PM, Mark Wielaard  wrote:
> On Tue, Apr 18, 2017 at 03:40:05PM -0700, Ian Lance Taylor wrote:
>> On Tue, Apr 18, 2017 at 3:23 PM, Mark Wielaard  wrote:
>> > The fix for PR demangler/70909 and 67264 (endless demangler recursion)
>> > catches when a demangle_component is printed in a cycle. But that doesn't
>> > protect against the call stack blowing up on non-cyclic nested types printed
>> > recursively through d_print_comp. This can happen with a (very) long mangled
>> > string that simply creates a very deep pointer or qualifier chain. Limit
>> > the recursive d_print_comp call depth for a d_print_info to 1K nested
>> > types.
>> >
>> > libiberty/ChangeLog:
>> >
>> > * cp-demangle.c (MAX_RECURSION_COUNT): New constant.
>> > (struct d_print_info): Add recursion field.
>> > (d_print_init): Initialize recursion.
>> > (d_print_comp): Check and update d_print_info recursion depth.
>>
>> I'm probably missing something, but this kind of seems like an
>> arbitrary limit.  It's possible to imagine a rather unlikely valid
>> symbol that will no longer be demangled.  Why do we want to do this?
>> What bug are we fixing?
>
> It is an arbitrary limit and I am happy to change it if it is unrealistic.
> I thought 1K was small enough that if we hit it we wouldn't have blown up
> the call stack yet. But big enough that it is unlikely that it would be a
> valid symbol (with that large a number of nested component types). The bug
> we fix with this is a program trying to demangle a string that looks like
> e.g. _Z3fnGGOGGG crashing because of stack overflow.

Hmmm, well, OK for stage 1, I guess.

Ian


Re: [PATCH] libiberty: Limit demangler maximum d_print_comp recursion call depth.

2017-04-18 Thread Mark Wielaard
On Tue, Apr 18, 2017 at 03:40:05PM -0700, Ian Lance Taylor wrote:
> On Tue, Apr 18, 2017 at 3:23 PM, Mark Wielaard  wrote:
> > The fix for PR demangler/70909 and 67264 (endless demangler recursion)
> > catches when a demangle_component is printed in a cycle. But that doesn't
> > protect against the call stack blowing up on non-cyclic nested types printed
> > recursively through d_print_comp. This can happen with a (very) long mangled
> > string that simply creates a very deep pointer or qualifier chain. Limit
> > the recursive d_print_comp call depth for a d_print_info to 1K nested
> > types.
> >
> > libiberty/ChangeLog:
> >
> > * cp-demangle.c (MAX_RECURSION_COUNT): New constant.
> > (struct d_print_info): Add recursion field.
> > (d_print_init): Initialize recursion.
> > (d_print_comp): Check and update d_print_info recursion depth.
> 
> I'm probably missing something, but this kind of seems like an
> arbitrary limit.  It's possible to imagine a rather unlikely valid
> symbol that will no longer be demangled.  Why do we want to do this?
> What bug are we fixing?

It is an arbitrary limit and I am happy to change it if it is unrealistic.
I thought 1K was small enough that if we hit it we wouldn't have blown up
the call stack yet. But big enough that it is unlikely that it would be a
valid symbol (with that large a number of nested component types). The bug
we fix with this is a program trying to demangle a string that looks like
e.g. _Z3fnGGOGGG crashing because of stack overflow.

Cheers,

Mark


Re: [PATCH] libiberty: Limit demangler maximum d_print_comp recursion call depth.

2017-04-18 Thread Ian Lance Taylor via gcc-patches
On Tue, Apr 18, 2017 at 3:23 PM, Mark Wielaard  wrote:
> The fix for PR demangler/70909 and 67264 (endless demangler recursion)
> catches when a demangle_component is printed in a cycle. But that doesn't
> protect against the call stack blowing up on non-cyclic nested types printed
> recursively through d_print_comp. This can happen with a (very) long mangled
> string that simply creates a very deep pointer or qualifier chain. Limit
> the recursive d_print_comp call depth for a d_print_info to 1K nested
> types.
>
> libiberty/ChangeLog:
>
> * cp-demangle.c (MAX_RECURSION_COUNT): New constant.
> (struct d_print_info): Add recursion field.
> (d_print_init): Initialize recursion.
> (d_print_comp): Check and update d_print_info recursion depth.

I'm probably missing something, but this kind of seems like an
arbitrary limit.  It's possible to imagine a rather unlikely valid
symbol that will no longer be demangled.  Why do we want to do this?
What bug are we fixing?

Ian


[PATCH] Fix PR80457

2017-04-18 Thread Bill Schmidt
Hi,

While investigating a performance issue, I happened to notice that vectorized
COND_EXPRs were not contributing to the vectorizer cost model.  This patch
addresses that.
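
For instance, a loop of this shape (an illustrative sketch, not the PR
testcase) vectorizes its conditional into a COND_EXPR whose cost
previously went unaccounted:

#define N 1024
long a[N], b[N], c[N];

/* With -O2 -ftree-vectorize the ternary below becomes a vector
   COND_EXPR (compare + select), which should be costed.  */
void
select_vals (void)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] > b[i] ? 5 : -5;
}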

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this ok for
trunk, or should it wait for GCC 8?

Thanks,
Bill


[gcc]

2017-04-18  Bill Schmidt  

PR tree-optimization/80457
* tree-vect-stmts.c (vectorizable_condition): Update the cost
model when vectorizing a COND_EXPR.

[gcc/testsuite]

2017-04-18  Bill Schmidt  

PR tree-optimization/80457
* gcc.target/powerpc/pr78604.c: Verify that vectorized COND_EXPRs
call vect_model_simple_cost.


Index: gcc/testsuite/gcc.target/powerpc/pr78604.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr78604.c  (revision 246948)
+++ gcc/testsuite/gcc.target/powerpc/pr78604.c  (working copy)
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize" } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fdump-tree-vect-details" } */
 
 #ifndef SIZE
 #define SIZE 1024
@@ -110,3 +110,4 @@ uns_gte (UNS_TYPE val1, UNS_TYPE val2)
 /* { dg-final { scan-assembler-times {\mvcmpgtsd\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvcmpgtud\M} 4 } } */
 /* { dg-final { scan-assembler-not   {\mvcmpequd\M} } } */
+/* { dg-final { scan-tree-dump-times "vect_model_simple_cost" 8 "vect" } } */
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c   (revision 246948)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -7746,7 +7746,7 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
   tree vec_compare;
   tree new_temp;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  enum vect_def_type dt, dts[4];
+  enum vect_def_type dt[2], dts[4];
   int ncopies;
   enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   stmt_vec_info prev_stmt_info = NULL;
@@ -7813,10 +7813,10 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
 return false;
 
   gimple *def_stmt;
-  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dt,
+  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dt[0],
   &vectype1))
 return false;
-  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dt,
+  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dt[1],
   &vectype2))
 return false;
 
@@ -7900,8 +7900,12 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
return false;
}
}
-  return expand_vec_cond_expr_p (vectype, comp_vectype,
-cond_code);
+  if (expand_vec_cond_expr_p (vectype, comp_vectype, cond_code))
+   {
+ vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+ return true;
+   }
+  return false;
 }
 
   /* Transform.  */



[PATCH] libiberty: Limit demangler maximum d_print_comp recursion call depth.

2017-04-18 Thread Mark Wielaard
The fix for PR demangler/70909 and 67264 (endless demangler recursion)
catches when a demangle_component is printed in a cycle. But that doesn't
protect against the call stack blowing up on non-cyclic nested types printed
recursively through d_print_comp. This can happen with a (very) long mangled
string that simply creates a very deep pointer or qualifier chain. Limit
the recursive d_print_comp call depth for a d_print_info to 1K nested
types.
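
A minimal reproducer sketch (my own construction, using the public
cplus_demangle entry point and assuming a link against libiberty; the
depth constant is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "demangle.h"   /* libiberty */

int main (void)
{
  size_t depth = 200000;            /* deep enough to threaten the stack */
  char *sym = malloc (depth + 8);
  strcpy (sym, "_Z1f");
  memset (sym + 4, 'P', depth);     /* P: pointer to ...                 */
  sym[4 + depth] = 'i';             /* ... int                           */
  sym[5 + depth] = '\0';
  /* Without a depth limit printing such a chain can crash; with the
     limit, demangling just fails and NULL is returned.  */
  char *out = cplus_demangle (sym, DMGL_PARAMS | DMGL_ANSI);
  printf ("%s\n", out ? out : "(demangling failed)");
  free (out);
  free (sym);
  return 0;
}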

libiberty/ChangeLog:

* cp-demangle.c (MAX_RECURSION_COUNT): New constant.
(struct d_print_info): Add recursion field.
(d_print_init): Initialize recursion.
(d_print_comp): Check and update d_print_info recursion depth.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index aeff7a7..e1db900 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -319,6 +319,9 @@ struct d_info_checkpoint
   int expansion;
 };
 
+/* Maximum number of times d_print_comp may be called recursively.  */
+#define MAX_RECURSION_COUNT 1024
+
 enum { D_PRINT_BUFFER_LENGTH = 256 };
 struct d_print_info
 {
@@ -341,6 +344,9 @@ struct d_print_info
   struct d_print_mod *modifiers;
   /* Set to 1 if we saw a demangling error.  */
   int demangle_failure;
+  /* Number of times d_print_comp was recursively called.  Should not
+ be bigger than MAX_RECURSION_COUNT.  */
+  int recursion;
   /* Non-zero if we're printing a lambda argument.  A template
  parameter reference actually means 'auto'.  */
   int is_lambda_arg;
@@ -4151,6 +4157,7 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback,
   dpi->opaque = opaque;
 
   dpi->demangle_failure = 0;
+  dpi->recursion = 0;
   dpi->is_lambda_arg = 0;
 
   dpi->component_stack = NULL;
@@ -5685,13 +5692,14 @@ d_print_comp (struct d_print_info *dpi, int options,
  struct demangle_component *dc)
 {
   struct d_component_stack self;
-  if (dc == NULL || dc->d_printing > 1)
+  if (dc == NULL || dc->d_printing > 1 || dpi->recursion > MAX_RECURSION_COUNT)
 {
   d_print_error (dpi);
   return;
 }
-  else
-dc->d_printing++;
+
+  dc->d_printing++;
+  dpi->recursion++;
 
   self.dc = dc;
   self.parent = dpi->component_stack;
@@ -5701,6 +5709,7 @@ d_print_comp (struct d_print_info *dpi, int options,
 
   dpi->component_stack = self.parent;
   dc->d_printing--;
+  dpi->recursion--;
 }
 
 /* Print a Java dentifier.  For Java we try to handle encoded extended


Re: [PATCH] libiberty: Always return NULL if d_add_substitution fails.

2017-04-18 Thread Ian Lance Taylor via gcc-patches
On Tue, Apr 18, 2017 at 2:21 PM, Mark Wielaard  wrote:
> d_add_substitution can fail for various reasons, like when the subs array
> is full. If d_add_substitution fails, d_substitution should return NULL
> early and not try to continue. Every other call of d_add_substitution
> is handled in the same way.
>
> libiberty/ChangeLog:
>
> * cp-demangle.c (d_substitution): Return NULL if d_add_substitution
> fails.

This is OK.

Thanks.

Ian


Re: [PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-04-18 Thread Michael Meissner
I did a bootstrap and make check-{gcc,c++,fortran,lto} comparing the results to
the baseline (subversion id 246975).

There were 2 differences:

The baseline failed on gcc.dg/sms-4.c but succeeded on gcc.dg/sms-1.c.

Here are the sms-[14] lines from the baseline:

Executing on host: /home/meissner/fsf-build-ppc64le/trunk-246975/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/trunk-246975/gcc/ 
/home/meissner/fsf-src/trunk-246975/gcc/testsuite/gcc.dg/sms-4.c  
-fno-diagnostics-show-caret
 -fdiagnostics-color=never  -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 -ffat-lto-objects  -lm  -o ./sms-4.exe
(timeout = 300)
spawn /home/meissner/fsf-build-ppc64le/trunk-246975/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/trunk-246975/gcc/ 
/home/meissner/fsf-src/trunk-246975/gcc/testsuite/gcc.dg/sms-4.c 
-fno-diagnostics-show-caret -fdiagnostics
-color=never -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms 
--param sms-min-sc=1 -ffat-lto-objects -lm -o ./sms-4.exe
PASS: gcc.dg/sms-4.c (test for excess errors)
Setting LD_LIBRARY_PATH to 
:/home/meissner/fsf-build-ppc64le/trunk-246975/gcc:/home/meissner/fsf-build-ppc64le/trunk-246975/powerpc64le-unknown-linux-gnu/./libatomic/.libs::/home/meissner/fsf-build-ppc64le/trunk-246975/g
cc:/home/meissner/fsf-build-ppc64le/trunk-246975/powerpc64le-unknown-linux-gnu/./libatomic/.libs:
spawn [open ...]
PASS: gcc.dg/sms-4.c execution test
FAIL: gcc.dg/sms-4.c scan-rtl-dump-times sms "SMS succeeded" 1

Executing on host: /home/meissner/fsf-build-ppc64le/trunk-246975/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/trunk-246975/gcc/ 
/home/meissner/fsf-src/trunk-246975/gcc/testsuite/gcc.dg/sms-1.c  
-fno-diagnostics-show-caret -fdiagnostics-color=never  -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms -ffat-lto-objects  -lm  -o 
./sms-1.exe  (timeout = 300)
spawn /home/meissner/fsf-build-ppc64le/trunk-246975/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/trunk-246975/gcc/ 
/home/meissner/fsf-src/trunk-246975/gcc/testsuite/gcc.dg/sms-1.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms -ffat-lto-objects -lm -o 
./sms-1.exe
PASS: gcc.dg/sms-1.c (test for excess errors)
Setting LD_LIBRARY_PATH to 
:/home/meissner/fsf-build-ppc64le/trunk-246975/gcc:/home/meissner/fsf-build-ppc64le/trunk-246975/powerpc64le-unknown-linux-gnu/./libatomic/.libs::/home/meissner/fsf-build-ppc64le/trunk-246975/gcc:/home/meissner/fsf-build-ppc64le/trunk-246975/powerpc64le-unknown-linux-gnu/./libatomic/.libs:
spawn [open ...]
PASS: gcc.dg/sms-1.c execution test
PASS: gcc.dg/sms-1.c scan-rtl-dump-times sms "SMS succeeded" 1

And here are the lines from the ivopts run:

Executing on host: /home/meissner/fsf-build-ppc64le/ivopts/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/ivopts/gcc/ 
/home/meissner/fsf-src/ivopts/gcc/testsuite/gcc.dg/sms-1.c  
-fno-diagnostics-show-caret -fdiagnostics-color=never  -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms -ffat-lto-objects  -lm  -o 
./sms-1.exe  (timeout = 300)
spawn /home/meissner/fsf-build-ppc64le/ivopts/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/ivopts/gcc/ 
/home/meissner/fsf-src/ivopts/gcc/testsuite/gcc.dg/sms-1.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms -ffat-lto-objects -lm -o 
./sms-1.exe
PASS: gcc.dg/sms-1.c (test for excess errors)
Setting LD_LIBRARY_PATH to 
:/home/meissner/fsf-build-ppc64le/ivopts/gcc:/home/meissner/fsf-build-ppc64le/ivopts/powerpc64le-unknown-linux-gnu/./libatomic/.libs::/home/meissner/fsf-build-ppc64le/ivopts/gcc:/home/meissner/fsf-build-ppc64le/ivopts/powerpc64le-unknown-linux-gnu/./libatomic/.libs:
spawn [open ...]
PASS: gcc.dg/sms-1.c execution test
FAIL: gcc.dg/sms-1.c scan-rtl-dump-times sms "SMS succeeded" 1

Executing on host: /home/meissner/fsf-build-ppc64le/ivopts/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/ivopts/gcc/ 
/home/meissner/fsf-src/ivopts/gcc/testsuite/gcc.dg/sms-4.c  
-fno-diagnostics-show-caret -fdiagnostics-color=never  -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms --param sms-min-sc=1 
-ffat-lto-objects  -lm  -o ./sms-4.exe  (timeout = 300)
spawn /home/meissner/fsf-build-ppc64le/ivopts/gcc/xgcc 
-B/home/meissner/fsf-build-ppc64le/ivopts/gcc/ 
/home/meissner/fsf-src/ivopts/gcc/testsuite/gcc.dg/sms-4.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -fmodulo-sched 
-fmodulo-sched-allow-regmoves -fdump-rtl-sms --param sms-min-sc=1 
-ffat-lto-objects -lm -o ./sms-4.exe
PASS: gcc.dg/sms-4.c (test for excess errors)
Setting LD_LIBRARY_PATH to 
:/home/meissner/fsf-build-ppc64le/ivopts/gcc:/home/meissner/fsf-build-ppc64le/ivopts/powerpc64le-unknown-linux-gnu/./libatomic/.libs::/home/meissner/fsf-build-ppc64le/ivopts/gcc:/home/meissner/fsf-build-ppc64le/ivopts/powerpc64le-unknown-linux-gnu/./libatomic/.libs:
spawn [open ...]
PASS: gcc.dg/sms-4.c execution test

[PATCH] libiberty: Always return NULL if d_add_substitution fails.

2017-04-18 Thread Mark Wielaard
d_add_substitution can fail for various reasons, like when the subs array
is full. If d_add_substitution fails, d_substitution should return NULL
early and not try to continue. Every other call of d_add_substitution
is handled in the same way.

libiberty/ChangeLog:

* cp-demangle.c (d_substitution): Return NULL if d_add_substitution
fails.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 2c7d5c5..aeff7a7 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -3891,7 +3891,8 @@ d_substitution (struct d_info *di, int prefix)
  /* If there are ABI tags on the abbreviation, it becomes
 a substitution candidate.  */
  dc = d_abi_tags (di, dc);
- d_add_substitution (di, dc);
+ if (! d_add_substitution (di, dc))
+   return NULL;
}
  return dc;
}



Re: [PATCH] libiberty: Don't update and remove did_subs field from demangler structs.

2017-04-18 Thread Ian Lance Taylor via gcc-patches
On Tue, Apr 18, 2017 at 1:56 PM, Mark Wielaard  wrote:
>
> On Sat, Dec 03, 2016 at 10:54:58PM +0100, Mark Wielaard wrote:
>> The d_info field did_subs was used for estimating the string output
>> size. It was no longer used when the malloc-less callback interface
>> was introduced in 2007 (svn r121305). But the field was still updated.
>> When backtracking was introduced in 2013 (svn r205292) did_subs was
>> also added to the d_info_checkpoint struct. But except for updating
>> the field it was still not used.
>>
>> Since it is never used just stop updating the field and remove it
>> from the two structs.
>>
>> libiberty/ChangeLog:
>>
>>   * cp-demangle.h (struct d_info): Remove did_subs field.
>>   * cp-demangle.c (struct d_info_checkpoint): Likewise.
>>   (d_template_param): Don't update did_subs.
>>   (d_substitution): Likewise.
>>   (d_checkpoint): Don't assign did_subs.
>>   (d_backtrack): Likewise.
>>   (cplus_demangle_init_info): Don't initialize did_subs.
>
> Ping. Does this look OK to commit?

This is fine when we are back in stage 1.

Thanks.

Ian


Re: [PATCH] libiberty: Don't update and remove did_subs field from demangler structs.

2017-04-18 Thread Mark Wielaard
Hi,

On Sat, Dec 03, 2016 at 10:54:58PM +0100, Mark Wielaard wrote:
> The d_info field did_subs was used for estimating the string output
> size. It was no longer used when the malloc-less callback interface
> was introduced in 2007 (svn r121305). But the field was still updated.
> When backtracking was introduced in 2013 (svn r205292) did_subs was
> also added to the d_info_checkpoint struct. But except for updating
> the field it was still not used.
> 
> Since it is never used just stop updating the field and remove it
> from the two structs.
> 
> libiberty/ChangeLog:
> 
>   * cp-demangle.h (struct d_info): Remove did_subs field.
>   * cp-demangle.c (struct d_info_checkpoint): Likewise.
>   (d_template_param): Don't update did_subs.
>   (d_substitution): Likewise.
>   (d_checkpoint): Don't assign did_subs.
>   (d_backtrack): Likewise.
>   (cplus_demangle_init_info): Don't initialize did_subs.

Ping. Does this look OK to commit?

> ---
>  libiberty/cp-demangle.c | 8 
>  libiberty/cp-demangle.h | 4 
>  2 files changed, 12 deletions(-)
> 
> diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
> index 45663fe..c628dd6 100644
> --- a/libiberty/cp-demangle.c
> +++ b/libiberty/cp-demangle.c
> @@ -317,7 +317,6 @@ struct d_info_checkpoint
>const char *n;
>int next_comp;
>int next_sub;
> -  int did_subs;
>int expansion;
>  };
>  
> @@ -3062,8 +3061,6 @@ d_template_param (struct d_info *di)
>if (param < 0)
>  return NULL;
>  
> -  ++di->did_subs;
> -
>return d_make_template_param (di, param);
>  }
>  
> @@ -3815,8 +3812,6 @@ d_substitution (struct d_info *di, int prefix)
>if (id >= (unsigned int) di->next_sub)
>   return NULL;
>  
> -  ++di->did_subs;
> -
>return di->subs[id];
>  }
>else
> @@ -3881,7 +3876,6 @@ d_checkpoint (struct d_info *di, struct d_info_checkpoint *checkpoint)
>checkpoint->n = di->n;
>checkpoint->next_comp = di->next_comp;
>checkpoint->next_sub = di->next_sub;
> -  checkpoint->did_subs = di->did_subs;
>checkpoint->expansion = di->expansion;
>  }
>  
> @@ -3891,7 +3885,6 @@ d_backtrack (struct d_info *di, struct d_info_checkpoint *checkpoint)
>di->n = checkpoint->n;
>di->next_comp = checkpoint->next_comp;
>di->next_sub = checkpoint->next_sub;
> -  di->did_subs = checkpoint->did_subs;
>di->expansion = checkpoint->expansion;
>  }
>  
> @@ -6106,7 +6099,6 @@ cplus_demangle_init_info (const char *mangled, int options, size_t len,
>   chars in the mangled string.  */
>di->num_subs = len;
>di->next_sub = 0;
> -  di->did_subs = 0;
>  
>di->last_name = NULL;
>  
> diff --git a/libiberty/cp-demangle.h b/libiberty/cp-demangle.h
> index 197883e..f197f99 100644
> --- a/libiberty/cp-demangle.h
> +++ b/libiberty/cp-demangle.h
> @@ -112,10 +112,6 @@ struct d_info
>int next_sub;
>/* The number of available entries in the subs array.  */
>int num_subs;
> -  /* The number of substitutions which we actually made from the subs
> - array, plus the number of template parameter references we
> - saw.  */
> -  int did_subs;
>/* The last name we saw, for constructors and destructors.  */
>struct demangle_component *last_name;
>/* A running total of the length of large expansions from the
> -- 
> 1.8.3.1


Re: [PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-04-18 Thread Michael Meissner
By the way, if anybody else wanted to try the patches, but did not want to
apply all 33 patches, I cloned the trunk and applied the patches to the
following branch:
svn+ssh://gcc.gnu.org/svn/gcc/branches/ibm/ivopts

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-04-18 Thread Michael Meissner
I was attempting to build a spec 2006 run on a little endian power8 PowerPC
system with all of your patches installed.  I had problems with the dealII
benchmark:

--> ./xgcc -B./ -O2 -S function_lib.ii -ftree-vectorize
function_lib.cc: In member function ‘Tensor<2, dim> 
Functions::CosineFunction::hessian(const Point&, unsigned int) const 
[with int dim = 3]’:
function_lib.cc:518:3: error: RESULT_DECL should be read only when 
DECL_BY_REFERENCE is set
while verifying SSA_NAME result_43 in statement
result_43 = result_68 + 16;
function_lib.cc:518:3: internal compiler error: verify_ssa failed
0x10e1535b verify_ssa(bool, bool)
/home/meissner/fsf-src/ivopts/gcc/tree-ssa.c:1184
0x10a4f9db execute_function_todo
/home/meissner/fsf-src/ivopts/gcc/passes.c:1973
0x10a506d3 do_per_function
/home/meissner/fsf-src/ivopts/gcc/passes.c:1650
0x10a50953 execute_todo
/home/meissner/fsf-src/ivopts/gcc/passes.c:2016
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

I'll attach a bzip2'ed version of the function_lib.ii pre-processed file that I
used to this mail message.

I'll run the other benchmarks and compare it to the baseline version to see
what improvements there are.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


function_lib.ii.bz2
Description: BZip2 compressed data


Re: [PATCH] PR80101: Fix ICE in store_data_bypass_p

2017-04-18 Thread Eric Botcazou
[Sorry for the long delay]

> Why is it nonsense?  The predicate gives the answer to the question
> "given these insns A and B, does A feed data that B stores in memory".
> That is a perfectly valid question to ask of any two insns.

I disagree, for example it's nonsensical to send it a blockage insn.

> There are workarounds to this problem as well: mips_store_data_bypass_p,
> added in 2006.  mep_store_data_bypass_p, added in 2009 (the port has
> been removed since then, of course).

I see, no strong opinion then, but individual back-ends should preferably be 
fixed if this is easily doable, instead of changing the middle-end, which may 
affect the other ~50 back-ends.

-- 
Eric Botcazou


Re: [PATCH] Fix VRP intersect_ranges for 1-bit precision signed types (PR tree-optimization/80443)

2017-04-18 Thread Richard Biener
On April 18, 2017 5:06:48 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>As can be seen in the testcase, intersect_ranges in some cases attempts
>to add or subtract 1 from one of the bounds.  That is fine except for a
>1-bit signed type, where 1 is not a value in the range of the type, so
>even build_int_cst already yields an (OVF) constant.
>
>The following patch fixes it by special-casing those: instead of
>adding/subtracting 1, for those types it subtracts/adds -1.
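
For context: a 1-bit signed type holds only the values -1 and 0. A
standalone bit-field sketch (not from the patch) makes this concrete:

#include <stdio.h>

struct s { signed int b : 1; };  /* 1-bit signed: holds only -1 and 0 */

int main (void)
{
  struct s x = { -1 };
  x.b = x.b + 1;          /* -1 + 1 == 0: representable, fine         */
  printf ("%d\n", x.b);   /* 0 */
  x.b = 1;                /* 1 is not a value of the type; the store
                             wraps (implementation-defined) to -1     */
  printf ("%d\n", x.b);   /* typically -1 */
  return 0;
}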
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ick.

OK.

Richard.

>2017-04-18  Jakub Jelinek  
>
>   PR tree-optimization/80443
>   * tree-vrp.c (intersect_ranges): For signed 1-bit precision type,
>   instead of adding 1, subtract -1 and similarly instead of subtracting
>   1 add -1.
>
>   * gcc.c-torture/compile/pr80443.c: New test.
>
>--- gcc/tree-vrp.c.jj  2017-03-23 15:49:55.0 +0100
>+++ gcc/tree-vrp.c 2017-04-18 10:09:44.560549718 +0200
>@@ -8756,20 +8756,32 @@ intersect_ranges (enum value_range_type
> /* Choose the right gap if the left one is empty.  */
> if (mineq)
>   {
>-if (TREE_CODE (vr1max) == INTEGER_CST)
>-  *vr0min = int_const_binop (PLUS_EXPR, vr1max,
>- build_int_cst (TREE_TYPE (vr1max), 1));
>-else
>+if (TREE_CODE (vr1max) != INTEGER_CST)
>   *vr0min = vr1max;
>+else if (TYPE_PRECISION (TREE_TYPE (vr1max)) == 1
>+ && !TYPE_UNSIGNED (TREE_TYPE (vr1max)))
>+  *vr0min
>+= int_const_binop (MINUS_EXPR, vr1max,
>+   build_int_cst (TREE_TYPE (vr1max), -1));
>+else
>+  *vr0min
>+= int_const_binop (PLUS_EXPR, vr1max,
>+   build_int_cst (TREE_TYPE (vr1max), 1));
>   }
> /* Choose the left gap if the right one is empty.  */
> else if (maxeq)
>   {
>-if (TREE_CODE (vr1min) == INTEGER_CST)
>-  *vr0max = int_const_binop (MINUS_EXPR, vr1min,
>- build_int_cst (TREE_TYPE (vr1min), 1));
>-else
>+if (TREE_CODE (vr1min) != INTEGER_CST)
>   *vr0max = vr1min;
>+else if (TYPE_PRECISION (TREE_TYPE (vr1min)) == 1
>+ && !TYPE_UNSIGNED (TREE_TYPE (vr1min)))
>+  *vr0max
>+= int_const_binop (PLUS_EXPR, vr1min,
>+   build_int_cst (TREE_TYPE (vr1min), -1));
>+else
>+  *vr0max
>+= int_const_binop (MINUS_EXPR, vr1min,
>+   build_int_cst (TREE_TYPE (vr1min), 1));
>   }
> /* Choose the anti-range if the range is effectively varying.  */
> else if (vrp_val_is_min (*vr0min)
>@@ -8811,22 +8823,34 @@ intersect_ranges (enum value_range_type
> if (mineq)
>   {
> *vr0type = VR_RANGE;
>-if (TREE_CODE (*vr0max) == INTEGER_CST)
>-  *vr0min = int_const_binop (PLUS_EXPR, *vr0max,
>- build_int_cst (TREE_TYPE (*vr0max), 1));
>-else
>+if (TREE_CODE (*vr0max) != INTEGER_CST)
>   *vr0min = *vr0max;
>+else if (TYPE_PRECISION (TREE_TYPE (*vr0max)) == 1
>+ && !TYPE_UNSIGNED (TREE_TYPE (*vr0max)))
>+  *vr0min
>+= int_const_binop (MINUS_EXPR, *vr0max,
>+   build_int_cst (TREE_TYPE (*vr0max), -1));
>+else
>+  *vr0min
>+= int_const_binop (PLUS_EXPR, *vr0max,
>+   build_int_cst (TREE_TYPE (*vr0max), 1));
> *vr0max = vr1max;
>   }
> /* Choose the left gap if the right is empty.  */
> else if (maxeq)
>   {
> *vr0type = VR_RANGE;
>-if (TREE_CODE (*vr0min) == INTEGER_CST)
>-  *vr0max = int_const_binop (MINUS_EXPR, *vr0min,
>- build_int_cst (TREE_TYPE (*vr0min), 1));
>-else
>+if (TREE_CODE (*vr0min) != INTEGER_CST)
>   *vr0max = *vr0min;
>+else if (TYPE_PRECISION (TREE_TYPE (*vr0min)) == 1
>+ && !TYPE_UNSIGNED (TREE_TYPE (*vr0min)))
>+  *vr0max
>+= int_const_binop (PLUS_EXPR, *vr0min,
>+   build_int_cst (TREE_TYPE (*vr0min), -1));
>+else
>+  *vr0max
>+= int_const_binop (MINUS_EXPR, *vr0min,
>+   build_int_cst (TREE_TYPE (*vr0min), 1));
> *vr0min = vr1min;
>   }
> /* Choose the anti-range if the range is effectively varying.  */
>--- gcc/testsuite/gcc.c-torture/compile/pr80443.c.jj   2017-04-18
>10:16:35.867952

Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]

2017-04-18 Thread Sandra Loosemore

On 04/18/2017 12:30 PM, Denys Vlasenko wrote:


2017-04-18  Denys Vlasenko  

 * doc/invoke.texi: Update option documentation.
 [snip]


The documentation part of this version is OK.

-Sandra




Re: PR80357: Negative register pressure

2017-04-18 Thread Jeff Law

On 04/14/2017 04:37 AM, Richard Sandiford wrote:

In the PR testcase, there were two instructions that had
a large number of insn_reg_use records for the same register.
model_recompute was instead expecting the records to be unique
and so decremented the register pressure for each one.  We then
ended up with a negative pressure.

I think the records *should* be unique here, and that this is
really a bug in the generic -fsched-pressure code.  Making them
unique could be too invasive for GCC 7 though.  There are at
least two problems:

(1) sched-deps.c uses rtx_insn_lists instead of bitmaps to record
 the set of instructions that use a live register.
 sched-rgn.c then propagates this information between blocks
 in a region using list concatenation:

   succ_rl->uses = concat_INSN_LIST (pred_rl->uses, succ_rl->uses);

 So dependencies for common predecessors will appear multiple
 times in the list.

 In this case (and for the PR), it might be enough to make
 setup_insn_reg_uses detect duplicate entries.  However...

(2) setup_insn_reg_uses adds entries for all queued uses of a register
 at the point that the register is expected to die.  It looks like
 this doesn't work reliably for REG_INC registers: if a register R
 is auto-incremented in I1 and then used for the last time in I2,
 setup_insn_reg_uses will first treat the original R as dying
 in I1 (rightly IMO).  But that use of R in I1 is still in the
 reg_last->uses list when processing I2, so setup_insn_reg_uses
 adds it a second time.

There might be more reasons for multiple records: I stopped looking
at this point.

I think for GCC 7 it'd be more pragmatic to live with the duplicate
entries and keep the fix specific to SCHED_PRESSURE_MODEL.

Tested on aarch64-linux-gnu (which uses SCHED_PRESSURE_MODEL by default)
and on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR rtl-optimization/80357
* haifa-sched.c (tmp_bitmap): New variable.
(model_recompute): Handle duplicate use records.
(alloc_global_sched_pressure_data): Initialize tmp_bitmap.
(free_global_sched_pressure_data): Free it.

gcc/testsuite/
PR rtl-optimization/80357
* gcc.c-torture/compile/pr80357.c: New test.

OK.

I'm going to go ahead and commit.

Jeff


[PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]

2017-04-18 Thread Denys Vlasenko
falign-functions=N is too simplistic.

Ingo Molnar ran some tests and it seems that on latest x86 CPUs, 64-byte
alignment of functions runs fastest (he tried many other possibilities):
this way, after a call the CPU can fetch a lot of insns in the first
cacheline fill.

However, developers are less than thrilled by the idea of slam-dunk
64-byte-aligning everything. Too much waste:
On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
>
> Is there some way to get gcc to take the size of the function into
> account? Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up to
9 bytes (not inclusive)". IOW: "after a call insn, CPU will always be able
to fetch at least 9 bytes of insns".
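
As a sketch of the intended mapping (my reading of the description above,
not verified compiler output):

/* Compiling this TU with:  gcc -O2 -falign-functions=64,9 -S t.c
   should emit, before the function label, something like
       .p2align 6,,8
   i.e. align to 2^6 = 64 bytes but insert at most 8 padding bytes;
   a function that would need 9 or more bytes of padding stays put.  */
void f (void) { }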

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, falign-functions=10 was emitting this before every function:
.p2align 4,,9
.p2align 3
This tweak was removed by the previous patch. Now it is reinstated
by the logic that if falign-functions=N[,M] is specified and N > 8,
then the default value of N2 is 8, not 1. Now this can be suppressed by
falign-functions=N,M,1 - which wasn't possible before.
In general, optional N2,M2 pair can be used to generate any secondary
alignment user wants.

Subalignment for loops/jumps/labels is trickier to fully implement.
The implementation in this patch uses falign-labels subalignment values
for any of these three types of labels - but only if "main" alignment
triggers. With -O2 defaults, this provides a matching behavior on x86:
loops and jumps are aligned (to 16-32 bytes depending on selected CPU)
and subaligned to 8 bytes. Labels are not aligned.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses explicitly given alignment.

2017-04-18  Denys Vlasenko  

* doc/invoke.texi: Update option documentation.
* common.opt (-falign-functions=): Accept a string instead of integer.
(-falign-jumps=): Likewise.
(-falign-labels=): Likewise.
(-falign-loops=): Likewise.
* flags.h (struct target_flag_state): Revamp how alignment data is stored:
for each of four alignment types, store two pairs of log/maxskip values.
* toplev.c (read_uint): New function.
(read_log_maxskip): New function.
(parse_N_M): New function.
(init_alignments): Rename to parse_alignment_opts, make globally visible.
Set align_foo[0/1].log/maxskip from
specified falign-FOO=N[,M[,N[,M]]] options.
* toplev.h (parse_alignment_opts): Now globally visible.
(min_align_loops_log): Variable which holds arch override for minimal
alignment of loops.
(min_align_jumps_log): Likewise for jumps.
(min_align_labels_log): Likewise for labels.
(min_align_functions_log): Likewise for functions.
* varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
macros, first for N,M and second time for N2,M2 from
falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
* final.c (final_scan_insn): If a label, jump or loop target
is being aligned, emit a secondary alignment directive.
* config/i386/i386.c (struct ptt): Change foo_align members from
integers to strings. Add align_label member. Set it to "0,0,8"
on the processors which have maxskips > 7 for loops and jumps -
this preserves the existing behaviour of adding an 8-byte subalignment.
* config/i386/i386.c (processor_target_table): Likewise.
* config/aarch64/aarch64-protos.h (struct tune_params):
Change foo_align members from integers to strings.
* config/aarch64/aarch64.c (_tunings):
Change foo_align field values from integers to strings.
* config/arm/arm.c (arm_override_options_after_change_1):
Fix if() condition to detect that -falign-functions is specified,
change code which sets arch-default alignment.
* config/i386/i386.c (ix86_default_align): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/mips/mips.c (mips_set_compression_mode): Likewise.
* config/alpha/alpha.c (alpha_override_options_after_change): Likewise.
* config/visium/visium.c (visium_option_override): Likewise.
* config/sh/sh.c (sh_override_opti

[PATCH 1/3] Remove support for obsolete x86 -malign-foo options

2017-04-18 Thread Denys Vlasenko
2017-04-18  Denys Vlasenko  

* config/i386/i386-common.c (ix86_handle_option): Remove support
for obsolete -malign-loops, -malign-jumps and -malign-functions
options.
* config/i386/i386.opt: Likewise.

Index: gcc/common/config/i386/i386-common.c
===
--- gcc/common/config/i386/i386-common.c(revision 240663)
+++ gcc/common/config/i386/i386-common.c(working copy)
@@ -998,38 +998,6 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
-
-  /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-
-case OPT_malign_loops_:
-  warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
-  if (value > MAX_CODE_ALIGN)
-   error_at (loc, "-malign-loops=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
-  else
-   opts->x_align_loops = 1 << value;
-  return true;
-
-case OPT_malign_jumps_:
-  warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
-  if (value > MAX_CODE_ALIGN)
-   error_at (loc, "-malign-jumps=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
-  else
-   opts->x_align_jumps = 1 << value;
-  return true;
-
-case OPT_malign_functions_:
-  warning_at (loc, 0,
- "-malign-functions is obsolete, use -falign-functions");
-  if (value > MAX_CODE_ALIGN)
-   error_at (loc, "-malign-functions=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
-  else
-   opts->x_align_functions = 1 << value;
-  return true;
-
 case OPT_mbranch_cost_:
   if (value > 5)
{
Index: gcc/config/i386/i386.opt
===
--- gcc/config/i386/i386.opt(revision 240663)
+++ gcc/config/i386/i386.opt(working copy)
@@ -205,18 +205,6 @@ malign-double
 Target Report Mask(ALIGN_DOUBLE) Save
 Align some doubles on dword boundary.
 
-malign-functions=
-Target RejectNegative Joined UInteger
-Function starts are aligned to this power of 2.
-
-malign-jumps=
-Target RejectNegative Joined UInteger
-Jump targets are aligned to this power of 2.
-
-malign-loops=
-Target RejectNegative Joined UInteger
-Loop code aligned to this power of 2.
-
 malign-stringops
 Target RejectNegative Report InverseMask(NO_ALIGN_STRINGOPS, ALIGN_STRINGOPS) Save
 Align destination of the string operations.


[PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86

2017-04-18 Thread Denys Vlasenko
This change drops the forced alignment to 8 when the requested alignment is
higher than 8: before the patch, -falign-functions=9 was generating

.p2align 4,,8
.p2align 3

which means: "align to 16 if the skip is 8 bytes or less; else align to 8".
After this change, ".p2align 3" is not emitted.

This behavior will be implemented differently by the next patch.

The new SUBALIGN_LOG define will be used by the next patch.

While we are here, avoid generating ".p2align N,,2^N-1" -
it is functionally equivalent to ".p2align N". In this case, use the latter.
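
For instance, ".p2align 3,,7" allows up to 7 bytes of padding, but an 8-byte
alignment can never need to skip more than 7 bytes anyway, so the plain
".p2align 3" expresses exactly the same thing.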

2017-04-18  Denys Vlasenko  

* config/i386/dragonfly.h: (ASM_OUTPUT_MAX_SKIP_ALIGN):
Use a simpler align directive also if MAXSKIP = ALIGN-1.
* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
* config/i386/freebsd.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Remove "If N
is large, do at least 8 byte alignment" code. Add SUBALIGN_LOG
define. Use a simpler align directive also if MAXSKIP = ALIGN-1.
* config/i386/gnu-user.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/iamcu.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/openbsdelf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/x86-64.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.

Index: gcc/config/i386/dragonfly.h
===
--- gcc/config/i386/dragonfly.h (revision 239860)
+++ gcc/config/i386/dragonfly.h (working copy)
@@ -69,10 +69,12 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #undef  ASM_OUTPUT_MAX_SKIP_ALIGN
-#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP)		\
-  if ((LOG) != 0) {							\
-    if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-    else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
+#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP) \
+  if ((LOG) != 0) {\
+if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1) \
+  fprintf ((FILE), "\t.p2align %d\n", (LOG));  \
+else   \
+  fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));  \
   }
 #endif
 
Index: gcc/config/i386/freebsd.h
===
--- gcc/config/i386/freebsd.h   (revision 239860)
+++ gcc/config/i386/freebsd.h   (working copy)
@@ -92,9 +92,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
    command to advance the location counter to a multiple of 1<<LOG
    bytes if it is within MAX_SKIP bytes.  */
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+#define SUBALIGN_LOG 3
 #undef  ASM_OUTPUT_MAX_SKIP_ALIGN
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP)		\
   do {								\
     if ((LOG) != 0) {						\
-      if ((MAX_SKIP) == 0)					\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)	\
	fprintf ((FILE), "\t.p2align %d\n", (LOG));		\
+      else							\
	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
-      /* Make sure that we have at least 8 byte alignment if > 8 byte	\
-	 alignment is preferred.  */				\
-      if ((LOG) > 3						\
-	  && (1 << (LOG)) > ((MAX_SKIP) + 1)			\
-	  && (MAX_SKIP) >= 7)					\
-	fputs ("\t.p2align 3\n", (FILE));			\
     }								\
   } while (0)
 #endif
Index: gcc/config/i386/gas.h
===
--- gcc/config/i386/gas.h   (revision 239860)
+++ gcc/config/i386/gas.h   (working copy)
@@ -72,10 +72,12 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #  define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP) \
- if ((LOG) != 0) {\
-   if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
-   else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
- }
+if ((LOG) != 0) { \
+  if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)   \
+   fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
+  else \
+   fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
+}
 #endif
 
 /* A C statement or statements which output an assembler instruction
Index: gcc/config/i386/gnu-user.h
===
--- gcc/config/i386/gnu-user.h  (revision 239860

[PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8

2017-04-18 Thread Denys Vlasenko
These patches are for this bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240
"RFE: extend -falign-xyz syntax"

An extended explanation is in the commit message of patch 3.

The test program:

int g();
int f(int i) {
i *= 3;
while (--i > 100) {
 L1:if (g()) goto L1;
if (g()) goto L2;
}
return i;
 L2:return 123;
}

"-O2" assembly before the patch:After the patch:
.text   .text
.p2align 4,,15  .p2align 4
.globl  f   .globl  f
.type   f, @function.type   f, @function
f:  f:
.LFB0:  .LFB0:
pushq   %rbxpushq   %rbx
leal(%rdi,%rdi,2), %ebx leal(%rdi,%rdi,2), %ebx
.p2align 4,,10  .p2align 4,,10
.p2align 3  .p2align 3
.L2:.L2:
subl$1, %ebxsubl$1, %ebx
cmpl$100, %ebx  cmpl$100, %ebx
jle .L1 jle .L1
.p2align 4,,10  .p2align 4,,10
.p2align 3  .p2align 3
.L3:.L3:
xorl%eax, %eax  xorl%eax, %eax
callg   callg
testl   %eax, %eax  testl   %eax, %eax
jne .L3 jne .L3
callg   callg
testl   %eax, %eax  testl   %eax, %eax
je  .L2 je  .L2
movl$123, %ebx  movl$123, %ebx
.L4:.L4:
.L1:.L1:
movl%ebx, %eax  movl%ebx, %eax
popq%rbxpopq%rbx
ret ret

This is version 8 of the patch set.

Changes since version 7:

* Documentation fixes

Changes since version 6:

* Rediffed to accommodate changes introduced by the recently added
  -flimit-function-alignment

Changes since version 5:

* Changes in rs6000, mips, alpha, visium, sh, rx, spu to accommodate
  the new alignment options.
* Explicitly list secondary alignment of 8 ("n,m,8") in x86 tables
  for all types of jump targets.

Changes since version 4:

* Deleted rather than NOPed -malign-foo=N support.
* Improved behavior match with x86 8-byte subalignment for labels.

Changes since version 3:

* Improved documentation in invoke.texi
* Fixed x86-specific calculation of default N2 value:
  previous version was doing it incorrectly for cross-compile


Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]

2017-04-18 Thread Denys Vlasenko

On 04/17/2017 09:54 PM, Sandra Loosemore wrote:

  @item -falign-functions
  @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n2}
+@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
  @opindex falign-functions
  Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
+after branch, at least @var{m} bytes can be fetched by the CPU
+without crossing specified alignment boundary.


This last sentence doesn't make much sense to me.  How about something like

This ensures that at least the first @var{m} bytes of the function can be 
fetched by the CPU without crossing an @var{n}-byte alignment boundary.


-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+Same for @var{m2} and @var{n2}.


You haven't said what m2 and n2 are yet.  The last sentence should be moved to 
the end of this paragraph instead.


+The second pair of @var{n2},@var{m2} values allows to have a secondary
+alignment: @option{-falign-functions=64,7,32,3} aligns to the next
+64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.


Also please
s/allows to have/allows you to specify/


@@ -8697,12 +8716,13 @@ skip more bytes than the size of the function.

  @item -falign-labels
  @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n2}
+@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
  @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.

+Parameters of this option are analogous to @option{-falign-functions} option.


s/to @option/to the @option/

Here and for -falign-loops and -falign-jumps too.


Thanks for the review.

I'm sending version 8 which has all of your changes incorporated.


Re: [PATCH] Fix fixincludes for canadian cross builds

2017-04-18 Thread Bernd Edlinger
On 04/14/17 12:29, Bernd Edlinger wrote:
> Hi RMs:
>
> I am sorry that this happened so close to the imminent gcc-7 release
> date.
>
> To my best knowledge it would be fine to apply this update patch on the
> trunk: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00649.html
>
> But if you decide otherwise, I am also ready to revert my original
> commit: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=245613
>
>
> Thanks
> Bernd.

Aehm, Sorry.

I just realized that the updated patch did still not yet work correctly
in all possible configurations...

I think this part of the configure.ac needs some more rework,
but that is probably not the right time for it.

Therefore I reverted r245613 for now.


I will post an updated patch at a later time.


Thanks
Bernd.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (Revision 246978)
+++ gcc/ChangeLog   (Revision 246979)
@@ -1,3 +1,11 @@
+2017-04-18  Bernd Edlinger  
+
+   Revert:
+   2017-02-20  Bernd Edlinger  
+   * Makefile.in (BUILD_SYSTEM_HEADER_DIR): New make variabe.
+   (LIMITS_H_TEST, if_multiarch, stmp-fixinc): Use BUILD_SYSTEM_HEADER_DIR
+   instead of SYSTEM_HEADER_DIR.
+
  2017-04-18  Jeff Law  

PR middle-end/80422
Index: gcc/Makefile.in
===
--- gcc/Makefile.in (Revision 246978)
+++ gcc/Makefile.in (Revision 246979)
@@ -517,18 +517,11 @@
  # macro is also used in a double-quoted context.
  SYSTEM_HEADER_DIR = `echo @SYSTEM_HEADER_DIR@ | sed -e :a -e 
's,[^/]*/\.\.\/,,' -e ta`

-# Path to the system headers on the build machine
-ifeq ($(build),$(host))
-BUILD_SYSTEM_HEADER_DIR = $(SYSTEM_HEADER_DIR)
-else
-BUILD_SYSTEM_HEADER_DIR = `echo $(CROSS_SYSTEM_HEADER_DIR) | sed -e :a 
-e 's,[^/]*/\.\.\/,,' -e ta`
-endif
-
  # Control whether to run fixincludes.
  STMP_FIXINC = @STMP_FIXINC@

  # Test to see whether  exists in the system header files.
-LIMITS_H_TEST = [ -f $(BUILD_SYSTEM_HEADER_DIR)/limits.h ]
+LIMITS_H_TEST = [ -f $(SYSTEM_HEADER_DIR)/limits.h ]

  # Directory for prefix to system directories, for
  # each of $(system_prefix)/usr/include, $(system_prefix)/usr/lib, etc.
@@ -579,7 +572,7 @@
  else
ifeq ($(enable_multiarch),auto)
  # SYSTEM_HEADER_DIR is makefile syntax, cannot be evaluated in 
configure.ac
-if_multiarch = $(if $(wildcard $(shell echo 
$(BUILD_SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
+if_multiarch = $(if $(wildcard $(shell echo 
$(SYSTEM_HEADER_DIR))/../../usr/lib/*/crti.o),$(1))
else
  if_multiarch =
endif
@@ -2999,11 +2992,11 @@
sysroot_headers_suffix=`echo $${ml} | sed -e 's/;.*$$//'`; \
multi_dir=`echo $${ml} | sed -e 's/^[^;]*;//'`; \
fix_dir=include-fixed$${multi_dir}; \
-   if ! $(inhibit_libc) && test ! -d ${BUILD_SYSTEM_HEADER_DIR}; then \
+   if ! $(inhibit_libc) && test ! -d ${SYSTEM_HEADER_DIR}; then \
  echo The directory that should contain system headers does not 
exist: >&2 ; \
- echo "  ${BUILD_SYSTEM_HEADER_DIR}" >&2 ; \
+ echo "  ${SYSTEM_HEADER_DIR}" >&2 ; \
  tooldir_sysinc=`echo "${gcc_tooldir}/sys-include" | sed -e :a 
-e "s,[^/]*/\.\.\/,," -e ta`; \
- if test "x${BUILD_SYSTEM_HEADER_DIR}" = "x$${tooldir_sysinc}"; \
+ if test "x${SYSTEM_HEADER_DIR}" = "x$${tooldir_sysinc}"; \
  then sleep 1; else exit 1; fi; \
fi; \
$(mkinstalldirs) $${fix_dir}; \
@@ -3014,7 +3007,7 @@
  export TARGET_MACHINE srcdir SHELL MACRO_LIST && \
  cd $(build_objdir)/fixincludes && \
  $(SHELL) ./fixinc.sh "$${gcc_dir}/$${fix_dir}" \
-   $(BUILD_SYSTEM_HEADER_DIR) $(OTHER_FIXINCLUDES_DIRS) ); \
+   $(SYSTEM_HEADER_DIR) $(OTHER_FIXINCLUDES_DIRS) ); \
rm -f $${fix_dir}/syslimits.h; \
if [ -f $${fix_dir}/limits.h ]; then \
  mv $${fix_dir}/limits.h $${fix_dir}/syslimits.h; \


Re: Fix ICE with -fauto-profile when walking vdefs

2017-04-18 Thread Sebastian Pop
On Mon, Apr 3, 2017 at 5:34 AM, Richard Biener  wrote:
> On Fri, 31 Mar 2017, Sebastian Pop wrote:
>
>> On Fri, Mar 31, 2017 at 12:06 PM, Richard Biener  wrote:
> Does the following fix it?
>
> Index: gcc/auto-profile.c
> ===
> --- gcc/auto-profile.c  (revision 246642)
> +++ gcc/auto-profile.c  (working copy)
> @@ -1511,7 +1511,9 @@ afdo_vpt_for_early_inline (stmt_set *pro
>
>if (has_vpt)
>  {
> -  optimize_inline_calls (current_function_decl);
> +  unsigned todo = optimize_inline_calls (current_function_decl);
> +  if (todo & TODO_update_ssa_any)
> +   update_ssa (TODO_update_ssa);
>return true;
>  }

Yes, this patch solves the problem, and this is also what Dehao has
suggested in his last comment in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

Thanks,
Sebastian

>
>
> afdo really _grossly_ over-does inlining.  And it looks like a total
> hack to me.
>
> It iterates PARAM_EARLY_INLINER_MAX_ITERATIONS but early_inliner does
> that itself already..


Re: [RFA][PATCH][P2][PR middle-end/80422] Fix crossjumping ICE

2017-04-18 Thread Jakub Jelinek
On Mon, Apr 17, 2017 at 10:20:28AM -0600, Jeff Law wrote:
>   PR middle-end/80422
>   * cfgcleanup.c (try_crossjump_to_edge): Verify SRC1 and SRC2 have
>   predecessors after walking up the insn chain.
> 
> 
>   PR middle-end/80422
>   * gcc.c-torture/compile/pr80422.c: New test.

Ok, thanks.

> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index d55b0ce..f68a964 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -2017,6 +2017,11 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>if (newpos2 != NULL_RTX)
>  src2 = BLOCK_FOR_INSN (newpos2);
>  
> +  /* Check that SRC1 and SRC2 have preds again.  They may have changed
> + above due to the call to flow_find_cross_jump.  */
> +  if (EDGE_COUNT (src1->preds) == 0 || EDGE_COUNT (src2->preds) == 0)
> +return false;
> +
>if (dir == dir_backward)
>  {
>  #define SWAP(T, X, Y) do { T tmp = (X); (X) = (Y); (Y) = tmp; } while (0)

Jakub


Re: Patch ping^2

2017-04-18 Thread Jeff Law

On 04/18/2017 07:55 AM, Jakub Jelinek wrote:

I'd like to ping following patch:

PR debug/80263
http://gcc.gnu.org/ml/gcc-patches/2017-04/msg4.html
avoid emitting sizetype artificial name into debug info

OK.
jeff


[PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern

2017-04-18 Thread Sudi Das

Hello all

This patch adds support for the BIC (vector, immediate) and ORR (vector,
immediate) SIMD patterns to the AArch64 backend.
One example of this (with -O2 -ftree-vectorize):

void
bic_s (short *a)
{
  for (int i = 0; i < 1024; i++)
a[i] &= ~(0xff);
}

which now produces :
bic_s:
add x1, x0, 2048
.p2align 2
.L2:
ldr q0, [x0]
bic v0.8h, #255
str q0, [x0], 16
cmp x1, x0
bne .L2
ret

instead of
bic_s:
movi    v1.8h, 0xff, lsl 8
add x1, x0, 2048
.p2align 2
.L2:
ldr q0, [x0]
and v0.16b, v0.16b, v1.16b
str q0, [x0], 16
cmp x1, x0
bne .L2
ret

Added new tests and checked for regressions on bootstrapped
aarch64-none-linux-gnu.
Ok for stage 1?

Thanks 
Sudi

2017-04-04 Sudakshina Das  

* config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check
type for aarch64_simd_valid_immediate.
(aarch64_output_simd_general_immediate): New declaration.
(aarch64_simd_valid_immediate): Update prototype.

* config/aarch64/aarch64-simd.md (*bic_imm_<mode>3): New pattern.
(*ior_imm_<mode>3): Likewise.

* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now
checks for valid immediates for BIC and ORR based on new enum argument.
(aarch64_output_simd_general_immediate): New function to output new
BIC/ORR.

* config/aarch64/predicates.md (aarch64_simd_valid_bic_imm_p): New.
(aarch64_simd_valid_orr_imm_p): Likewise.

2017-04-04 Sudakshina Das  

* gcc.target/aarch64/bic_imm_1.c: New test.
* gcc.target/aarch64/orr_imm_1.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9543f8c..89cc455 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -297,6 +297,15 @@ enum aarch64_parse_opt_result
   AARCH64_PARSE_INVALID_ARG		/* Invalid arch, tune, cpu arg.  */
 };
 
+/* Enum to distinguish which type of check is to be done in
+   aarch64_simd_valid_immediate.  This is used as a bitmask where CHECK_ALL
+   has both bits set.  Adding new types would require changes accordingly.  */
+enum simd_immediate_check {
+  CHECK_I   = 1,	/* Perform only non-inverted immediate checks (ORR).  */
+  CHECK_NI  = 2,	/* Perform only inverted immediate checks (BIC).  */
  CHECK_ALL = 3		/* Perform all checks (MOVI/MVNI).  */
+};
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -334,6 +343,8 @@ rtx aarch64_reverse_mask (enum machine_mode);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode);
 char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned);
+char *aarch64_output_simd_general_immediate (rtx, machine_mode, unsigned,
+	 const char*);
 bool aarch64_pad_arg_upward (machine_mode, const_tree);
 bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
 bool aarch64_regno_ok_for_base_p (int, bool);
@@ -345,7 +356,8 @@ bool aarch64_simd_imm_zero_p (rtx, machine_mode);
 bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode);
 bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_simd_valid_immediate (rtx, machine_mode, bool,
-   struct simd_immediate_info *);
+   struct simd_immediate_info *,
+   enum simd_immediate_check w = CHECK_ALL);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c462164..92275dc 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -280,6 +280,26 @@
   [(set_attr "type" "neon_logic")]
 )
 
+(define_insn "*bic_imm_<mode>3"
+ [(set (match_operand:VDQ_I 0 "register_operand" "=w")
+   (and:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0")
+		(match_operand:VDQ_I 2 "aarch64_simd_valid_bic_imm_p" "")))]
+ "TARGET_SIMD"
+ { return aarch64_output_simd_general_immediate (operands[2],
+			<MODE>mode, GET_MODE_BITSIZE (<MODE>mode), "bic"); }
+  [(set_attr "type" "neon_logic")]
+)
+
+(define_insn "*ior_imm_<mode>3"
+ [(set (match_operand:VDQ_I 0 "register_operand" "=w")
+   (ior:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0")
+		(match_operand:VDQ_I 2 "aarch64_simd_valid_orr_imm_p" "")))]
+ "TARGET_SIMD"
+ { return aarch64_output_simd_general_immediate (operands[2],
+			<MODE>mode, GET_MODE_BITSIZE (<MODE>mode), "orr"); }
+  [(set_attr "type" "neon_logic")]
+)
+
 (define_insn "add<mode>3"
   [(set (match_operand:VDQ_I 0 "register_operand" "=w")
 (plus:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4f769a4..450c42d 100644
---

[PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-18 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we now use the TYPE_TYPELESS_STORAGE flag on
ARRAY_TYPEs to denote types that need the special C++ alias handling.
The problem is how such a type is created: we just use build_array_type and
set TYPE_TYPELESS_STORAGE on the result, but build_array_type uses type
caching, so that way we might modify some other array type.
If all array type creation went through build_cplus_array_type, that
wouldn't be a problem, as the flag depends just on the element type, but
that is not the case: c-family as well as the middle-end has lots of spots
that also create array types.  So in the end whether one gets the
TYPE_TYPELESS_STORAGE flag or not is quite random, depending on GC etc.
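
A minimal sketch of the sharing hazard (illustrative only; "idx" is a
hypothetical index type):

  /* Both calls return the same cached node from the type cache...  */
  tree idx = build_index_type (size_int (9));
  tree t1 = build_array_type (char_type_node, idx);
  tree t2 = build_array_type (char_type_node, idx);  /* t1 == t2 */
  /* ...so setting the flag for one user silently changes the alias
     semantics seen by every other user of that node.  */
  TYPE_TYPELESS_STORAGE (t2) = 1;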

The following patch attempts to resolve this, by making the type hashing
take that flag into account.  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2017-04-18  Jakub Jelinek  

PR middle-end/80423
* tree.h (build_array_type_1): New prototype.
* tree.c (type_cache_hasher::equal): Also compare
TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
(build_array_type_1): No longer static.  Add typeless_storage
argument, set TYPE_TYPELESS_STORAGE to it, if shared also
hash it, and pass to recursive call.
(build_array_type, build_nonshared_array_type): Adjust
build_array_type_1 callers.
c-family/
* c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
cp/
* tree.c (build_cplus_array_type): Call build_array_type_1
with the intended TYPE_TYPELESS_STORAGE flag value, instead
of calling build_array_type and modifying TYPE_TYPELESS_STORAGE
on the shared type.
testsuite/
* g++.dg/other/pr80423.C: New test.

--- gcc/tree.h.jj   2017-04-12 13:22:23.0 +0200
+++ gcc/tree.h  2017-04-18 12:38:02.981708334 +0200
@@ -4068,6 +4068,7 @@ extern tree build_truth_vector_type (uns
 extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
+extern tree build_array_type_1 (tree, tree, bool, bool);
 extern tree build_array_type (tree, tree);
 extern tree build_nonshared_array_type (tree, tree);
 extern tree build_array_type_nelts (tree, unsigned HOST_WIDE_INT);
--- gcc/tree.c.jj   2017-04-07 11:46:46.0 +0200
+++ gcc/tree.c  2017-04-18 12:37:38.765024350 +0200
@@ -7073,7 +7073,9 @@ type_cache_hasher::equal (type_hash *a,
 break;
   return 0;
 case ARRAY_TYPE:
-  return TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type);
+  return (TYPE_TYPELESS_STORAGE (a->type)
+ == TYPE_TYPELESS_STORAGE (b->type)
+ && TYPE_DOMAIN (a->type) == TYPE_DOMAIN (b->type));
 
 case RECORD_TYPE:
 case UNION_TYPE:
@@ -8352,8 +8354,9 @@ subrange_type_for_debug_p (const_tree ty
and number of elements specified by the range of values of INDEX_TYPE.
If SHARED is true, reuse such a type that has already been constructed.  */
 
-static tree
-build_array_type_1 (tree elt_type, tree index_type, bool shared)
+tree
+build_array_type_1 (tree elt_type, tree index_type, bool shared,
+   bool typeless_storage)
 {
   tree t;
 
@@ -8367,6 +8370,7 @@ build_array_type_1 (tree elt_type, tree
   TREE_TYPE (t) = elt_type;
   TYPE_DOMAIN (t) = index_type;
   TYPE_ADDR_SPACE (t) = TYPE_ADDR_SPACE (elt_type);
+  TYPE_TYPELESS_STORAGE (t) = typeless_storage;
   layout_type (t);
 
   /* If the element type is incomplete at this point we get marked for
@@ -8381,6 +8385,7 @@ build_array_type_1 (tree elt_type, tree
   hstate.add_object (TYPE_HASH (elt_type));
   if (index_type)
hstate.add_object (TYPE_HASH (index_type));
+  hstate.add_flag (typeless_storage);
   t = type_hash_canon (hstate.end (), t);
 }
 
@@ -8396,7 +8401,7 @@ build_array_type_1 (tree elt_type, tree
  = build_array_type_1 (TYPE_CANONICAL (elt_type),
index_type
? TYPE_CANONICAL (index_type) : NULL_TREE,
-   shared);
+   shared, typeless_storage);
 }
 
   return t;
@@ -8407,7 +8412,7 @@ build_array_type_1 (tree elt_type, tree
 tree
 build_array_type (tree elt_type, tree index_type)
 {
-  return build_array_type_1 (elt_type, index_type, true);
+  return build_array_type_1 (elt_type, index_type, true, false);
 }
 
 /* Wrapper around build_array_type_1 with SHARED set to false.  */
@@ -8415,7 +8420,7 @@ build_array_type (tree elt_type, tree in
 tree
 build_nonshared_array_type (tree elt_type, tree index_type)
 {
-  return build_array_type_1 (elt_type, index_type, false);
+  return build_array_type_1 (elt_type, index_type, false, false);
 }
 
 /* Return a representation of ELT_TYPE[NELTS], using indices of type
--- gcc/c-family/c-common.c.jj  2017-04-10 22:26:55.0 +0200
+++ gcc/c-family/c-common.c 2017-04-18 14:43:53.448

[PATCH] Fix VRP intersect_ranges for 1-bit precision signed types (PR tree-optimization/80443)

2017-04-18 Thread Jakub Jelinek
Hi!

As can be seen on the testcase, intersect_ranges in some cases attempts
to add or subtract 1 from one of the bounds.  That is fine except for
1-bit signed types, where 1 is not a value in the range of the type,
so build_int_cst already yields an (OVF) constant.

The following patch fixes it by special-casing those types: instead of
adding/subtracting 1, it subtracts/adds -1.
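
(For illustration, such a type can arise from a signed bit-field:

  struct S { signed int f : 1; };  /* f can hold only 0 and -1 */

so -1 is an ordinary value of the type, while 1 exists only as an
overflowed constant.)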

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-04-18  Jakub Jelinek  

PR tree-optimization/80443
* tree-vrp.c (intersect_ranges): For signed 1-bit precision type,
instead of adding 1, subtract -1 and similarly instead of subtracting
1 add -1.

* gcc.c-torture/compile/pr80443.c: New test.

--- gcc/tree-vrp.c.jj   2017-03-23 15:49:55.0 +0100
+++ gcc/tree-vrp.c  2017-04-18 10:09:44.560549718 +0200
@@ -8756,20 +8756,32 @@ intersect_ranges (enum value_range_type
  /* Choose the right gap if the left one is empty.  */
  if (mineq)
{
- if (TREE_CODE (vr1max) == INTEGER_CST)
-   *vr0min = int_const_binop (PLUS_EXPR, vr1max,
-			   build_int_cst (TREE_TYPE (vr1max), 1));
- else
+ if (TREE_CODE (vr1max) != INTEGER_CST)
*vr0min = vr1max;
+ else if (TYPE_PRECISION (TREE_TYPE (vr1max)) == 1
+  && !TYPE_UNSIGNED (TREE_TYPE (vr1max)))
+   *vr0min
+ = int_const_binop (MINUS_EXPR, vr1max,
+build_int_cst (TREE_TYPE (vr1max), -1));
+ else
+   *vr0min
+ = int_const_binop (PLUS_EXPR, vr1max,
+build_int_cst (TREE_TYPE (vr1max), 1));
}
  /* Choose the left gap if the right one is empty.  */
  else if (maxeq)
{
- if (TREE_CODE (vr1min) == INTEGER_CST)
-   *vr0max = int_const_binop (MINUS_EXPR, vr1min,
-			   build_int_cst (TREE_TYPE (vr1min), 1));
- else
+ if (TREE_CODE (vr1min) != INTEGER_CST)
*vr0max = vr1min;
+ else if (TYPE_PRECISION (TREE_TYPE (vr1min)) == 1
+  && !TYPE_UNSIGNED (TREE_TYPE (vr1min)))
+   *vr0max
+ = int_const_binop (PLUS_EXPR, vr1min,
+build_int_cst (TREE_TYPE (vr1min), -1));
+ else
+   *vr0max
+ = int_const_binop (MINUS_EXPR, vr1min,
+build_int_cst (TREE_TYPE (vr1min), 1));
}
  /* Choose the anti-range if the range is effectively varying.  */
  else if (vrp_val_is_min (*vr0min)
@@ -8811,22 +8823,34 @@ intersect_ranges (enum value_range_type
  if (mineq)
{
  *vr0type = VR_RANGE;
- if (TREE_CODE (*vr0max) == INTEGER_CST)
-   *vr0min = int_const_binop (PLUS_EXPR, *vr0max,
-			   build_int_cst (TREE_TYPE (*vr0max), 1));
- else
+ if (TREE_CODE (*vr0max) != INTEGER_CST)
*vr0min = *vr0max;
+ else if (TYPE_PRECISION (TREE_TYPE (*vr0max)) == 1
+  && !TYPE_UNSIGNED (TREE_TYPE (*vr0max)))
+   *vr0min
+ = int_const_binop (MINUS_EXPR, *vr0max,
+build_int_cst (TREE_TYPE (*vr0max), -1));
+ else
+   *vr0min
+ = int_const_binop (PLUS_EXPR, *vr0max,
+build_int_cst (TREE_TYPE (*vr0max), 1));
  *vr0max = vr1max;
}
  /* Choose the left gap if the right is empty.  */
  else if (maxeq)
{
  *vr0type = VR_RANGE;
- if (TREE_CODE (*vr0min) == INTEGER_CST)
-   *vr0max = int_const_binop (MINUS_EXPR, *vr0min,
-			   build_int_cst (TREE_TYPE (*vr0min), 1));
- else
+ if (TREE_CODE (*vr0min) != INTEGER_CST)
*vr0max = *vr0min;
+ else if (TYPE_PRECISION (TREE_TYPE (*vr0min)) == 1
+  && !TYPE_UNSIGNED (TREE_TYPE (*vr0min)))
+   *vr0max
+ = int_const_binop (PLUS_EXPR, *vr0min,
+build_int_cst (TREE_TYPE (*vr0min), -1));
+ else
+   *vr0max
+ = int_const_binop (MINUS_EXPR, *vr0min,
+build_int_cst (TREE_TYPE (*vr0min), 1));
  *vr0min = vr1min;
}
  /* Choose the anti-range if the range is effectively varying.  */
--- gcc/testsuite/gcc.c-torture/compile/pr80443.c.jj	2017-04-18 10:16:35.867952863 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr80443.c	2017-04-18 10:16:21.0 +0200
@

[committed] Fix -fsanitize-coverage=trace-pc -fcompare-debug bug (PR sanitizer/80444)

2017-04-18 Thread Jakub Jelinek
Hi!

As the pass doesn't want to insert into empty bbs and also copies the
locus from the stmt it is inserted before, we have to make sure we insert
it before a non-debug stmt (and also consider blocks with just labels and
debug stmts as empty).  Otherwise code generation would depend on the
presence of debug stmts, which is exactly what -fcompare-debug detects.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed as obvious.

2017-04-18  Jakub Jelinek  

PR sanitizer/80444
* sancov.c (sancov_pass): Use gsi_start_nondebug_after_labels_bb
instead of gsi_after_labels.

* gcc.dg/sancov/pr80444.c: New test.

--- gcc/sancov.c.jj 2017-01-01 12:45:38.0 +0100
+++ gcc/sancov.c2017-04-18 09:14:58.567233552 +0200
@@ -46,7 +46,7 @@ sancov_pass (function *fun)
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
-  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
   if (gsi_end_p (gsi))
continue;
   gimple *stmt = gsi_stmt (gsi);
--- gcc/testsuite/gcc.dg/sancov/pr80444.c.jj	2017-04-18 09:16:34.480933807 +0200
+++ gcc/testsuite/gcc.dg/sancov/pr80444.c	2017-04-18 09:17:24.322256958 +0200
@@ -0,0 +1,9 @@
+/* PR sanitizer/80444 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize-coverage=trace-pc -fcompare-debug" } */
+
+void
+foo (void)
+{
+  int a = 0;
+}

Jakub


Re: [RFA][PATCH] Fix mips16 codegen issue in a better way

2017-04-18 Thread Jeff Law

On 04/18/2017 05:16 AM, Richard Sandiford wrote:

Jeff Law  writes:

[RFA is for the regcprop bits, reverting the mips.md hack seems like a
no-brainer with the regcprop change. ]


Per the recent discussion between Richard S. and myself, this is a
better fix for the mips16 codegen issue where it's creating invalid lwu
insns.

As Richard S. pointed out there's been a long standing problem where
regcprop would create a reference to the stack pointer that was unique
from stack_pointer_rtx.  That's the root cause of the codegen issue.

We can't re-use stack_pointer_rtx in the code in question because we're
going to modify the underlying RTX in fun and interesting ways.  Ports
(such as the mips) assume references to the stack pointer are unique
(ie, they can identify a stack pointer reference by stack_pointer_rtx
rather than checking register #s).

So this patch just rejects propagation of the stack pointer.  It's
conservative in that it doesn't reject other special registers.

An alternate approach would be to declare that ports can not depend on
looking at stack_pointer_rtx to find all stack references that instead
they have to look at the underlying regno.  The amount of auditing here
would be significant.

I've bootstrapped and regression tested this on x86_64-linux-gnu.  It's
also built libgcc/glibc/newlib on about 100 different targets.


Thanks for doing this, looks good to me FWIW.  I was wondering
whether we should use gen_rtx_REG instead of gen_raw_rtx_REG
in maybe_mode_change and then disable the assignment to things
like ORIGINAL_REGNO if the returned rtx is one of the global rtxes.
But I'm not sure how safe that would be.  We don't know whether the
original reference to the register number was the global rtx,
which might matter for things like hard_frame_pointer_rtx.
I'd also be concerned that not setting something like REG_POINTER or 
REG_ATTRS could cause problems.


While I can probably get to "safe" for the current intersection of ports,
special registers, and how they interact with REG_POINTER, it feels
rather fragile.


Jeff



Re: [PATCH v2] Generate reproducible output independently of the build-path

2017-04-18 Thread Ximin Luo
Ximin Luo:
> [..]
> 
> I will soon test this patch backported to Debian GCC-6 on
> tests.reproducible-builds.org and will have results in a few days or weeks.
> Some preliminary tests earlier gave good results (about +40 packages
> reproducible over ~2 days) but we had to abort due to some misscheduling.
> 
> [..]

This has been completed and we reproduced ~1700 extra packages when building 
with a GCC-6 with this patch, as well as a patched dpkg that sets the 
environment variable appropriately.

This is about 6.5% of ~26100 Debian source packages, and about 1/2 of the ones 
whose irreproducibility is due to build-path issues. Most of the rest are not 
related to GCC, such as things built by R, OCaml, Erlang, LLVM, PDF IDs, etc, 
etc.

https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html

The dip afterwards is due to reverting back to an unpatched GCC-6, but I'll be 
rebasing the patch continually over the next few weeks so the graph should stay 
up.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH GCC8][04/33]Single interface finding invariant variables

2017-04-18 Thread Trevor Saunders
On Tue, Apr 18, 2017 at 01:58:43PM +0100, Bin.Cheng wrote:
> On Tue, Apr 18, 2017 at 1:20 PM, Trevor Saunders  
> wrote:
> > On Tue, Apr 18, 2017 at 10:39:30AM +, Bin Cheng wrote:
> >> -find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
> >> +find_inv_vars_cb (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
> >>  {
> >> -  bitmap *inv_vars = (bitmap *) data;
> >> +  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
> >>struct version_info *info;
> >>
> >>if (TREE_CODE (*expr_p) != SSA_NAME)
> >>  return NULL_TREE;
> >> -  info = name_info (fd_ivopts_data, *expr_p);
> >>
> >> +  info = name_info (wdata->idata, *expr_p);
> >>if (!info->inv_id || info->has_nonlin_use)
> >>  return NULL_TREE;
> >>
> >> -  if (!*inv_vars)
> >> -*inv_vars = BITMAP_ALLOC (NULL);
> >> -  bitmap_set_bit (*inv_vars, info->inv_id);
> >> +  if (!*wdata->inv_vars)
> >> +*wdata->inv_vars = BITMAP_ALLOC (NULL);
> >
> > Given below this seems to be dead and inv_vars could just be a bitmap.
> >
> >> +find_inv_vars (struct ivopts_data *data, tree *expr_p, bitmap *inv_vars)
> >> +{
> >> +  struct walk_tree_data wdata;
> >> +
> >> +  if (!inv_vars)
> >> +return;
> >> +
> >> +  wdata.idata = data;
> >> +  wdata.inv_vars = inv_vars;
> >> +  walk_tree (expr_p, find_inv_vars_cb, &wdata, NULL);
> >
> > given this it looks like the null check of inv_vars in find_inv_vars_cb
> > is unnecessary because inv_vars must be nonnull to call walk_tree().
> Hmm, this check is for bitmap* pointer, the one in call back function
> is for bitmap pointer, right?

ah yes, you can pass a pointer to a bitmap that points to null in here
and then have a bitmap allocated.
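
Roughly (an illustrative sketch, assuming "data" and "expr" are in scope):

  bitmap b = NULL;                  /* *inv_vars is NULL at first...    */
  find_inv_vars (data, &expr, &b);  /* ...but &b is non-NULL, so
                                       walk_tree runs and the callback
                                       BITMAP_ALLOCs into b.  */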

thanks

Trev

> 
> Thanks,
> bin
> >
> > Thanks
> >
> > Trev


Patch ping^2

2017-04-18 Thread Jakub Jelinek
I'd like to ping following patch:

PR debug/80263
   http://gcc.gnu.org/ml/gcc-patches/2017-04/msg4.html
   avoid emitting sizetype artificial name into debug info

Thanks

Jakub


[patch,avr,committed, trunk + v6] Fix PR79453

2017-04-18 Thread Georg-Johann Lay

Applied the patchlet below to fix a translation hiccup in the AVR backend.

Johann

PR target/79453
* config/avr/avr.c (intl.h): Include it.
(avr_pgm_check_var_decl) [reason]: Wrap diagnostic snippets into _().


Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 246964)
+++ config/avr/avr.c(working copy)
@@ -20,6 +20,7 @@

 #include "config.h"
 #include "system.h"
+#include "intl.h"
 #include "coretypes.h"
 #include "backend.h"
 #include "target.h"
@@ -9797,28 +9798,28 @@ avr_pgm_check_var_decl (tree node)

 case VAR_DECL:
   if (as = avr_nonconst_pointer_addrspace (TREE_TYPE (node)), as)
-reason = "variable";
+reason = _("variable");
   break;

 case PARM_DECL:
   if (as = avr_nonconst_pointer_addrspace (TREE_TYPE (node)), as)
-reason = "function parameter";
+reason = _("function parameter");
   break;

 case FIELD_DECL:
   if (as = avr_nonconst_pointer_addrspace (TREE_TYPE (node)), as)
-reason = "structure field";
+reason = _("structure field");
   break;

 case FUNCTION_DECL:
   if (as = avr_nonconst_pointer_addrspace (TREE_TYPE (TREE_TYPE 
(node))),

   as)
-reason = "return type of function";
+reason = _("return type of function");
   break;

 case POINTER_TYPE:
   if (as = avr_nonconst_pointer_addrspace (node), as)
-reason = "pointer";
+reason = _("pointer");
   break;
 }



Re: [PATCH GCC8][04/33]Single interface finding invariant variables

2017-04-18 Thread Bin.Cheng
On Tue, Apr 18, 2017 at 1:20 PM, Trevor Saunders  wrote:
> On Tue, Apr 18, 2017 at 10:39:30AM +, Bin Cheng wrote:
>> -find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
>> +find_inv_vars_cb (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
>>  {
>> -  bitmap *inv_vars = (bitmap *) data;
>> +  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
>>struct version_info *info;
>>
>>if (TREE_CODE (*expr_p) != SSA_NAME)
>>  return NULL_TREE;
>> -  info = name_info (fd_ivopts_data, *expr_p);
>>
>> +  info = name_info (wdata->idata, *expr_p);
>>if (!info->inv_id || info->has_nonlin_use)
>>  return NULL_TREE;
>>
>> -  if (!*inv_vars)
>> -*inv_vars = BITMAP_ALLOC (NULL);
>> -  bitmap_set_bit (*inv_vars, info->inv_id);
>> +  if (!*wdata->inv_vars)
>> +*wdata->inv_vars = BITMAP_ALLOC (NULL);
>
> Given below this seems to be dead and inv_vars could just be a bitmap.
>
>> +find_inv_vars (struct ivopts_data *data, tree *expr_p, bitmap *inv_vars)
>> +{
>> +  struct walk_tree_data wdata;
>> +
>> +  if (!inv_vars)
>> +return;
>> +
>> +  wdata.idata = data;
>> +  wdata.inv_vars = inv_vars;
>> +  walk_tree (expr_p, find_inv_vars_cb, &wdata, NULL);
>
> given this it looks like the null check of inv_vars in find_inv_vars_cb
> is unnecessary because inv_vars must be nonnull to call walk_tree().
Hmm, this check is for bitmap* pointer, the one in call back function
is for bitmap pointer, right?

Thanks,
bin
>
> Thanks
>
> Trev


Re: [PATCH GCC8][04/33]Single interface finding invariant variables

2017-04-18 Thread Trevor Saunders
On Tue, Apr 18, 2017 at 10:39:30AM +, Bin Cheng wrote:
> -find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
> +find_inv_vars_cb (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
>  {
> -  bitmap *inv_vars = (bitmap *) data;
> +  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
>struct version_info *info;
>  
>if (TREE_CODE (*expr_p) != SSA_NAME)
>  return NULL_TREE;
> -  info = name_info (fd_ivopts_data, *expr_p);
>  
> +  info = name_info (wdata->idata, *expr_p);
>if (!info->inv_id || info->has_nonlin_use)
>  return NULL_TREE;
>  
> -  if (!*inv_vars)
> -*inv_vars = BITMAP_ALLOC (NULL);
> -  bitmap_set_bit (*inv_vars, info->inv_id);
> +  if (!*wdata->inv_vars)
> +*wdata->inv_vars = BITMAP_ALLOC (NULL);

Given below this seems to be dead and inv_vars could just be a bitmap.

> +find_inv_vars (struct ivopts_data *data, tree *expr_p, bitmap *inv_vars)
> +{
> +  struct walk_tree_data wdata;
> +
> +  if (!inv_vars)
> +return;
> +
> +  wdata.idata = data;
> +  wdata.inv_vars = inv_vars;
> +  walk_tree (expr_p, find_inv_vars_cb, &wdata, NULL);

given this it looks like the null check of inv_vars in find_inv_vars_cb
is unnecessary because inv_vars must be nonnull to call walk_tree().

Thanks

Trev


Re: [RFA][PATCH] Fix mips16 codegen issue in a better way

2017-04-18 Thread Richard Sandiford
Jeff Law  writes:
> [RFA is for the regcprop bits, reverting the mips.md hack seems like a 
> no-brainer with the regcprop change. ]
>
>
> Per the recent discussion between Richard S. and myself, this is a 
> better fix for the mips16 codegen issue where it's creating invalid lwu 
> insns.
>
> As Richard S. pointed out there's been a long standing problem where 
> regcprop would create a reference to the stack pointer that was unique 
> from stack_pointer_rtx.  That's the root cause of the codegen issue.
>
> We can't re-use stack_pointer_rtx in the code in question because we're 
> going to modify the underlying RTX in fun and interesting ways.  Ports 
> (such as the mips) assume references to the stack pointer are unique 
> (ie, they can identify a stack pointer reference by stack_pointer_rtx 
> rather than checking register #s).
>
> So this patch just rejects propagation of the stack pointer.  It's 
> conservative in that it doesn't reject other special registers.
>
> An alternate approach would be to declare that ports can not depend on 
> looking at stack_pointer_rtx to find all stack references that instead 
> they have to look at the underlying regno.  The amount of auditing here 
> would be significant.
>
> I've bootstrapped and regression tested this on x86_64-linux-gnu.  It's 
> also built libgcc/glibc/newlib on about 100 different targets.

Thanks for doing this, looks good to me FWIW.  I was wondering
whether we should use gen_rtx_REG instead of gen_raw_rtx_REG
in maybe_mode_change and then disable the assignment to things
like ORIGINAL_REGNO if the returned rtx is one of the global rtxes.
But I'm not sure how safe that would be.  We don't know whether the
original reference to the register number was the global rtx,
which might matter for things like hard_frame_pointer_rtx.

I think regcprop could be rejigged to handle global rtxes
"correctly", but that's obviously too invasive at this stage.

Richard


[PATCH] Fix incorrect results from std::boyer_moore_searcher

2017-04-18 Thread Jonathan Wakely

Nico noticed the return value of std::boyer_moore_searcher has the
wrong position for the end of the match. Fixed by this patch, which
also does some minor cleanup (like removing redundant std:: quals that
were copied from the std::experimental::boyer_moore_searcher version).
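
A minimal usage sketch (illustrative, C++17) of the pair the searcher
returns:

  #include <functional>
  #include <string>

  void demo()
  {
    std::string pat = "ab", hay = "xaby";
    std::boyer_moore_searcher searcher(pat.begin(), pat.end());
    auto res = searcher(hay.begin(), hay.end());
    // res.first points at the matching 'a'; res.second must point one
    // past the 'b' -- res.second is the value fixed below.
  }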

* include/std/functional (default_searcher, __boyer_moore_array_base)
(__is_std_equal_to, __boyer_moore_base_t, boyer_moore_searcher)
(boyer_moore_horspool_searcher): Remove redundant namespace
qualification.
(default_searcher::operator()): Construct return value early and
advance second member in-place.
(boyer_moore_horspool_searcher::operator()): Increment random access
iterator directly instead of using std::next.
(boyer_moore_searcher::operator()): Fix return value.
* testsuite/20_util/function_objects/searchers.cc: Check both parts
of return values.

Tested ppc64le-linux, committed to trunk.


commit 35bccf225502eaa895321500b20e432605119b93
Author: Jonathan Wakely 
Date:   Tue Apr 18 11:51:35 2017 +0100

Fix incorrect results from std::boyer_moore_searcher

* include/std/functional (default_searcher, __boyer_moore_array_base)
(__is_std_equal_to, __boyer_moore_base_t, boyer_moore_searcher)
(boyer_moore_horspool_searcher): Remove redundant namespace
qualification.
(default_searcher::operator()): Construct return value early and
advance second member in-place.
(boyer_moore_horspool_searcher::operator()): Increment random access
iterator directly instead of using std::next.
(boyer_moore_searcher::operator()): Fix return value.
* testsuite/20_util/function_objects/searchers.cc: Check both parts
of return values.

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 0661253..e4a82ee 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -969,17 +969,17 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
operator()(_ForwardIterator2 __first, _ForwardIterator2 __last) const
{
  _ForwardIterator2 __first_ret =
-   std::search(__first, __last,
-   std::get<0>(_M_m), std::get<1>(_M_m),
+   std::search(__first, __last, std::get<0>(_M_m), std::get<1>(_M_m),
std::get<2>(_M_m));
- _ForwardIterator2 __second_ret = __first_ret == __last ?
-   __last :  std::next(__first_ret, std::distance(std::get<0>(_M_m),
-  std::get<1>(_M_m)));
- return std::make_pair(__first_ret, __second_ret);
+ auto __ret = std::make_pair(__first_ret, __first_ret);
+ if (__ret.first != __last)
+   std::advance(__ret.second, std::distance(std::get<0>(_M_m),
+std::get<1>(_M_m)));
+ return __ret;
}
 
 private:
-  std::tuple<_ForwardIterator1, _ForwardIterator1, _BinaryPredicate> _M_m;
+  tuple<_ForwardIterator1, _ForwardIterator1, _BinaryPredicate> _M_m;
 };
 
   template
@@ -1025,7 +1025,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
for (__diff_type __i = 0; __i < __patlen - 1; ++__i)
  {
auto __ch = __pat[__i];
-	   using _UCh = std::make_unsigned_t<decltype(__ch)>;
+	   using _UCh = make_unsigned_t<decltype(__ch)>;
auto __uch = static_cast<_UCh>(__ch);
std::get<0>(_M_bad_char)[__uch] = __patlen - 1 - __i;
  }
@@ -1037,7 +1037,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
__diff_type
_M_lookup(_Key __key, __diff_type __not_found) const
{
-	  auto __ukey = static_cast<std::make_unsigned_t<_Key>>(__key);
+	  auto __ukey = static_cast<make_unsigned_t<_Key>>(__key);
  if (__ukey >= _Len)
return __not_found;
  return std::get<0>(_M_bad_char)[__ukey];
@@ -1046,14 +1046,14 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   const _Pred&
   _M_pred() const { return std::get<1>(_M_bad_char); }
 
-  std::tuple<_GLIBCXX_STD_C::array<_Tp, _Len>, _Pred> _M_bad_char;
+  tuple<_GLIBCXX_STD_C::array<_Tp, _Len>, _Pred> _M_bad_char;
 };
 
   template<typename _Pred>
-    struct __is_std_equal_to : std::false_type { };
+    struct __is_std_equal_to : false_type { };
 
   template<>
-    struct __is_std_equal_to<std::equal_to<void>> : std::true_type { };
+    struct __is_std_equal_to<equal_to<void>> : true_type { };
 
   // Use __boyer_moore_array_base when pattern consists of narrow characters
   // and uses std::equal_to as the predicate.
@@ -1061,14 +1061,14 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
typename _Val = typename iterator_traits<_RAIter>::value_type,
   typename _Diff = typename iterator_traits<_RAIter>::difference_type>
 using __boyer_moore_base_t
-  = std::conditional_t::value
-  && __is_std_equal_to<_Pred>::v

[PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-04-18 Thread Bin Cheng
Hi,
This is the same patch posted at
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html, rebased against
this patch series.  This patch was blocked because, without this patch
series, it could generate worse code on targets with limited
addressing-mode support, like AArch64.
There was some discussion about an alternative fix for the PRs, but after
thinking twice I think this fix is in the correct direction.  A CSE
interface is useful to clean up code generated by the vectorizer, and we
should improve this CSE interface into a region-based one.  For the moment,
optimal code is not generated on targets like x86; I believe that is
because the CSE is weak and doesn't cover all the basic blocks generated
by the vectorizer.  The issue should be fixed once region-based CSE is
implemented.
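
As an illustration (hypothetical numbers): instead of computing a vector
address as base + (i * 4 + 16), the base is re-associated as
(base + i * 4) + 16, so the variant part base + i * 4 becomes a common
subexpression that the CSE interface can share between neighbouring
references.
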
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

PR tree-optimization/68030
PR tree-optimization/69710
* tree-ssa-dom.c (cse_bbs): New function.
* tree-ssa-dom.h (cse_bbs): New declaration.
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Re-associate address by splitting constant offset.
(vect_create_data_ref_ptr, vect_setup_realignment): Record changed
basic block.
* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Record
changed basic block.
* tree-vectorizer.c (tree-ssa-dom.h): Include header file.
(changed_bbs): New variable.
(vectorize_loops): Allocate and free CHANGED_BBS.  Call cse_bbs.
* tree-vectorizer.h (changed_bbs): New declaration.

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index d9e5942..6d74c07 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1765,3 +1765,50 @@ optimize_stmt (basic_block bb, gimple_stmt_iterator si,
 }
   return retval;
 }
+
+/* A local CSE interface which runs CSE for basic blocks recorded in
+   CHANGED_BBS.  */
+
+void
+cse_bbs (bitmap changed_bbs)
+{
+  unsigned index;
+  bitmap_iterator bi;
+  gimple_stmt_iterator gsi;
+
+  hash_table *avail_exprs;
+  class avail_exprs_stack *avail_exprs_stack;
+  class const_and_copies *const_and_copies;
+
+  avail_exprs = new hash_table (1024);
+  avail_exprs_stack = new class avail_exprs_stack (avail_exprs);
+  const_and_copies = new class const_and_copies ();
+
+  threadedge_initialize_values ();
+  /* Push a marker on the stacks of local information so that we know how
+ far to unwind when we finalize this block.  */
+  avail_exprs_stack->push_marker ();
+  const_and_copies->push_marker ();
+
+  EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, index);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "\n\nRun local cse on block #%d\n\n", bb->index);
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   optimize_stmt (bb, gsi, const_and_copies, avail_exprs_stack);
+
+  /* Pop stacks to keep it small.  */
+  avail_exprs_stack->pop_to_marker ();
+  const_and_copies->pop_to_marker ();
+}
+
+  delete avail_exprs;
+  avail_exprs = NULL;
+
+  delete avail_exprs_stack;
+  delete const_and_copies;
+  threadedge_finalize_values ();
+}
diff --git a/gcc/tree-ssa-dom.h b/gcc/tree-ssa-dom.h
index ad1b7ef..88869fd 100644
--- a/gcc/tree-ssa-dom.h
+++ b/gcc/tree-ssa-dom.h
@@ -24,5 +24,6 @@ extern bool simple_iv_increment_p (gimple *);
 extern void record_temporary_equivalences (edge,
   class const_and_copies *,
   class avail_exprs_stack *);
+extern void cse_bbs (bitmap);
 
 #endif /* GCC_TREE_SSA_DOM_H */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index aa504b6..beffa17 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4247,23 +4247,27 @@ vect_create_addr_base_for_vector_ref (gimple *stmt,
   base_name = get_name (DR_REF (dr));
 }
 
-  /* Create base_offset */
-  base_offset = size_binop (PLUS_EXPR,
-   fold_convert (sizetype, base_offset),
-   fold_convert (sizetype, init));
+  base_offset = fold_convert (sizetype, base_offset);
+  init = fold_convert (sizetype, init);
 
   if (offset)
 {
   offset = fold_build2 (MULT_EXPR, sizetype,
fold_convert (sizetype, offset), step);
-  base_offset = fold_build2 (PLUS_EXPR, sizetype,
-base_offset, offset);
+  if (TREE_CODE (offset) == INTEGER_CST)
+   init = fold_build2 (PLUS_EXPR, sizetype, init, offset);
+  else
+   base_offset = fold_build2 (PLUS_EXPR, sizetype,
+  base_offset, offset);
 }
   if (byte_offset)
 {
   byte_offset = fold_convert (sizetype, byte_offset);
-  base_offset = fold_build2 (PLUS_EXPR, sizetype,
-base_offset, byte_offset);
+  if (TREE_CODE (byte_offset) == INTEGER_CST)
+

[PATCH GCC8][32/33]Save niter check for vect peeling if loop versioning is required

2017-04-18 Thread Bin Cheng
Hi,
When loop versioning is required in vectorization, we can merge the niter
check for vect peeling with the check for loop versioning, thus saving one
check/branch for the vectorized loop.
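
For example (illustrative numbers): with a vectorization factor of 4 and up
to 3 peeled prolog iterations, the merged check raises the versioning
threshold to at least 3 + 4 = 7 iterations, so no separate runtime guard is
needed before entering the vector loop.
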
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop
if versioning is required.
* tree-vect-loop.c (vect_analyze_loop_2): Merge niter check for loop
peeling with the check for versioning.

From bd54e2524a4047328ba4847ad013db2bbe5850fe Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 16 Mar 2017 16:40:50 +
Subject: [PATCH 32/33] save-vect_peeling-niters-check-20170225.txt

---
 gcc/tree-vect-loop-manip.c |  8 +---
 gcc/tree-vect-loop.c   | 30 ++
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 0fc8cd3..0ff474d 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1686,9 +1686,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
 
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
-  /* Skip to epilog if scalar loop may be preferred.  It's only used when
- we peel for epilog loop.  */
-  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+  /* Skip to epilog if scalar loop may be preferred.  It's only needed
+ when we peel for epilog loop and when it hasn't been checked with
+ loop versioning.  */
+  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && !LOOP_REQUIRES_VERSIONING (loop_vinfo));
   /* Epilog loop must be executed if the number of iterations for epilog
  loop is known at compile time, otherwise we need to add a check at
  the end of vector loop and skip to the end of epilog loop.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index af874e7..98caa5e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2214,6 +2214,36 @@ start_over:
 }
 }
 
+  /* During peeling, we need to check if number of loop iterations is
+ enough for both peeled prolog loop and vector loop.  This check
+ can be merged along with threshold check of loop versioning, so
+ increase threshold for this case if necessary.  */
+  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
+  && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+ || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
+{
+  unsigned niters_th;
+
+  /* Niters for peeled prolog loop.  */
+  if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
+   {
+ struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
+ tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
+
+ niters_th = TYPE_VECTOR_SUBPARTS (vectype) - 1;
+   }
+  else
+   niters_th = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+
+  /* Niters for at least one iteration of vectorized loop.  */
+  niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  /* One additional iteration because of peeling for gap.  */
+  if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+   niters_th++;
+  if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
+   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
+}
+
   gcc_assert (vectorization_factor
  == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 
-- 
1.9.1



[PATCH GCC8][31/33]Set range information for niter bound of vectorized loop

2017-04-18 Thread Bin Cheng
Hi,
Based on the vect_peeling algorithm, we know for sure that the vectorized
loop must iterate at least once.
This patch sets range information for the niter bounds of the vectorized
loop.  This helps niter analysis, and thus iv elimination too.
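
For example (illustrative): with an unsigned 32-bit niters and a
vectorization factor of 4, the vector loop bound is recorded as lying in
[1, 0xffffffff >> 2], matching the set_range_info calls below.
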
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-vect-loop-manip.c (vect_gen_vector_loop_niters): Refactor.
Set range information for vector loop bound variable.
(vect_do_peeling): Ditto.
From 0962735526bf474591e410104d6c9576691e0a1f Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 14 Mar 2017 17:46:55 +
Subject: [PATCH 31/33] range_info-for-vect_loop-niters-20170224.txt

---
 gcc/tree-vect-loop-manip.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index faeaa6d..0fc8cd3 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1177,22 +1177,21 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, 
tree niters,
 tree *niters_vector_ptr, bool niters_no_overflow)
 {
   tree ni_minus_gap, var;
-  tree niters_vector;
+  tree niters_vector, type = TREE_TYPE (niters);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
-  tree log_vf = build_int_cst (TREE_TYPE (niters), exact_log2 (vf));
+  tree log_vf = build_int_cst (type, exact_log2 (vf));
 
   /* If epilogue loop is required because of data accesses with gaps, we
  subtract one iteration from the total number of iterations here for
  correct calculation of RATIO.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
 {
-  ni_minus_gap = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
- niters,
- build_one_cst (TREE_TYPE (niters)));
+  ni_minus_gap = fold_build2 (MINUS_EXPR, type, niters,
+ build_one_cst (type));
   if (!is_gimple_val (ni_minus_gap))
{
- var = create_tmp_var (TREE_TYPE (niters), "ni_gap");
+ var = create_tmp_var (type, "ni_gap");
  gimple *stmts = NULL;
  ni_minus_gap = force_gimple_operand (ni_minus_gap, &stmts,
   true, var);
@@ -1208,25 +1207,28 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, 
tree niters,
  (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
  will be at least one.  */
   if (niters_no_overflow)
-niters_vector = fold_build2 (RSHIFT_EXPR, TREE_TYPE (niters),
-ni_minus_gap, log_vf);
+niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
   else
 niters_vector
-  = fold_build2 (PLUS_EXPR, TREE_TYPE (niters),
-fold_build2 (RSHIFT_EXPR, TREE_TYPE (niters),
- fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
-  ni_minus_gap,
-  build_int_cst
-(TREE_TYPE (niters), vf)),
+  = fold_build2 (PLUS_EXPR, type,
+fold_build2 (RSHIFT_EXPR, type,
+ fold_build2 (MINUS_EXPR, type, ni_minus_gap,
+  build_int_cst (type, vf)),
  log_vf),
-build_int_cst (TREE_TYPE (niters), 1));
+build_int_cst (type, 1));
 
   if (!is_gimple_val (niters_vector))
 {
-  var = create_tmp_var (TREE_TYPE (niters), "bnd");
+  var = create_tmp_var (type, "bnd");
   gimple *stmts = NULL;
   niters_vector = force_gimple_operand (niters_vector, &stmts, true, var);
   gsi_insert_seq_on_edge_immediate (pe, stmts);
+  /* Peeling algorithm guarantees that vector loop bound is at least ONE,
+so we set range information to make niters analyzer's life easier.  */
+  if (TREE_CODE (niters_vector) == SSA_NAME)
+   set_range_info (niters_vector, VR_RANGE, build_int_cst (type, 1),
+   fold_build2 (RSHIFT_EXPR, type,
+TYPE_MAX_VALUE (type), log_vf));
 }
   *niters_vector_ptr = niters_vector;
 
@@ -1773,6 +1775,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
= fold_build2 (MINUS_EXPR, type,
   LOOP_VINFO_NITERSM1 (loop_vinfo), niters_prolog);
   niters = vect_build_loop_niters (loop_vinfo);
+  /* It's guaranteed that vector loop bound before vectorization is at
+least VF, so set range information.  */
+  if (TREE_CODE (niters) == SSA_NAME)
+   set_range_info (niters, VR_RANGE,
+   build_int_cst (type, vf), TYPE_MAX_VALUE (type));
 
   /* Prolog iterates at most bound_prolog times, latch iterates at
 most bound_prolog - 1 times.  */
-- 
1.9.1



[PATCH GCC8][30/33]Fold more type conversion into binary arithmetic operations

2017-04-18 Thread Bin Cheng
Hi,
Simplification of (T1)(X *+- CST) is already implemented in
aff_combination_expand; this patch moves it to tree_to_aff_combination.  It
also supports unsigned types when range information allows the transformation,
as well as the special case (T1)(X + X).
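
As a standalone check of the unsigned case (assumed concrete values, not part
of the patch): widening before or after the subtraction agrees exactly when
range information guarantees X >= CST, i.e. X - CST cannot wrap:

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  uint32_t x = 100, cst = 10;        /* range info says x >= 10 */
  uint64_t a = (uint64_t) (x - cst); /* fold, then widen */
  uint64_t b = (uint64_t) x - cst;   /* widen, then fold */
  printf ("%llu == %llu\n", (unsigned long long) a,
	  (unsigned long long) b);   /* 90 == 90 */

  x = 5;                             /* x < cst: x - cst wraps, so the
					transformation would be wrong */
  printf ("%llu != %llu\n",
	  (unsigned long long) (uint64_t) (x - cst),
	  (unsigned long long) ((uint64_t) x - cst));
  return 0;
}
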
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-affine.c: Include header file.
(aff_combination_expand): Move (T1)(X *+- CST) simplification to ...
(tree_to_aff_combination): ... here.  Support (T1)(X + X) case, and
unsigned type case if range information allows.
From c3d37447c529793bfdab319574a8a065e7871292 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 15 Mar 2017 13:59:09 +
Subject: [PATCH 30/33] affine-fold-conversion-to-bin_op-20170224.txt

---
 gcc/tree-affine.c | 82 ---
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-affine.c b/gcc/tree-affine.c
index 13c477d..6efd1e4 100644
--- a/gcc/tree-affine.c
+++ b/gcc/tree-affine.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "ssa.h"
 #include "tree-pretty-print.h"
 #include "fold-const.h"
 #include "tree-affine.h"
@@ -363,6 +364,61 @@ tree_to_aff_combination (tree expr, tree type, aff_tree 
*comb)
   aff_combination_add (comb, &tmp);
   return;
 
+CASE_CONVERT:
+  {
+   tree otype = TREE_TYPE (expr);
+   tree inner = TREE_OPERAND (expr, 0);
+   tree itype = TREE_TYPE (inner);
+   enum tree_code icode = TREE_CODE (inner);
+
+   /* In principle this is a valid folding, but it isn't necessarily
+  an optimization, so do it here and not in fold_unary.  */
+   if ((icode == PLUS_EXPR || icode == MINUS_EXPR || icode == MULT_EXPR)
+   && TREE_CODE (itype) == INTEGER_TYPE
+   && TREE_CODE (otype) == INTEGER_TYPE
+   && TYPE_PRECISION (otype) > TYPE_PRECISION (itype))
+ {
+   tree op0 = TREE_OPERAND (inner, 0), op1 = TREE_OPERAND (inner, 1);
+
+   /* Convert (T1)(X *+- CST) into (T1)X *+- (T1)CST if X's type has
+  undefined overflow behavior.  Convert (T1)(X + X) as well.  */
+   if (TYPE_OVERFLOW_UNDEFINED (itype)
+   && (TREE_CODE (op1) == INTEGER_CST
+   || (icode == PLUS_EXPR && operand_equal_p (op0, op1, 0
+ {
+   op0 = fold_convert (otype, op0);
+   op1 = fold_convert (otype, op1);
+   expr = fold_build2 (icode, otype, op0, op1);
+   tree_to_aff_combination (expr, type, comb);
+   return;
+ }
+   wide_int minv, maxv;
+   /* In case X's type has wrapping overflow behavior, we can still
+  convert (T1)(X - CST) into (T1)X - (T1)CST if X - CST doesn't
+  overflow by range information.  Also convert (T1)(X + CST) as
+  if it's (T1)(X - (-CST)).  */
+   if (TYPE_UNSIGNED (itype)
+   && TYPE_OVERFLOW_WRAPS (itype)
+   && TREE_CODE (op0) == SSA_NAME
+   && TREE_CODE (op1) == INTEGER_CST
+   && (icode == PLUS_EXPR || icode == MINUS_EXPR)
+   && get_range_info (op0, &minv, &maxv) == VR_RANGE)
+ {
+   if (icode == PLUS_EXPR)
+ op1 = fold_build1 (NEGATE_EXPR, itype, op1);
+   if (wi::geu_p (minv, op1))
+ {
+   op0 = fold_convert (otype, op0);
+   op1 = fold_convert (otype, op1);
+   expr = fold_build2 (MINUS_EXPR, otype, op0, op1);
+   tree_to_aff_combination (expr, type, comb);
+   return;
+ }
+ }
+ }
+  }
+  break;
+
 default:
   break;
 }
@@ -639,28 +695,10 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
  exp = XNEW (struct name_expansion);
  exp->in_progress = 1;
  *slot = exp;
- /* In principle this is a generally valid folding, but
-it is not unconditionally an optimization, so do it
-here and not in fold_unary.  */
- /* Convert (T1)(X *+- CST) into (T1)X *+- (T1)CST if T1 is wider
-than the type of X and overflow for the type of X is
-undefined.  */
- if (e != name
- && INTEGRAL_TYPE_P (type)
- && INTEGRAL_TYPE_P (TREE_TYPE (name))
- && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (name))
- && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (name))
- && (code == PLUS_EXPR || code == MINUS_EXPR || code == MULT_EXPR)
- && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
-   rhs = fold_build2 (code, type,
-  fold_convert (type, gimple_assign_rhs1 (def)),
-  fold_convert (type, gimple_assign_rhs2 (def)));
- else
-

[PATCH GCC8][29/33]New register pressure estimation

2017-04-18 Thread Bin Cheng
Hi,
Currently IVOPTs shares the same register pressure computation with the RTL
loop invariant pass, which doesn't work very well.  This patch introduces an
IVOPTs-specific interface.  The general idea is described in the cover message
as below:
  C) The current implementation shares the same register pressure computation
     with the RTL loop invariant pass.  It has difficulty in handling
     (especially large) loop nests, and quite often generates too many
     candidates (especially for outer loops).  This change introduces a new
     register pressure computation.  The brief idea is to differentiate the
     (possibly hot) innermost loop from outer loops: for the innermost loop,
     more registers are allowed as long as register pressure stays within the
     number of available target registers.
It can also help to restrict the number of candidates for outer loops.
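
A minimal standalone sketch of the tiered model (assumed cost constants; the
patch below uses target_reg_cost, target_spill_cost and target_res_regs):

#include <stdio.h>

static unsigned
estimate_reg_pressure (unsigned n_old, unsigned n_invs, unsigned n_cands,
		       unsigned available_regs, int hot_innermost_p)
{
  const unsigned reg_cost = 2, spill_cost = 10, res_regs = 3; /* assumed */
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  unsigned cost;

  if (regs_needed + res_regs < available_regs)
    /* Plenty of registers: do not restrict a hot innermost loop.  */
    cost = hot_innermost_p ? 0 : reg_cost * n_new;
  else if (regs_needed <= available_regs)
    /* Close to running out: make every register count.  */
    cost = reg_cost * regs_needed;
  else if (n_cands <= available_regs)
    /* Out of registers: spill invariants first, they are cheaper.  */
    cost = reg_cost * available_regs
	   + spill_cost * (regs_needed - available_regs);
  else
    /* Even candidates overflow: spilling an iv is costed twice as high.  */
    cost = reg_cost * available_regs
	   + spill_cost * (n_cands - available_regs) * 2
	   + spill_cost * (regs_needed - n_cands);

  /* Prefer fewer induction variables on ties.  */
  return cost + n_cands;
}

int main (void)
{
  /* Hot innermost loop with room to spare: cost is just n_cands.  */
  printf ("%u\n", estimate_reg_pressure (4, 2, 3, 16, 1));  /* 3 */
  /* Outer loop pushing past the register file: spill costs kick in.  */
  printf ("%u\n", estimate_reg_pressure (10, 6, 8, 16, 0)); /* 120 */
  return 0;
}
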
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
(ivopts_estimate_reg_pressure): New reg_pressure model function.
(ivopts_global_cost_for_size): Delete.
(determine_set_costs, iv_ca_recount_cost): Call new model function
ivopts_estimate_reg_pressure.
(determine_hot_innermost_loop): New.
(tree_ssa_iv_optimize_loop): Call above function.
From 2b6f11666a86f740a7f813eca26905ce15691d5e Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 10 Mar 2017 11:03:16 +
Subject: [PATCH 29/33] ivopt-reg_pressure-model-20170223.txt

---
 gcc/tree-ssa-loop-ivopts.c | 82 --
 1 file changed, 72 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index db8254c..464f96e 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -589,6 +589,9 @@ struct ivopts_data
 
   /* Whether the loop body can only be exited via single exit.  */
   bool loop_single_exit_p;
+
+  /* The current loop is the innermost loop and maybe hot.  */
+  bool hot_innermost_loop_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -5537,17 +5540,51 @@ determine_iv_costs (struct ivopts_data *data)
 fprintf (dump_file, "\n");
 }
 
-/* Calculates cost for having N_REGS registers.  This number includes
-   induction variables, invariant variables and invariant expressions.  */
+/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
+   induction variables.  Note N_INVS includes both invariant variables and
+   invariant expressions.  */
 
 static unsigned
-ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
+ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
+ unsigned n_cands)
 {
-  unsigned cost = estimate_reg_pressure_cost (n_regs,
- data->regs_used, data->speed,
- data->body_includes_call);
-  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
-  return n_regs + cost;
+  unsigned cost;
+  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
+  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  bool speed = data->speed, hot_p = data->hot_innermost_loop_p;
+
+  /* If there is a call in the loop body, the call-clobbered registers
+ are not available for loop invariants.  */
+  if (data->body_includes_call)
+available_regs = available_regs - target_clobbered_regs;
+
+  /* If we have enough registers.  */
+  if (regs_needed + target_res_regs < available_regs)
+{
+  /* For the maybe hot innermost loop, we use available registers and
+not restrict the transformations unnecessarily.  For other loops,
+we want to use fewer registers.  */
+  cost = hot_p ? 0 : target_reg_cost [speed] * n_new;
+}
+  /* If close to running out of registers, try to preserve them.  */
+  else if (regs_needed <= available_regs)
+cost = target_reg_cost [speed] * regs_needed;
+  /* If we run out of available registers but the number of candidates
+ does not, we penalize extra registers using target_spill_cost.  */
+  else if (n_cands <= available_regs)
+cost = target_reg_cost [speed] * available_regs
+  + target_spill_cost [speed] * (regs_needed - available_regs);
+  /* If even the number of candidates exceeds the available registers, we
+ penalize extra candidate registers using target_spill_cost * 2, because
+ it is more expensive to spill an induction variable than an invariant.  */
+  else
+cost = target_reg_cost [speed] * available_regs
+  + target_spill_cost [speed] * (n_cands - available_regs) * 2
+  + target_spill_cost [speed] * (regs_needed - n_cands);
+
+  /* Finally, add the number of candidates, so that we prefer eliminating
+ induction variables if possible.  */
+  return cost + n_cands;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
@@ 

[PATCH GCC8][28/33]Don't count non-integer PHIs for register pressure

2017-04-18 Thread Bin Cheng
Hi,
Given that only integer variables are meaningful for register pressure
estimation in IVOPTs, this patch skips non-integer type PHIs when counting
register pressure.
Is it OK?
Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (determine_set_costs): Skip non-integer PHIs
when counting register pressure.

From ea74dcacc97e4aee0de952dc0142d71502cc5252 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 7 Mar 2017 16:26:27 +
Subject: [PATCH 28/33] skip-non_int-phi-reg-pressure-20170221.txt

---
 gcc/tree-ssa-loop-ivopts.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 0b9170c..db8254c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5583,6 +5583,9 @@ determine_set_costs (struct ivopts_data *data)
   if (get_iv (data, op))
continue;
 
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+   continue;
+
   n++;
 }
 
-- 
1.9.1



[PATCH GCC8][27/33]Extend candidate set if new_cp has cheaper dependence

2017-04-18 Thread Bin Cheng
Hi,
Currently we only allow iv_ca extension if new_cp has cheaper cost and fewer
deps than old_cp.  This is inaccurate because the overall deps can be reduced
even when new_cp has more deps than old_cp; this happens when new_cp's deps
are already in iv_ca.  This patch allows more iv_ca extensions by checking the
overall deps in iv_ca.
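
A tiny standalone illustration (bitmasks standing in for the real bitmaps,
with assumed dependence sets) of why comparing the overall deps matters:

#include <stdio.h>

int main (void)
{
  unsigned others = 0x3;  /* deps of the other groups: {inv0, inv1} */
  unsigned old_cp = 0x4;  /* {inv2}: only 1 dep, but it is unique   */
  unsigned new_cp = 0x3;  /* {inv0, inv1}: 2 deps, but all shared   */

  /* new_cp has more deps than old_cp, yet the overall set shrinks.  */
  printf ("old overall: %d invariants\n",
	  __builtin_popcount (others | old_cp));  /* 3 */
  printf ("new overall: %d invariants\n",
	  __builtin_popcount (others | new_cp));  /* 2 */
  return 0;
}
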
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (compare_cost_pair): New.
(iv_ca_more_deps): Renamed to ...
(iv_ca_compare_deps): ... this.
(iv_ca_extend): Extend iv_ca if NEW_CP is cheaper than OLD_CP.
From 9f45d3762e60b3c1d2de17ef3683f944be408f96 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 31 Mar 2017 14:18:55 +0100
Subject: [PATCH 27/33] extend-cand-with-cheaper-deps-20170310.txt

---
 gcc/tree-ssa-loop-ivopts.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b93a589..0b9170c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5633,6 +5633,19 @@ cheaper_cost_pair (struct cost_pair *a, struct cost_pair 
*b)
   return false;
 }
 
+/* Compare if A is a more expensive cost pair than B.  Return 1, 0 and -1
+   for more expensive, equal and cheaper respectively.  */
+
+static int
+compare_cost_pair (struct cost_pair *a, struct cost_pair *b)
+{
+  if (cheaper_cost_pair (a, b))
+return -1;
+  if (cheaper_cost_pair (b, a))
+return 1;
+
+  return 0;
+}
 
 /* Returns candidate by that USE is expressed in IVS.  */
 
@@ -5818,13 +5831,14 @@ iv_ca_cost (struct iv_ca *ivs)
 return ivs->cost;
 }
 
-/* Returns true if applying NEW_CP to GROUP for IVS introduces more
-   invariants than OLD_CP.  */
+/* Compare if applying NEW_CP to GROUP for IVS introduces more invariants
+   than OLD_CP.  Return 1, 0 and -1 for more, equal and fewer invariants
+   respectively.  */
 
-static bool
-iv_ca_more_deps (struct ivopts_data *data, struct iv_ca *ivs,
-struct iv_group *group, struct cost_pair *old_cp,
-struct cost_pair *new_cp)
+static int
+iv_ca_compare_deps (struct ivopts_data *data, struct iv_ca *ivs,
+   struct iv_group *group, struct cost_pair *old_cp,
+   struct cost_pair *new_cp)
 {
   gcc_assert (old_cp && new_cp && old_cp != new_cp);
   unsigned old_n_invs = ivs->n_invs;
@@ -5832,7 +5846,7 @@ iv_ca_more_deps (struct ivopts_data *data, struct iv_ca 
*ivs,
   unsigned new_n_invs = ivs->n_invs;
   iv_ca_set_cp (data, ivs, group, old_cp);
 
-  return (new_n_invs > old_n_invs);
+  return new_n_invs > old_n_invs ? 1 : (new_n_invs < old_n_invs ? -1 : 0);
 }
 
 /* Creates change of expressing GROUP by NEW_CP instead of OLD_CP and chains
@@ -6064,11 +6078,18 @@ iv_ca_extend (struct ivopts_data *data, struct iv_ca 
*ivs,
   if (!new_cp)
continue;
 
-  if (!min_ncand && iv_ca_more_deps (data, ivs, group, old_cp, new_cp))
-   continue;
+  if (!min_ncand)
+   {
+ int cmp_invs = iv_ca_compare_deps (data, ivs, group, old_cp, new_cp);
+ /* Skip if new_cp depends on more invariants.  */
+ if (cmp_invs > 0)
+   continue;
 
-  if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
-   continue;
+ int cmp_cost = compare_cost_pair (new_cp, old_cp);
+ /* Skip if new_cp is not cheaper.  */
+ if (cmp_cost > 0 || (cmp_cost == 0 && cmp_invs == 0))
+   continue;
+   }
 
   *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
 }
-- 
1.9.1



[PATCH GCC8][26/33]Record newly used inv_vars during cost computation

2017-04-18 Thread Bin Cheng
Hi,
At the moment, inv_vars are recognized while finding iv_uses.  It's also
possible that inv_vars are used when expressing an iv_use with a specific cand.
Unfortunately, such inv_vars are neither recognized nor considered in register
pressure estimation.  This patch modifies find_inv_vars_cb so that such
invariant variables are also recognized and recorded.  The patch also moves the
dump code later so that all invariant variables can be dumped.
Is it OK?

Thanks
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (find_interesting_uses): Move inv vars dump
to ...
(determine_group_iv_costs): ... here.
(find_inv_vars_cb): Record inv var if it's not recorded before.
From 35a08dbd8d12cb186bf725b5af7837a86c29167d Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 14 Mar 2017 13:48:17 +
Subject: [PATCH 26/33] record-newly-used-inv_var-20170224.txt

---
 gcc/tree-ssa-loop-ivopts.c | 56 +-
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 8469782..b93a589 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2686,32 +2686,16 @@ find_interesting_uses (struct ivopts_data *data)
if (!is_gimple_debug (gsi_stmt (bsi)))
  find_interesting_uses_stmt (data, gsi_stmt (bsi));
 }
+  free (body);
 
   split_address_groups (data);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  bitmap_iterator bi;
-
-  fprintf (dump_file, "\n<Invariant Vars>:\n");
-  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, i, bi)
-   {
- struct version_info *info = ver_info (data, i);
- if (info->inv_id)
-   {
- fprintf (dump_file, "Inv %d:\t", info->inv_id);
- print_generic_expr (dump_file, info->name, TDF_SLIM);
- fprintf (dump_file, "%s\n",
-  info->has_nonlin_use ? "" : "\t(eliminable)");
-   }
-   }
-
   fprintf (dump_file, "\n<IV Groups>:\n");
   dump_groups (dump_file, data);
   fprintf (dump_file, "\n");
 }
-
-  free (body);
 }
 
 /* Strips constant offsets from EXPR and stores them to OFFSET.  If INSIDE_ADDR
@@ -2928,13 +2912,28 @@ struct walk_tree_data
 static tree
 find_inv_vars_cb (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
 {
-  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
+  tree op = *expr_p;
   struct version_info *info;
+  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
 
-  if (TREE_CODE (*expr_p) != SSA_NAME)
+  if (TREE_CODE (op) != SSA_NAME)
 return NULL_TREE;
 
-  info = name_info (wdata->idata, *expr_p);
+  info = name_info (wdata->idata, op);
+  /* Because we expand simple operations when finding IVs, a loop invariant
+ variable that isn't referred to by the original loop could be used now.
+ Record such invariant variables here.  */
+  if (!info->iv)
+{
+  struct ivopts_data *idata = wdata->idata;
+  basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (op));
+
+  if (!bb || !flow_bb_inside_loop_p (idata->current_loop, bb))
+   {
+ set_iv (idata, op, op, build_int_cst (TREE_TYPE (op), 0), true);
+ record_invariant (idata, op, false);
+   }
+}
   if (!info->inv_id || info->has_nonlin_use)
 return NULL_TREE;
 
@@ -5395,6 +5394,23 @@ determine_group_iv_costs (struct ivopts_data *data)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
+  bitmap_iterator bi;
+
+  /* Dump invariant variables.  */
+  fprintf (dump_file, "\n<Invariant Vars>:\n");
+  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, i, bi)
+   {
+ struct version_info *info = ver_info (data, i);
+ if (info->inv_id)
+   {
+ fprintf (dump_file, "Inv %d:\t", info->inv_id);
+ print_generic_expr (dump_file, info->name, TDF_SLIM);
+ fprintf (dump_file, "%s\n",
+  info->has_nonlin_use ? "" : "\t(eliminable)");
+   }
+   }
+
+  /* Dump invariant expressions.  */
   fprintf (dump_file, "\n<Invariant Expressions>:\n");
   auto_vec <iv_inv_expr_ent *> list (data->inv_expr_tab->elements ());
 
-- 
1.9.1



[PATCH GCC8][25/33]New loop constraint flags

2017-04-18 Thread Bin Cheng
Hi,
This patch adds new loop constraint flags marking the prologue, epilogue and
versioned loops generated by the vectorizer, unroller and loop versioning.
These flags will be used in IVOPTs in order to differentiate possibly hot
innermost loops from others.  I also plan to use them to avoid unnecessary
cunroll on such loops.
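
A standalone sketch (a plain unsigned standing in for the loop structure, flag
values mirroring the patch below) of how the new bits compose with the
existing ones:

#include <stdio.h>

#define LOOP_C_INFINITE (1 << 0)
#define LOOP_C_FINITE   (1 << 1)
#define LOOP_C_PROLOG   (1 << 2)  /* new: peeled prologue loop */
#define LOOP_C_EPILOG   (1 << 3)  /* new: peeled epilogue loop */
#define LOOP_C_VERSION  (1 << 4)  /* new: versioned loop */

int main (void)
{
  unsigned constraints = 0;
  constraints |= LOOP_C_FINITE;
  constraints |= LOOP_C_EPILOG;   /* marked by vect_do_peeling */

  /* IVOPTs can rule such a loop out as a hot innermost loop.  */
  int maybe_hot = !(constraints
		    & (LOOP_C_PROLOG | LOOP_C_EPILOG | LOOP_C_VERSION));
  printf ("maybe hot: %d\n", maybe_hot);  /* prints 0 */
  return 0;
}
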
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* cfgloop.h (LOOP_C_PROLOG, LOOP_C_EPILOG, LOOP_C_VERSION): New.
* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Set
LOOP_C_EPILOG for unrolled epilogue loop.
(vect_do_peeling): Set LOOP_C_PROLOG and LOOP_C_EPILOG for peeled
loops.
(vect_loop_versioning): Set LOOP_C_VERSION for versioned loop.
From 432006f72b95826eadb4c972a55b1aeb89c9998b Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 30 Mar 2017 10:56:19 +0100
Subject: [PATCH 25/33] new-loop-constraint-flags-20170310.txt

---
 gcc/cfgloop.h  | 6 ++
 gcc/tree-ssa-loop-manip.c  | 1 +
 gcc/tree-vect-loop-manip.c | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index a8bec1d..90be4cc 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -248,6 +248,12 @@ struct GTY ((chain_next ("%h.next"))) loop {
 #define LOOP_C_INFINITE(1 << 0)
 /* Set if the loop is known to be finite without any assumptions.  */
 #define LOOP_C_FINITE  (1 << 1)
+/* Set if the loop is a peeled prologue loop.  */
+#define LOOP_C_PROLOG  (1 << 2)
+/* Set if the loop is a peeled epilogue loop.  */
+#define LOOP_C_EPILOG  (1 << 3)
+/* Set if the loop is a versioned loop.  */
+#define LOOP_C_VERSION (1 << 4)
 
 /* Set C to the LOOP constraint.  */
 static inline void
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 22c832a..0559904 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1233,6 +1233,7 @@ tree_transform_and_unroll_loop (struct loop *loop, 
unsigned factor,
   scale_unrolled, scale_rest, true);
   gcc_assert (new_loop != NULL);
   update_ssa (TODO_update_ssa);
+  loop_constraint_set (new_loop, LOOP_C_EPILOG);
 
   /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
  loop latch (and make its condition dummy, for the moment).  */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index f48336b..faeaa6d 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1735,6 +1735,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  gcc_unreachable ();
}
   slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+  loop_constraint_set (prolog, LOOP_C_PROLOG);
   first_loop = prolog;
   reset_original_copy_tables ();
 
@@ -1799,6 +1800,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  gcc_unreachable ();
}
   slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
+  loop_constraint_set (epilog, LOOP_C_EPILOG);
 
   /* Scalar version loop may be preferred.  In this case, add guard
 and skip to epilog.  Note this only happens when the number of
@@ -2400,6 +2402,7 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
  prob, REG_BR_PROB_BASE - prob,
  prob, REG_BR_PROB_BASE - prob, true);
 
+  loop_constraint_set (nloop, LOOP_C_VERSION);
   if (version_niter)
 {
   /* The versioned loop could be infinite, we need to clear existing
-- 
1.9.1



[PATCH GCC8][23/33]Simple comment adjustment

2017-04-18 Thread Bin Cheng
Hi,
This is a simple comment adjustment.

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (allow_ip_end_pos_p): Refine comments.
(get_shiftadd_cost): Ditto.
From 9f2ca115ef2e376a472d591864a2d7a7dd9daacf Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 17:36:30 +
Subject: [PATCH 23/33] comment-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 21761e2..dcc4618 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3112,9 +3112,9 @@ add_candidate_1 (struct ivopts_data *data,
 
The purpose is to avoid splitting latch edge with a biv increment, thus
creating a jump, possibly confusing other optimization passes and leaving
-   less freedom to scheduler.  So we allow IP_END_POS only if IP_NORMAL_POS
-   is not available (so we do not have a better alternative), or if the latch
-   edge is already nonempty.  */
+   less freedom to scheduler.  So we allow IP_END only if IP_NORMAL is not
+   available (so we do not have a better alternative), or if the latch edge
+   is already nonempty.  */
 
 static bool
 allow_ip_end_pos_p (struct loop *loop)
@@ -3959,10 +3959,10 @@ adjust_setup_cost (struct ivopts_data *data, unsigned 
cost,
 return cost;
 }
 
- /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
-EXPR operand holding the shift.  COST0 and COST1 are the costs for
-calculating the operands of EXPR.  Returns true if successful, and returns
-the cost in COST.  */
+/* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
+   EXPR operand holding the shift.  COST0 and COST1 are the costs for
+   calculating the operands of EXPR.  Returns true if successful, and returns
+   the cost in COST.  */
 
 static bool
 get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
-- 
1.9.1



[PATCH 24/33]New parameter bound on number of selected candidates

2017-04-18 Thread Bin Cheng
Hi,
IVOPTs still has difficulty with outer loops (especially in large loop nests)
and tends to select too many candidates.  That is generally bad because of
unavoidable register spilling.  In this case, we probably want to compute
iv_uses with a small number of bivs.  Though this results in more computation
inside the loop, it could reduce spilling.  This patch adds a new parameter
bounding the number of selected candidates; it simply gives up if too many
candidates are selected.  So far it works loop by loop; I am not sure if we
want to bypass the whole loop nest once this bound is hit.
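
For reference, once applied the bound can be tuned per invocation via the new
parameter, e.g. "gcc -O2 --param iv-max-selected-candidates=32"; the default
below is 48.
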
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* doc/invoke.texi (iv-max-selected-candidates): New.
* params.def (PARAM_IV_MAX_SELECTED_CANDIDATES): New.
* tree-ssa-loop-ivopts.c (MAX_SELECTED_CANDIDATES): New.
(tree_ssa_iv_optimize_loop): Skip if too many cands are selected.
From 40517ca836f868b8bd79bde56aa7c053ffef4fc2 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 7 Mar 2017 13:53:04 +
Subject: [PATCH 24/33] add-bound-on-selected-cands-20170221.txt

---
 gcc/doc/invoke.texi| 4 
 gcc/params.def | 8 
 gcc/tree-ssa-loop-ivopts.c | 8 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 19a85b6..f9cbdbb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9922,6 +9922,10 @@ If the number of candidates in the set is smaller than 
this value,
 always try to remove unnecessary ivs from the set
 when adding a new one.
 
+@item iv-max-selected-candidates
+The induction variable optimizations give up on loops in which more than this
+number of induction variable candidates are selected.
+
 @item avg-loop-niter
 Average number of iterations of a loop.
 
diff --git a/gcc/params.def b/gcc/params.def
index 1b058e4..7daab14 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -527,6 +527,14 @@ DEFPARAM(PARAM_IV_ALWAYS_PRUNE_CAND_SET_BOUND,
 "If number of candidates in the set is smaller, we always try to 
remove unused ivs during its optimization.",
 10, 0, 0)
 
+/* The induction variable optimizations give up on loops in which more than
+   this number of induction variable candidates are selected.  */
+
+DEFPARAM(PARAM_IV_MAX_SELECTED_CANDIDATES,
+"iv-max-selected-candidates",
+"Bound on number of selected iv candidates for loops in iv 
optimizations.",
+48, 0, 0)
+
 DEFPARAM(PARAM_AVG_LOOP_NITER,
 "avg-loop-niter",
 "Average number of iterations of a loop.",
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index dcc4618..8469782 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -667,6 +667,12 @@ struct iv_ca_delta
 #define ALWAYS_PRUNE_CAND_SET_BOUND \
   ((unsigned) PARAM_VALUE (PARAM_IV_ALWAYS_PRUNE_CAND_SET_BOUND))
 
+/* If more candidates are selected, we just give up because it usually
+   causes a high register pressure issue.  */
+
+#define MAX_SELECTED_CANDIDATES \
+  ((unsigned) PARAM_VALUE (PARAM_IV_MAX_SELECTED_CANDIDATES))
+
 /* The list of trees for that the decl_rtl field must be reset is stored
here.  */
 
@@ -7382,7 +7388,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, 
struct loop *loop)
 
   /* Find the optimal set of induction variables (item 3, part 2).  */
   iv_ca = find_optimal_iv_set (data);
-  if (!iv_ca)
+  if (!iv_ca || iv_ca->n_cands > MAX_SELECTED_CANDIDATES)
 goto finish;
   changed = true;
 
-- 
1.9.1



[PATCH GCC8][21/33]Support compare iv_use which both sides of comparison are IVs

2017-04-18 Thread Bin Cheng
Hi,
This patch supports compare iv_use for comparisons in which both sides are IVs.
With this patch, optimal code is generated for PR53090.
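
For illustration, one shape of loop this covers (a hypothetical example, not
the PR testcase): both operands of the exit condition are induction variables:

/* i counts up while j counts down; the exit compare i < j has an IV on
   both sides, so both now get a compare type iv_use.  */
void
reverse (int *a, int n)
{
  for (int i = 0, j = n - 1; i < j; i++, j--)
    {
      int t = a[i];
      a[i] = a[j];
      a[j] = t;
    }
}
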
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

    PR tree-optimization/53090
    * tree-ssa-loop-ivopts.c (enum comp_iv_rewrite): New enum value
    COMP_IV_EXPR_2.
    (extract_cond_operands): Detect condition with IV on both sides
    and return COMP_IV_EXPR_2.
    (find_interesting_uses_cond): Add iv_use for both IVs in condition.
    (rewrite_use_compare): Simplify by removing call to function
extract_cond_operands.
From e1037de191416b437f6b1709dd70bb42606d8400 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 10 Apr 2017 12:06:25 +0100
Subject: [PATCH 21/33] iv-comparing-against-another-iv-20170313.txt

---
 gcc/tree-ssa-loop-ivopts.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 08b6245..21761e2 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -1651,6 +1651,9 @@ enum comp_iv_rewrite
   COMP_IV_NA,
   /* We may rewrite compare type iv_use by expressing value of the iv_use.  */
   COMP_IV_EXPR,
+  /* We may rewrite compare type iv_uses on both sides of comparison by
+ expressing value of each iv_use.  */
+  COMP_IV_EXPR_2,
   /* We may rewrite compare type iv_use by expressing value of the iv_use
  or by eliminating it with other iv_cand.  */
   COMP_IV_ELIM
@@ -1696,9 +1699,12 @@ extract_cond_operands (struct ivopts_data *data, gimple 
*stmt,
   if (TREE_CODE (*op1) == SSA_NAME)
 iv1 = get_iv (data, *op1);
 
-  /* If both sides of comparison are IVs.  */
+  /* If both sides of the comparison are IVs, we can express ivs on both ends.  */
   if (iv0 && iv1 && !integer_zerop (iv0->step) && !integer_zerop (iv1->step))
-goto end;
+{
+  rewrite_type = COMP_IV_EXPR_2;
+  goto end;
+}
 
   /* If neither side of the comparison is an IV.  */
   if ((!iv0 || integer_zerop (iv0->step))
@@ -1738,10 +1744,11 @@ static void
 find_interesting_uses_cond (struct ivopts_data *data, gimple *stmt)
 {
   tree *var_p, *bound_p;
-  struct iv *var_iv;
+  struct iv *var_iv, *bound_iv;
   enum comp_iv_rewrite ret;
 
-  ret = extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL);
+  ret = extract_cond_operands (data, stmt,
+  &var_p, &bound_p, &var_iv, &bound_iv);
   if (ret == COMP_IV_NA)
 {
   find_interesting_uses_op (data, *var_p);
@@ -1750,6 +1757,9 @@ find_interesting_uses_cond (struct ivopts_data *data, 
gimple *stmt)
 }
 
   record_group_use (data, var_p, var_iv, stmt, USE_COMPARE);
+  /* Record compare type iv_use for iv on the other side of comparison.  */
+  if (ret == COMP_IV_EXPR_2)
+record_group_use (data, bound_p, bound_iv, stmt, USE_COMPARE);
 }
 
 /* Returns the outermost loop EXPR is obviously invariant in
@@ -6953,12 +6963,11 @@ static void
 rewrite_use_compare (struct ivopts_data *data,
 struct iv_use *use, struct iv_cand *cand)
 {
-  tree comp, *var_p, op, bound;
+  tree comp, op, bound;
   gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   enum tree_code compare;
   struct iv_group *group = data->vgroups[use->group_id];
   struct cost_pair *cp = get_group_iv_cost (data, group, cand);
-  enum comp_iv_rewrite rewrite_type;
 
   bound = cp->value;
   if (bound)
@@ -6991,13 +7000,10 @@ rewrite_use_compare (struct ivopts_data *data,
  giv.  */
   comp = get_computation_at (data->current_loop, use->stmt, use, cand);
   gcc_assert (comp != NULL_TREE);
-
-  rewrite_type = extract_cond_operands (data, use->stmt,
-   &var_p, NULL, NULL, NULL);
-  gcc_assert (rewrite_type != COMP_IV_NA);
-
-  *var_p = force_gimple_operand_gsi (&bsi, comp, true, SSA_NAME_VAR (*var_p),
-true, GSI_SAME_STMT);
+  gcc_assert (use->op_p != NULL);
+  *use->op_p = force_gimple_operand_gsi (&bsi, comp, true,
+SSA_NAME_VAR (*use->op_p),
+true, GSI_SAME_STMT);
 }
 
 /* Rewrite the groups using the selected induction variables.  */
-- 
1.9.1



[PATCH GCC8][22/33]Generate TMR in new reassociation order

2017-04-18 Thread Bin Cheng
Hi,
This patch generates TMRs for ivopts in a new re-association order.  The
general idea is, by querying the target's addressing modes, to put as much
address computation as possible into the memory reference.  For computation
that has to be done outside of the memory reference, we re-associate the
address expression in a new order so that loop invariant expressions are kept
and exposed for the later lim pass.
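
As a rough illustration (hypothetical source, assuming the target has a
[base + index] addressing mode) of the re-association this aims for:

void
f (int *a, int inv, int n)
{
  for (int i = 0; i < n; i++)
    a[i + inv] = 0;
  /* Conceptually rewritten as:

       int *base = a + inv;         // loop invariant, hoistable by lim
       for (int i = 0; i < n; i++)
	 *(base + i) = 0;           // folds into the memory reference

     so the invariant part (a + inv) is exposed instead of recomputing
     a + (i + inv) * 4 inside the loop.  */
}
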
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-address.c: Include header file.
(move_hint_to_base): Return TRUE if BASE_HINT is moved to memory
address.
(add_to_parts): Refactor.
(addr_to_parts): New parameter.  Update use of move_hint_to_base.
(create_mem_ref): Update use of addr_to_parts.  Re-associate addr
in new order.
From 4261c98f8e6dca7a38ed53b9b49c9f59e1906c30 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 2 Mar 2017 09:29:50 +
Subject: [PATCH 22/33] address-iv_use-rewrite-20170220.txt

---
 gcc/tree-ssa-address.c | 160 -
 1 file changed, 106 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index 8aefed6..1b73034 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "dumpfile.h"
 #include "tree-affine.h"
+#include "gimplify.h"
 
 /* FIXME: We compute address costs using RTL.  */
 #include "tree-ssa-address.h"
@@ -427,9 +428,10 @@ move_fixed_address_to_symbol (struct mem_address *parts, 
aff_tree *addr)
   aff_combination_remove_elt (addr, i);
 }
 
-/* If ADDR contains an instance of BASE_HINT, move it to PARTS->base.  */
+/* Return true if ADDR contains an instance of BASE_HINT and it's moved to
+   PARTS->base.  */
 
-static void
+static bool
 move_hint_to_base (tree type, struct mem_address *parts, tree base_hint,
   aff_tree *addr)
 {
@@ -448,7 +450,7 @@ move_hint_to_base (tree type, struct mem_address *parts, 
tree base_hint,
 }
 
   if (i == addr->n)
-return;
+return false;
 
   /* Cast value to appropriate pointer type.  We cannot use a pointer
  to TYPE directly, as the back-end will assume registers of pointer
@@ -458,6 +460,7 @@ move_hint_to_base (tree type, struct mem_address *parts, 
tree base_hint,
   type = build_qualified_type (void_type_node, qual);
   parts->base = fold_convert (build_pointer_type (type), val);
   aff_combination_remove_elt (addr, i);
+  return true;
 }
 
 /* If ADDR contains an address of a dereferenced pointer, move it to
@@ -535,8 +538,7 @@ add_to_parts (struct mem_address *parts, tree elt)
   if (POINTER_TYPE_P (type))
 parts->base = fold_build_pointer_plus (parts->base, elt);
   else
-parts->base = fold_build2 (PLUS_EXPR, type,
-  parts->base, elt);
+parts->base = fold_build2 (PLUS_EXPR, type, parts->base, elt);
 }
 
 /* Returns true if multiplying by RATIO is allowed in an address.  Test the
@@ -668,7 +670,8 @@ most_expensive_mult_to_index (tree type, struct mem_address 
*parts,
 /* Splits address ADDR for a memory access of type TYPE into PARTS.
If BASE_HINT is non-NULL, it specifies an SSA name to be used
preferentially as base of the reference, and IV_CAND is the selected
-   iv candidate used in ADDR.
+   iv candidate used in ADDR.  Store true in VAR_IN_BASE if the variant
+   part of the address is split into PARTS.base.
 
TODO -- be more clever about the distribution of the elements of ADDR
to PARTS.  Some architectures do not support anything but single
@@ -678,9 +681,8 @@ most_expensive_mult_to_index (tree type, struct mem_address 
*parts,
addressing modes is useless.  */
 
 static void
-addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
-  tree base_hint, struct mem_address *parts,
-   bool speed)
+addr_to_parts (tree type, aff_tree *addr, tree iv_cand, tree base_hint,
+  struct mem_address *parts, bool *var_in_base, bool speed)
 {
   tree part;
   unsigned i;
@@ -698,23 +700,20 @@ addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
-  /* No need to do address parts reassociation if the number of parts
- is <= 2 -- in that case, no loop invariant code motion can be
- exposed.  */
-
-  if (!base_hint && (addr->n > 2))
+  /* Since at the moment there is no reliable way to distinguish a
+ pointer from its offset, we guess whether the var part is the
+ pointer.  */
+  *var_in_base = (base_hint != NULL && parts->symbol == NULL);
+  if (*var_in_base)
+*var_in_base = move_hint_to_base (type, parts, base_hint, addr);
+  else
 move_variant_to_index (parts, addr, iv_cand);
 
-  /* First move the most expensive feasible multiplication
- to index.  */
+  /* First move the most expensive feasible multiplication to index.  */
   if (!parts->index)

[PATCH GCC8][20/33]Support compare iv_use which is compared against arbitrary variable

2017-04-18 Thread Bin Cheng
Hi,
Currently we only support a compare iv_use if the other side of the comparison
is loop invariant.  This patch supports the case where the other side of the
comparison is an arbitrary non-iv variable; such a compare iv_use can never be
eliminated, but it can still be expressed.
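
For illustration, one shape such a loop can take (hypothetical example): the
bound is re-loaded from memory, so the compare operand is neither an IV nor
loop invariant, yet the iv_use can still be expressed by a candidate:

void
scale (int *a, int *bound)
{
  /* The stores to a[] may alias *bound, so *bound is re-loaded on each
     iteration: the compare uses a non-iv, non-invariant SSA name.  */
  for (int i = 0; i < *bound; i++)
    a[i] *= 2;
}
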
Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (enum comp_iv_rewrite): New.
(extract_cond_operands): Detect condition comparing against non-
invariant bound and return appropriate enum value.
(find_interesting_uses_cond): Update use of extract_cond_operands.
Handle its return value accordingly.
(determine_group_iv_cost_cond, rewrite_use_compare): Ditto.
From e280e72f2019f1fab7048a5f81534ac509066825 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 17:17:52 +
Subject: [PATCH 20/33] iv-comparing-against-non_invariant-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 71 ++
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 6e9df43..08b6245 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -1645,6 +1645,17 @@ find_interesting_uses_op (struct ivopts_data *data, tree 
op)
   return use;
 }
 
+/* Indicate how compare type iv_use can be handled.  */
+enum comp_iv_rewrite
+{
+  COMP_IV_NA,
+  /* We may rewrite compare type iv_use by expressing value of the iv_use.  */
+  COMP_IV_EXPR,
+  /* We may rewrite compare type iv_use by expressing value of the iv_use
+ or by eliminating it with other iv_cand.  */
+  COMP_IV_ELIM
+};
+
 /* Given a condition in statement STMT, checks whether it is a compare
of an induction variable and an invariant.  If this is the case,
CONTROL_VAR is set to location of the iv, BOUND to the location of
@@ -1653,7 +1664,7 @@ find_interesting_uses_op (struct ivopts_data *data, tree 
op)
the case, CONTROL_VAR and BOUND are set to the arguments of the
condition and false is returned.  */
 
-static bool
+static enum comp_iv_rewrite
 extract_cond_operands (struct ivopts_data *data, gimple *stmt,
   tree **control_var, tree **bound,
   struct iv **iv_var, struct iv **iv_bound)
@@ -1663,7 +1674,7 @@ extract_cond_operands (struct ivopts_data *data, gimple 
*stmt,
   static tree zero;
   tree *op0 = &zero, *op1 = &zero;
   struct iv *iv0 = &const_iv, *iv1 = &const_iv;
-  bool ret = false;
+  enum comp_iv_rewrite rewrite_type = COMP_IV_NA;
 
   if (gimple_code (stmt) == GIMPLE_COND)
 {
@@ -1685,18 +1696,27 @@ extract_cond_operands (struct ivopts_data *data, gimple 
*stmt,
   if (TREE_CODE (*op1) == SSA_NAME)
 iv1 = get_iv (data, *op1);
 
-  /* Exactly one of the compared values must be an iv, and the other one must
- be an invariant.  */
-  if (!iv0 || !iv1)
+  /* If both sides of comparison are IVs.  */
+  if (iv0 && iv1 && !integer_zerop (iv0->step) && !integer_zerop (iv1->step))
 goto end;
 
-  if (integer_zerop (iv0->step))
+  /* If neither side of the comparison is an IV.  */
+  if ((!iv0 || integer_zerop (iv0->step))
+  && (!iv1 || integer_zerop (iv1->step)))
+goto end;
+
+  /* Control variable may be on the other side.  */
+  if (!iv0 || integer_zerop (iv0->step))
 {
-  /* Control variable may be on the other side.  */
   std::swap (op0, op1);
   std::swap (iv0, iv1);
 }
-  ret = !integer_zerop (iv0->step) && integer_zerop (iv1->step);
+  /* If one side is an IV and the other side isn't loop invariant.  */
+  if (!iv1)
+rewrite_type = COMP_IV_EXPR;
+  /* If one side is an IV and the other side is loop invariant.  */
+  else if (!integer_zerop (iv0->step) && integer_zerop (iv1->step))
+rewrite_type = COMP_IV_ELIM;
 
 end:
   if (control_var)
@@ -1708,7 +1728,7 @@ end:
   if (iv_bound)
 *iv_bound = iv1;
 
-  return ret;
+  return rewrite_type;
 }
 
 /* Checks whether the condition in STMT is interesting and if so,
@@ -1719,15 +1739,17 @@ find_interesting_uses_cond (struct ivopts_data *data, 
gimple *stmt)
 {
   tree *var_p, *bound_p;
   struct iv *var_iv;
+  enum comp_iv_rewrite ret;
 
-  if (!extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL))
+  ret = extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL);
+  if (ret == COMP_IV_NA)
 {
   find_interesting_uses_op (data, *var_p);
   find_interesting_uses_op (data, *bound_p);
   return;
 }
 
-  record_group_use (data, NULL, var_iv, stmt, USE_COMPARE);
+  record_group_use (data, var_p, var_iv, stmt, USE_COMPARE);
 }
 
 /* Returns the outermost loop EXPR is obviously invariant in
@@ -5068,15 +5090,21 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   struct iv *cmp_iv;
   bitmap inv_exprs = NULL;
   bitmap inv_vars_elim = NULL, inv_vars_express = NULL, inv_vars;
-  comp_cost elim_cost, express_cost, cost, bound_cost;
-  bool ok;
+  comp_cost elim_cost = infinite_cost, express_cost, cost, boun

[PATCH GCC8][19/33]Rewrite nonlinear iv_use by re-associating invariant and induction parts separately

2017-04-18 Thread Bin Cheng
Hi,
This patch rewrites nonlinear iv_uses by re-associating the invariant part and
the induction part separately, so that invariant expressions are exposed to the
later lim pass.
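
A minimal standalone sketch (assumed operands, not the patch's code) of the
re-association order used for the rewritten value:

#include <stdio.h>
#include <stdint.h>

/* Build the use value as invariant + induction + offset; the constant is
   added last so uses differing only in their offsets can still be CSEd.  */
static intptr_t
rewrite_value (intptr_t inv_part, intptr_t var_part, intptr_t offset)
{
  intptr_t comp = inv_part;  /* loop invariant, hoistable by lim */
  comp += var_part;          /* changes every iteration */
  comp += offset;            /* constant, added last */
  return comp;
}

int main (void)
{
  printf ("%ld\n", (long) rewrite_value (1000, 16, 8));  /* 1024 */
  return 0;
}
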
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Re-associate
nonlinear iv_use computation in loop invariant sensitive way.
From 12aaf8c773f6215205ddadd182d162fa68195198 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 16:22:13 +
Subject: [PATCH 19/33] nonlinear-iv_use-rewrite-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 52 +++---
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 0f78a46..6e9df43 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -6659,10 +6659,9 @@ static void
 rewrite_use_nonlinear_expr (struct ivopts_data *data,
struct iv_use *use, struct iv_cand *cand)
 {
-  tree comp;
-  tree tgt;
   gassign *ass;
   gimple_stmt_iterator bsi;
+  tree comp, type = get_use_type (use), tgt;
 
   /* An important special case -- if we are asked to express value of
  the original iv by itself, just exit; there is no need to
@@ -6706,9 +6705,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
}
 }
 
-  comp = get_computation_at (data->current_loop, use->stmt, use, cand);
-  gcc_assert (comp != NULL_TREE);
-
   switch (gimple_code (use->stmt))
 {
 case GIMPLE_PHI:
@@ -6730,6 +6726,47 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
   gcc_unreachable ();
 }
 
+  aff_tree aff_inv, aff_var;
+  if (!get_computation_aff_1 (data->current_loop, use->stmt,
+ use, cand, &aff_inv, &aff_var))
+gcc_unreachable ();
+
+  unshare_aff_combination (&aff_inv);
+  unshare_aff_combination (&aff_var);
+  /* Prefer the CSE opportunity over loop invariance by adding the offset
+ last, so that iv_uses differing only in offset can be CSEed.  */
+  widest_int offset = aff_inv.offset;
+  aff_inv.offset = 0;
+
+  gimple_seq stmt_list = NULL, seq = NULL;
+  tree comp_op1 = aff_combination_to_tree (&aff_inv);
+  tree comp_op2 = aff_combination_to_tree (&aff_var);
+  gcc_assert (comp_op1 && comp_op2);
+
+  comp_op1 = force_gimple_operand (comp_op1, &seq, true, NULL);
+  gimple_seq_add_seq (&stmt_list, seq);
+  comp_op2 = force_gimple_operand (comp_op2, &seq, true, NULL);
+  gimple_seq_add_seq (&stmt_list, seq);
+
+  if (POINTER_TYPE_P (TREE_TYPE (comp_op2)))
+std::swap (comp_op1, comp_op2);
+
+  if (POINTER_TYPE_P (TREE_TYPE (comp_op1)))
+{
+  comp = fold_build_pointer_plus (comp_op1,
+ fold_convert (sizetype, comp_op2));
+  comp = fold_build_pointer_plus (comp,
+ wide_int_to_tree (sizetype, offset));
+}
+  else
+{
+  comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp_op1,
+ fold_convert (TREE_TYPE (comp_op1), comp_op2));
+  comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp,
+ wide_int_to_tree (TREE_TYPE (comp_op1), offset));
+}
+
+  comp = fold_convert (type, comp);
   if (!valid_gimple_rhs_p (comp)
   || (gimple_code (use->stmt) != GIMPLE_PHI
  /* We can't allow re-allocating the stmt as it might be pointed
@@ -6737,8 +6774,8 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
  && (get_gimple_rhs_num_ops (TREE_CODE (comp))
  >= gimple_num_ops (gsi_stmt (bsi)
 {
-  comp = force_gimple_operand_gsi (&bsi, comp, true, NULL_TREE,
-  true, GSI_SAME_STMT);
+  comp = force_gimple_operand (comp, &seq, true, NULL);
+  gimple_seq_add_seq (&stmt_list, seq);
   if (POINTER_TYPE_P (TREE_TYPE (tgt)))
{
  duplicate_ssa_name_ptr_info (comp, SSA_NAME_PTR_INFO (tgt));
@@ -6749,6 +6786,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
}
 }
 
+  gsi_insert_seq_before (&bsi, stmt_list, GSI_SAME_STMT);
   if (gimple_code (use->stmt) == GIMPLE_PHI)
 {
   ass = gimple_build_assign (tgt, comp);
-- 
1.9.1



[PATCH GCC8][17/33]Treat complex cand step as invariant expression

2017-04-18 Thread Bin Cheng
Hi,
We generally need to compute the cand step in the loop preheader and use it in
the loop body.  Unless it's an SSA_NAME or a constant integer, an invariant
expression is needed.
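
For illustration, the shape of code this matters for (hypothetical example,
not from the patch): a non-constant step is computed once in the preheader and
reused in the body, so it behaves like an invariant expression for register
pressure purposes:

/* The step expression s1 * s2 is not a simple SSA name or constant, so
   the candidate records it as an invariant expression.  */
void
f (int *a, int n, int s1, int s2)
{
  int step = s1 * s2;                 /* computed once, in the preheader */
  for (int i = 0; i < n; i += step)   /* reused by the iv increment */
    a[i] = 0;
}
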

Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct iv_cand): New field inv_exprs.
(dump_cand): Support iv_cand.inv_exprs.
(add_candidate_1): Record invariant exprs in iv_cand.inv_exprs
for candidates.
(iv_ca_set_no_cp, iv_ca_set_cp, free_loop_data): Support
iv_cand.inv_exprs.
From 06806d09f557854a5987b83a044a5eb5433cda60 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 14:50:11 +
Subject: [PATCH 17/33] treat-cand_step-as-inv_expr-20170225.txt

---
 gcc/tree-ssa-loop-ivopts.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index c3e9bce..2c6fa76 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -421,6 +421,7 @@ struct iv_cand
  where it is incremented.  */
   bitmap inv_vars; /* The list of invariants that are used in step of the
   biv.  */
+  bitmap inv_exprs;/* Handle step as an inv expr if it's not simple.  */
   struct iv *orig_iv;  /* The original iv if this cand is added from biv with
   smaller type.  */
 };
@@ -790,6 +791,11 @@ dump_cand (FILE *file, struct iv_cand *cand)
   fprintf (file, "  Depend on inv.vars: ");
   dump_bitmap (file, cand->inv_vars);
 }
+  if (cand->inv_exprs)
+{
+  fprintf (file, "  Depend on inv.exprs: ");
+  dump_bitmap (file, cand->inv_exprs);
+}
 
   if (cand->var_before)
 {
@@ -3032,7 +3038,23 @@ add_candidate_1 (struct ivopts_data *data,
   data->vcands.safe_push (cand);
 
   if (TREE_CODE (step) != INTEGER_CST)
-   find_inv_vars (data, &step, &cand->inv_vars);
+   {
+ find_inv_vars (data, &step, &cand->inv_vars);
+
+ iv_inv_expr_ent *inv_expr = get_loop_invariant_expr (data, step);
+ /* Share bitmap between inv_vars and inv_exprs for cand.  */
+ if (inv_expr != NULL)
+   {
+ cand->inv_exprs = cand->inv_vars;
+ cand->inv_vars = NULL;
+ if (cand->inv_exprs)
+   bitmap_clear (cand->inv_exprs);
+ else
+   cand->inv_exprs = BITMAP_ALLOC (NULL);
+
+ bitmap_set_bit (cand->inv_exprs, inv_expr->id);
+   }
+   }
 
   if (pos == IP_AFTER_USE || pos == IP_BEFORE_USE)
cand->ainc_use = use;
@@ -5606,6 +5628,7 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca 
*ivs,
   ivs->n_cands--;
   ivs->cand_cost -= cp->cand->cost;
   iv_ca_set_remove_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
+  iv_ca_set_remove_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses);
 }
 
   ivs->cand_use_cost -= cp->cost;
@@ -5662,6 +5685,7 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
  ivs->n_cands++;
  ivs->cand_cost += cp->cand->cost;
  iv_ca_set_add_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
+ iv_ca_set_add_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses);
}
 
   ivs->cand_use_cost += cp->cost;
@@ -7143,6 +7167,8 @@ free_loop_data (struct ivopts_data *data)
 
   if (cand->inv_vars)
BITMAP_FREE (cand->inv_vars);
+  if (cand->inv_exprs)
+   BITMAP_FREE (cand->inv_exprs);
   free (cand);
 }
   data->vcands.truncate (0);
-- 
1.9.1



[PATCH GCC8][18/33]Relate compare iv_use with all candidates

2017-04-18 Thread Bin Cheng
Hi,
For a compare type iv_use, we want to relate it with all candidates in order
to achieve a smaller candidate set.  Generally this doesn't hurt compilation
time, because a compare iv_use is already related with most candidates and
the number of compare iv_uses is also small.
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (relate_compare_use_with_all_cands): New.
(find_iv_candidates): Call relate_compare_use_with_all_cands.
From 05d21bf9421fb312a3470eb232bf98cd16d072a6 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 6 Apr 2017 10:09:18 +0100
Subject: [PATCH 18/33] relate-comp_use-with-all-cands-20170312.txt

---
 gcc/tree-ssa-loop-ivopts.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 2c6fa76..0f78a46 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5250,6 +5250,21 @@ set_autoinc_for_original_candidates (struct ivopts_data 
*data)
 }
 }
 
+/* Relate compare use with all candidates.  */
+
+static void
+relate_compare_use_with_all_cands (struct ivopts_data *data)
+{
+  unsigned i, max_id = data->vcands.length () - 1;
+  for (i = 0; i < data->vgroups.length (); i++)
+{
+  struct iv_group *group = data->vgroups[i];
+
+  if (group->type == USE_COMPARE)
+   bitmap_set_range (group->related_cands, 0, max_id);
+}
+}
+
 /* Finds the candidates for the induction variables.  */
 
 static void
@@ -5269,6 +5284,10 @@ find_iv_candidates (struct ivopts_data *data)
   /* Record the important candidates.  */
   record_important_candidates (data);
 
+  /* Relate compare iv_use with all candidates.  */
+  if (!data->consider_all_candidates)
+relate_compare_use_with_all_cands (data);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   unsigned i;
-- 
1.9.1



[PATCH GCC8][16/33]Move multiplier_allowed_in_address_p to tree-ssa-address

2017-04-18 Thread Bin Cheng
Hi,
Now that function multiplier_allowed_in_address_p is no longer referenced in
ivopts, move it to the only file using it and make it static.

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (multiplier_allowed_in_address_p): Move
from ...
* tree-ssa-address.c (multiplier_allowed_in_address_p): ... to here
as local function.  Include necessary header files.
* tree-ssa-loop-ivopts.h (multiplier_allowed_in_address_p): Delete.
From d55aefd90a2443eb97a3de5ed411df77cf66d46a Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 14:28:55 +
Subject: [PATCH 16/33] multiplier_allowed_in_address_p-20170220.txt

---
 gcc/tree-ssa-address.c | 58 ++
 gcc/tree-ssa-loop-ivopts.c | 57 -
 gcc/tree-ssa-loop-ivopts.h |  2 --
 3 files changed, 58 insertions(+), 59 deletions(-)

diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index e35d323..8aefed6 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -28,11 +28,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "memmodel.h"
 #include "stringpool.h"
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
 #include "expmed.h"
 #include "insn-config.h"
+#include "emit-rtl.h"
 #include "recog.h"
 #include "tree-pretty-print.h"
 #include "fold-const.h"
@@ -537,6 +539,62 @@ add_to_parts (struct mem_address *parts, tree elt)
   parts->base, elt);
 }
 
+/* Returns true if multiplying by RATIO is allowed in an address.  Test the
+   validity for a memory reference accessing memory of mode MODE in address
+   space AS.  */
+
+static bool
+multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
+addr_space_t as)
+{
+#define MAX_RATIO 128
+  unsigned int data_index = (int) as * MAX_MACHINE_MODE + (int) mode;
+  static vec<sbitmap> valid_mult_list;
+  sbitmap valid_mult;
+
+  if (data_index >= valid_mult_list.length ())
+valid_mult_list.safe_grow_cleared (data_index + 1);
+
+  valid_mult = valid_mult_list[data_index];
+  if (!valid_mult)
+{
+  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
+  rtx reg2 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
+  rtx addr, scaled;
+  HOST_WIDE_INT i;
+
+  valid_mult = sbitmap_alloc (2 * MAX_RATIO + 1);
+  bitmap_clear (valid_mult);
+  scaled = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+  addr = gen_rtx_fmt_ee (PLUS, address_mode, scaled, reg2);
+  for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
+   {
+ XEXP (scaled, 1) = gen_int_mode (i, address_mode);
+ if (memory_address_addr_space_p (mode, addr, as)
+ || memory_address_addr_space_p (mode, scaled, as))
+   bitmap_set_bit (valid_mult, i + MAX_RATIO);
+   }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "  allowed multipliers:");
+ for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
+   if (bitmap_bit_p (valid_mult, i + MAX_RATIO))
+ fprintf (dump_file, " %d", (int) i);
+ fprintf (dump_file, "\n");
+ fprintf (dump_file, "\n");
+   }
+
+  valid_mult_list[data_index] = valid_mult;
+}
+
+  if (ratio > MAX_RATIO || ratio < -MAX_RATIO)
+return false;
+
+  return bitmap_bit_p (valid_mult, ratio + MAX_RATIO);
+}
+
 /* Finds the most expensive multiplication in ADDR that can be
expressed in an addressing mode and move the corresponding
element(s) to PARTS.  */
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4948a47..c3e9bce 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3905,63 +3905,6 @@ adjust_setup_cost (struct ivopts_data *data, unsigned 
cost,
 return cost;
 }
 
-/* Returns true if multiplying by RATIO is allowed in an address.  Test the
-   validity for a memory reference accessing memory of mode MODE in
-   address space AS.  */
-
-
-bool
-multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
-addr_space_t as)
-{
-#define MAX_RATIO 128
-  unsigned int data_index = (int) as * MAX_MACHINE_MODE + (int) mode;
-  static vec<sbitmap> valid_mult_list;
-  sbitmap valid_mult;
-
-  if (data_index >= valid_mult_list.length ())
-valid_mult_list.safe_grow_cleared (data_index + 1);
-
-  valid_mult = valid_mult_list[data_index];
-  if (!valid_mult)
-{
-  machine_mode address_mode = targetm.addr_space.address_mode (as);
-  rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
-  rtx reg2 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
-  rtx addr, scaled;
-  HOST_WIDE_INT i;
-
-  valid_mult = sbitmap_alloc (2 * MAX_RATIO + 1);
-  bitmap_clear (valid_mult);

[PATCH GCC8][15/33]Simplify function autoinc_possible_for_pair

2017-04-18 Thread Bin Cheng
Hi,
This patch simplifies function autoinc_possible_for_pair by deleting
unnecessary local variables.
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (autoinc_possible_for_pair): Simplify.

From 09ea8cf6f542ab270ba9994f8491b926fc78237d Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 14:34:54 +
Subject: [PATCH 15/33] autoinc_possible_for_pair-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index c9cf9cf..4948a47 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5232,19 +5232,12 @@ static bool
 autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use,
   struct iv_cand *cand)
 {
-  bitmap inv_vars;
-  bool can_autoinc;
-  comp_cost cost;
-
   if (use->type != USE_ADDRESS)
 return false;
 
-  cost = get_computation_cost (data, use, cand, true, &inv_vars,
-  &can_autoinc, NULL);
-
-  BITMAP_FREE (inv_vars);
-
-  return !cost.infinite_cost_p () && can_autoinc;
+  bool can_autoinc = false;
+  get_computation_cost (data, use, cand, true, NULL, &can_autoinc, NULL);
+  return can_autoinc;
 }
 
 /* Examine IP_ORIGINAL candidates to see if they are incremented next to a
-- 
1.9.1



[PATCH GCC8][13/33]Rewrite cost computation of ivopts

2017-04-18 Thread Bin Cheng
Hi,
This is the major part of this patch series.  It rewrites cost computation
of ivopts using tree affine.
Apart from the description given by the cover message:
  A) New computation cost model.  Currently, there is a big amount of code
     trying to understand tree expressions and estimate their computation
     cost.  The model was designed long ago for generic tree expressions.
     In order to process generic expressions (even address expressions of
     array/memory references), it has code for too many corner cases.  The
     problem is that it's somehow impossible to handle all complicated
     expressions, even with complicated logic in functions like
     get_computation_cost_at, difference_cost, ptr_difference_cost,
     get_address_cost and so on...  The second problem is that it's hard
     to keep the cost model consistent among special cases.  As special
     cases are added from time to time, the model is no longer unified.
     There are cases where a right cost results in bad code, or vice
     versa, a wrong cost results in good code.  Finally, it's difficult
     to add code for new cases.
     This patch introduces a new cost computation model using tree affine.
     Tree exprs are lowered to aff_tree, which is usually a simple
     arithmetic operation.  Code handling special cases is no longer
     necessary, which brings us quite some simplicity.  It is also easier
     to compute consistent costs among different expressions using tree
     affine, which gives us a unified cost model.
This patch also fixes an issue where cost computation for address type
iv_use was inconsistent with how the use is rewritten in the end.  It
greatly simplifies cost computation.
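As an illustration of the direction (a standalone toy model only, not
GCC's real aff_tree from tree-affine.h), an expression is lowered to a
constant offset plus a list of (element, coefficient) pairs, and the cost
can then be estimated per element/coefficient instead of by pattern
matching the original tree shape:

#include <stdio.h>

/* Toy affine combination: expr = offset + sum of coef[i] * elt[i].
   E.g. with 4-byte elements, &a[i + 2] lowers to
   { offset = 8, elts = { (a, 1), (i, 4) } }.  */
struct toy_aff
{
  long offset;
  int n;
  struct { const char *elt; long coef; } elts[8];
};

/* Hypothetical weights: one add per combined element, one mult for every
   coefficient other than 1/-1 -- no tree-shape special cases needed.  */
static int
toy_aff_cost (const struct toy_aff *aff, int add_cost, int mult_cost)
{
  int cost = 0;
  for (int i = 0; i < aff->n; i++)
    {
      if (i > 0 || aff->offset != 0)
        cost += add_cost;
      if (aff->elts[i].coef != 1 && aff->elts[i].coef != -1)
        cost += mult_cost;
    }
  return cost;
}

int
main (void)
{
  struct toy_aff aff = { 8, 2, { { "a", 1 }, { "i", 4 } } };
  printf ("cost = %d\n", toy_aff_cost (&aff, 1, 4));  /* cost = 6 */
  return 0;
}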

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify.
(adjust_setup_cost): New parameter supporting round up adjustment.
(struct address_cost_data): Delete.
(force_expr_to_var_cost): Don't bound cost with spill_cost.
(split_address_cost, ptr_difference_cost): Delete.
(difference_cost, compare_aff_trees, record_inv_expr): Delete.
(struct ainc_cost_data): New struct.
(get_address_cost_ainc): New function.
(get_address_cost, get_computation_cost): Reimplement.
(determine_group_iv_cost_address): Record inv_expr for all uses of
a group.
(determine_group_iv_cost_cond): Call get_loop_invariant_expr.
(iv_ca_has_deps): Reimplemented to ...
(iv_ca_more_deps): ... this.  Check if NEW_CP introduces more deps
than OLD_CP.
(iv_ca_extend): Call iv_ca_more_deps.

From 8eb3665f71ab5449496694a11fa62d20c0c1109c Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 12:45:50 +
Subject: [PATCH 13/33] rewrite-cost-computation-20170225.txt

---
 gcc/tree-ssa-loop-ivopts.c | 1132 +---
 1 file changed, 332 insertions(+), 800 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 556bdc8..6f64d71 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2917,6 +2917,45 @@ find_inv_vars (struct ivopts_data *data, tree *expr_p, bitmap *inv_vars)
   walk_tree (expr_p, find_inv_vars_cb, &wdata, NULL);
 }
 
+/* Get entry from invariant expr hash table for INV_EXPR.  New entry
+   will be recorded if it doesn't exist yet.  Given below two exprs:
+ inv_expr + cst1, inv_expr + cst2
+   It's hard to make decision whether constant part should be stripped
+   or not.  We choose to not strip based on below facts:
+ 1) We need to count ADD cost for constant part if it's stripped,
+   which isn't always trivial where this function is called.
+ 2) Stripping constant away may conflict with following loop
+   invariant hoisting pass.
+ 3) Not stripping constant away results in more invariant exprs,
+   which usually leads to decision preferring lower reg pressure.  */
+
+static iv_inv_expr_ent *
+get_loop_invariant_expr (struct ivopts_data *data, tree inv_expr)
+{
+  STRIP_NOPS (inv_expr);
+
+  if (TREE_CODE (inv_expr) == INTEGER_CST || TREE_CODE (inv_expr) == SSA_NAME)
+return NULL;
+
+  /* Don't strip constant part away as we used to.  */
+
+  /* Stores EXPR in DATA->inv_expr_tab, return pointer to iv_inv_expr_ent.  */
+  struct iv_inv_expr_ent ent;
+  ent.expr = inv_expr;
+  ent.hash = iterative_hash_expr (inv_expr, 0);
+  struct iv_inv_expr_ent **slot = data->inv_expr_tab->find_slot (&ent, INSERT);
+
+  if (!*slot)
+{
+  *slot = XNEW (struct iv_inv_expr_ent);
+  (*slot)->expr = inv_expr;
+  (*slot)->hash = ent.hash;
+  (*slot)->id = ++data->max_inv_expr_id;
+}
+
+  return *slot;
+}
+
 /* Adds a candidate BASE + STEP * i.  Important field is set to IMPORTANT and
position to POS.  If USE is not NULL, the candidate is set as related to
it.  If both BASE and STEP are NULL, we add a pseudocandidate for the
@@ -3848,14 +3887,20 @@ get_computation_at (struct loo

[PATCH GCC8][14/33]Handle more cheap operations in force_expr_to_var_cost

2017-04-18 Thread Bin Cheng
Hi,
This patch handles more cheap cases in function force_expr_to_var_cost,
specifically TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and
BIT_NOT_EXPR.
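(The hunk below also adds LSHIFT_EXPR alongside RSHIFT_EXPR.)  The reason
TRUNC_DIV_EXPR is only cheap for power-of-two divisors is that such a
division lowers to a shift; a standalone sketch of the power-of-two test
and the equivalence for unsigned values:

#include <assert.h>

/* integer_pow2p-style check: nonzero with exactly one bit set.  */
static int
pow2_p (unsigned long x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

int
main (void)
{
  unsigned long n = 1000;
  assert (pow2_p (8) && !pow2_p (12));
  assert (n / 8 == n >> 3);  /* TRUNC_DIV_EXPR by 8 == RSHIFT_EXPR by 3 */
  return 0;
}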

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more
operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR
and BIT_NOT_EXPR.

From 83045d32b974cb657e1d471c15f67a5b190f2534 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 17 Mar 2017 10:04:29 +
Subject: [PATCH 14/33] cheap-arith_op-in-force_expr-20170225.txt

---
 gcc/tree-ssa-loop-ivopts.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 6f64d71..c9cf9cf 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4087,6 +4087,11 @@ force_expr_to_var_cost (tree expr, bool speed)
 case PLUS_EXPR:
 case MINUS_EXPR:
 case MULT_EXPR:
+case TRUNC_DIV_EXPR:
+case BIT_AND_EXPR:
+case BIT_IOR_EXPR:
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
   op0 = TREE_OPERAND (expr, 0);
   op1 = TREE_OPERAND (expr, 1);
   STRIP_NOPS (op0);
@@ -4095,6 +4100,7 @@ force_expr_to_var_cost (tree expr, bool speed)
 
 CASE_CONVERT:
 case NEGATE_EXPR:
+case BIT_NOT_EXPR:
   op0 = TREE_OPERAND (expr, 0);
   STRIP_NOPS (op0);
   op1 = NULL_TREE;
@@ -4163,6 +4169,23 @@ force_expr_to_var_cost (tree expr, bool speed)
return comp_cost (target_spill_cost [speed], 0);
   break;
 
+case TRUNC_DIV_EXPR:
+  /* Division by power of two is usually cheap, so we allow it.  Forbid
+anything else.  */
+  if (integer_pow2p (TREE_OPERAND (expr, 1)))
+   cost = comp_cost (add_cost (speed, mode), 0);
+  else
+   cost = comp_cost (target_spill_cost[speed], 0);
+  break;
+
+case BIT_AND_EXPR:
+case BIT_IOR_EXPR:
+case BIT_NOT_EXPR:
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+  cost = comp_cost (add_cost (speed, mode), 0);
+  break;
+
 default:
   gcc_unreachable ();
 }
-- 
1.9.1



[PATCH GCC8][11/33]New interfaces for tree affine

2017-04-18 Thread Bin Cheng
Hi,
This patch adds three simple interfaces for tree affine which will be used in
cost computation later.
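For reference, a sketch of how these predicates are meant to be used by
the later cost patches (illustrative only; the surrounding code is not
part of this patch):

  aff_tree aff;
  tree_to_aff_combination (expr, TREE_TYPE (expr), &aff);

  if (aff_combination_const_p (&aff))
    ;  /* No elements: a pure constant, no register needed at runtime.  */
  else if (aff_combination_simple_p (&aff))
    ;  /* At most one element, coefficient 1/-1, zero offset: cheap to
          materialize.  */
  else
    ;  /* General case: needs real computation.  */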

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-affine.h (aff_combination_type): New interface.
(aff_combination_const_p, aff_combination_simple_p): New interfaces.

From 1646cba4cc5576919d6bbaf8bcc99b5ca9b4210a Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 11:55:17 +
Subject: [PATCH 11/33] add-tree-affine-interfaces-20170221.txt

---
 gcc/tree-affine.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/gcc/tree-affine.h b/gcc/tree-affine.h
index b8eb8cc..557e363 100644
--- a/gcc/tree-affine.h
+++ b/gcc/tree-affine.h
@@ -88,6 +88,12 @@ bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
 /* Debugging functions.  */
 void debug_aff (aff_tree *);
 
+static inline tree
+aff_combination_type (aff_tree *aff)
+{
+  return aff->type;
+}
+
 /* Return true if AFF is actually ZERO.  */
 static inline bool
 aff_combination_zero_p (aff_tree *aff)
@@ -101,4 +107,35 @@ aff_combination_zero_p (aff_tree *aff)
   return false;
 }
 
+/* Return true if AFF is actually const.  */
+static inline bool
+aff_combination_const_p (aff_tree *aff)
+{
+  if (!aff)
+return true;
+
+  if (aff->n == 0)
+return true;
+
+  return false;
+}
+
+/* Return true if AFF is simple enough.  */
+static inline bool
+aff_combination_simple_p (aff_tree *aff)
+{
+  if (!aff || aff->n == 0)
+return true;
+
+  if (aff->n > 1)
+return false;
+
+  if (aff->offset != 0)
+return false;
+
+  if (aff->elts[0].coef != 1 && aff->elts[0].coef != -1)
+return false;
+
+  return true;
+}
 #endif /* GCC_TREE_AFFINE_H */
-- 
1.9.1



[PATCH GCC8][12/33]Expose interfaces of tree-ssa-address.c

2017-04-18 Thread Bin Cheng
Hi,
This patch exposes several interfaces in tree-ssa-address.c so that they
can be used in cost computation for address iv_use.

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-address.c (struct mem_address): Move to header file.
(valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
* tree-ssa-address.h (struct mem_address): Move from C file.
(valid_mem_ref_p, move_fixed_address_to_symbol): Declare.

From 859b2cfc74bb7a022047008e2b80d6600b50db5f Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 17:43:10 +
Subject: [PATCH 12/33] expose-ssa-address-interface-20170220.txt

---
 gcc/tree-ssa-address.c | 11 ++-
 gcc/tree-ssa-address.h | 10 ++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index 8d46a3e..e35d323 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -178,13 +178,6 @@ gen_addr_rtx (machine_mode address_mode,
 *addr = const0_rtx;
 }
 
-/* Description of a memory address.  */
-
-struct mem_address
-{
-  tree symbol, base, index, step, offset;
-};
-
 /* Returns address for TARGET_MEM_REF with parameters given by ADDR
in address space AS.
If REALLY_EXPAND is false, just make fake registers instead
@@ -330,7 +323,7 @@ tree_mem_ref_addr (tree type, tree mem_ref)
 /* Returns true if a memory reference in MODE and with parameters given by
ADDR is valid on the current target.  */
 
-static bool
+bool
 valid_mem_ref_p (machine_mode mode, addr_space_t as,
 struct mem_address *addr)
 {
@@ -408,7 +401,7 @@ fixed_address_object_p (tree obj)
 /* If ADDR contains an address of object that is a link time constant,
move it to PARTS->symbol.  */
 
-static void
+void
 move_fixed_address_to_symbol (struct mem_address *parts, aff_tree *addr)
 {
   unsigned i;
diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
index 311348e..cd62ed9 100644
--- a/gcc/tree-ssa-address.h
+++ b/gcc/tree-ssa-address.h
@@ -20,10 +20,20 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_TREE_SSA_ADDRESS_H
 #define GCC_TREE_SSA_ADDRESS_H
 
+/* Description of a memory address.  */
+
+struct mem_address
+{
+  tree symbol, base, index, step, offset;
+};
+
 extern rtx addr_for_mem_ref (struct mem_address *, addr_space_t, bool);
 extern rtx addr_for_mem_ref (tree exp, addr_space_t as, bool really_expand);
 extern void get_address_description (tree, struct mem_address *);
 extern tree tree_mem_ref_addr (tree, tree);
+extern bool valid_mem_ref_p (machine_mode, addr_space_t, struct mem_address *);
+extern void move_fixed_address_to_symbol (struct mem_address *,
+ struct aff_tree *);
 tree create_mem_ref (gimple_stmt_iterator *, tree,
 struct aff_tree *, tree, tree, tree, bool);
 extern void copy_ref_info (tree, tree);
-- 
1.9.1



[PATCH GCC8][10/33]Clean get_scaled_computation_cost_at and the dump info

2017-04-18 Thread Bin Cheng
Hi,
This patch simplifies function get_scaled_computation_cost_at and the
dump information.

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Delete
parameter cand.  Update dump information.
(get_computation_cost): Update uses.

From e31e329139d5d37f0553904635097fac815ae6b4 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 11:24:02 +
Subject: [PATCH 10/33] cost-scaling-dump-info-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4b6eda1..556bdc8 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4783,12 +4783,11 @@ get_loop_invariant_expr (struct ivopts_data *data, tree ubase,
 }
 
 /* Scale (multiply) the computed COST (except scratch part that should be
-   hoisted out a loop) by header->frequency / AT->frequency,
-   which makes expected cost more accurate.  */
+   hoisted out a loop) by header->frequency / AT->frequency, which makes
+   expected cost more accurate.  */
 
 static comp_cost
-get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand,
-   comp_cost cost)
+get_scaled_computation_cost_at (ivopts_data *data, gimple *at, comp_cost cost)
 {
int loop_freq = data->current_loop->header->frequency;
int bb_freq = gimple_bb (at)->frequency;
@@ -4799,9 +4798,9 @@ get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand,
 = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
 
if (dump_file && (dump_flags & TDF_DETAILS))
-fprintf (dump_file, "Scaling iv_use based on cand %d "
+fprintf (dump_file, "Scaling cost based on bb prob "
  "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
- cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
+ 1.0f * bb_freq / loop_freq, cost.cost,
  cost.scratch, scaled_cost, bb_freq, loop_freq);
 
cost.cost = scaled_cost;
@@ -4997,7 +4996,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
mem_mode,
TYPE_ADDR_SPACE (TREE_TYPE (utype)),
speed, stmt_is_after_inc, can_autoinc);
-  return get_scaled_computation_cost_at (data, at, cand, cost);
+  return get_scaled_computation_cost_at (data, at, cost);
 }
 
   /* Otherwise estimate the costs for computing the expression.  */
@@ -5005,7 +5004,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
 {
   if (ratio != 1)
cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-  return get_scaled_computation_cost_at (data, at, cand, cost);
+  return get_scaled_computation_cost_at (data, at, cost);
 }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5025,7 +5024,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
   if (aratio != 1)
 cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
 
-  return get_scaled_computation_cost_at (data, at, cand, cost);
+  return get_scaled_computation_cost_at (data, at, cost);
 
 fallback:
   if (can_autoinc)
@@ -5042,7 +5041,7 @@ fallback:
 
   cost = comp_cost (computation_cost (comp, speed), 0);
 
-  return get_scaled_computation_cost_at (data, at, cand, cost);
+  return get_scaled_computation_cost_at (data, at, cost);
 }
 
 /* Determines cost of computing the use in GROUP with CAND in a generic
-- 
1.9.1



[PATCH GCC8][09/33]Compute separate aff_trees for invariant and induction parts

2017-04-18 Thread Bin Cheng
Hi,
This patch computes and returns separate aff_trees for invariant expression
and induction expression, so that invariant and induction parts can be handled
separately in both cost computation and code generation.
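A worked example of the split, using the identity from the code
(use = ubase - ratio * cbase + ratio * var) with hypothetical IVs:

  /* use->iv:  { base = p + 16, step = 4 }
     cand->iv: { base = p,      step = 4 }   =>  rat = ustep / cstep = 1

     use = ubase - rat * cbase + rat * var
         = (p + 16) - p        + 1 * var

     aff_inv = { offset = 16 }    invariant part, computable once
     aff_var = { 1 * var }        induction part, varies per iteration  */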

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (get_computation_aff_1): New.
(get_computation_aff): Reorder parameters.  Use get_computation_aff_1.
(get_computation_at, rewrite_use_address): Update use of
get_computation_aff.

From a6ffd9fc3a2f8c6e3b8764e31ae72c67f896b469 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 11:07:11 +
Subject: [PATCH 09/33] compute-inv-var-affine-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 73 --
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 22c0ea5..4b6eda1 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3684,29 +3684,26 @@ determine_common_wider_type (tree *a, tree *b)
 }
 
 /* Determines the expression by that USE is expressed from induction variable
-   CAND at statement AT in LOOP.  The expression is stored in a decomposed
-   form into AFF.  Returns false if USE cannot be expressed using CAND.  */
+   CAND at statement AT in LOOP.  The expression is stored in two parts in a
+   decomposed form.  The invariant part is stored in AFF_INV; while variant
+   part in AFF_VAR.  Store ratio of CAND.step over USE.step in PRAT if it's
+   non-null.  Returns false if USE cannot be expressed using CAND.  */
 
 static bool
-get_computation_aff (struct loop *loop,
-struct iv_use *use, struct iv_cand *cand, gimple *at,
-struct aff_tree *aff)
-{
-  tree ubase = use->iv->base;
-  tree ustep = use->iv->step;
-  tree cbase = cand->iv->base;
-  tree cstep = cand->iv->step, cstep_common;
+get_computation_aff_1 (struct loop *loop, gimple *at, struct iv_use *use,
+  struct iv_cand *cand, struct aff_tree *aff_inv,
+  struct aff_tree *aff_var, widest_int *prat = NULL)
+{
+  tree ubase = use->iv->base, ustep = use->iv->step;
+  tree cbase = cand->iv->base, cstep = cand->iv->step;
+  tree common_type, uutype, var, cstep_common;
   tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
-  tree common_type, var;
-  tree uutype;
-  aff_tree cbase_aff, var_aff;
+  aff_tree aff_cbase;
   widest_int rat;
 
+  /* We must have a precision to express the values of use.  */
   if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
-{
-  /* We do not have a precision to express the values of use.  */
-  return false;
-}
+return false;
 
   var = var_at_stmt (loop, cand, at);
   uutype = unsigned_type_for (utype);
@@ -3736,8 +3733,8 @@ get_computation_aff (struct loop *loop,
  cstep = inner_step;
}
}
-  cstep = fold_convert (uutype, cstep);
   cbase = fold_convert (uutype, cbase);
+  cstep = fold_convert (uutype, cstep);
   var = fold_convert (uutype, var);
 }
 
@@ -3756,6 +3753,9 @@ get_computation_aff (struct loop *loop,
   else if (!constant_multiple_of (ustep, cstep, &rat))
 return false;
 
+  if (prat)
+*prat = rat;
+
   /* In case both UBASE and CBASE are shortened to UUTYPE from some common
  type, we achieve better folding by computing their difference in this
  wider type, and cast the result to UUTYPE.  We do not need to worry about
@@ -3764,9 +3764,9 @@ get_computation_aff (struct loop *loop,
   common_type = determine_common_wider_type (&ubase, &cbase);
 
   /* use = ubase - ratio * cbase + ratio * var.  */
-  tree_to_aff_combination (ubase, common_type, aff);
-  tree_to_aff_combination (cbase, common_type, &cbase_aff);
-  tree_to_aff_combination (var, uutype, &var_aff);
+  tree_to_aff_combination (ubase, common_type, aff_inv);
+  tree_to_aff_combination (cbase, common_type, &aff_cbase);
+  tree_to_aff_combination (var, uutype, aff_var);
 
   /* We need to shift the value if we are after the increment.  */
   if (stmt_after_increment (loop, cand, at))
@@ -3779,17 +3779,32 @@ get_computation_aff (struct loop *loop,
cstep_common = cstep;
 
   tree_to_aff_combination (cstep_common, common_type, &cstep_aff);
-  aff_combination_add (&cbase_aff, &cstep_aff);
+  aff_combination_add (&aff_cbase, &cstep_aff);
 }
 
-  aff_combination_scale (&cbase_aff, -rat);
-  aff_combination_add (aff, &cbase_aff);
+  aff_combination_scale (&aff_cbase, -rat);
+  aff_combination_add (aff_inv, &aff_cbase);
   if (common_type != uutype)
-aff_combination_convert (aff, uutype);
+aff_combination_convert (aff_inv, uutype);
 
-  aff_combination_scale (&var_aff, rat);
-  aff_combination_add (aff, &var_aff);
+  aff_combination_scale (aff_var, rat);
+  return true;
+}
+
+/* Determines the expression by that USE is expressed from induction variable
+   CAND at statement AT in LOOP.  The expressi

[PATCH GCC8][08/33]Clean get_computation_*interfaces

2017-04-18 Thread Bin Cheng
Hi,
This patch cleans get_computation* interfaces.  Specifically, it removes
get_computation and get_computation_cost_at.

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (get_computation_at): Reorder parameters.
(get_computation): Delete.
(get_computation_cost): Implement like get_computation_cost_at.
Use get_computation_at.
(get_computation_cost_at): Delete.
(rewrite_use_nonlinear_expr): Use get_computation_at.
(rewrite_use_compare, remove_unused_ivs): Ditto.

From 6d34e0ad6d0ddfc9069f12e43f2fe801d4d65531 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 1 Mar 2017 10:39:18 +
Subject: [PATCH 08/33] clean-get_computation-interface-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 56 --
 1 file changed, 14 insertions(+), 42 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 5ab1d29..22c0ea5 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3819,8 +3819,8 @@ get_use_type (struct iv_use *use)
CAND at statement AT in LOOP.  The computation is unshared.  */
 
 static tree
-get_computation_at (struct loop *loop,
-   struct iv_use *use, struct iv_cand *cand, gimple *at)
+get_computation_at (struct loop *loop, gimple *at,
+   struct iv_use *use, struct iv_cand *cand)
 {
   aff_tree aff;
   tree type = get_use_type (use);
@@ -3831,15 +3831,6 @@ get_computation_at (struct loop *loop,
   return fold_convert (type, aff_combination_to_tree (&aff));
 }
 
-/* Determines the expression by that USE is expressed from induction variable
-   CAND in LOOP.  The computation is unshared.  */
-
-static tree
-get_computation (struct loop *loop, struct iv_use *use, struct iv_cand *cand)
-{
-  return get_computation_at (loop, use, cand, use->stmt);
-}
-
 /* Adjust the cost COST for being in loop setup rather than loop body.
If we're optimizing for space, the loop setup overhead is constant;
if we're optimizing for speed, amortize it over the per-iteration cost.  */
@@ -4807,18 +4798,17 @@ get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand,
 /* Determines the cost of the computation by that USE is expressed
from induction variable CAND.  If ADDRESS_P is true, we just need
to create an address from it, otherwise we want to get it into
-   register.  A set of invariants we depend on is stored in
-   INV_VARS.  AT is the statement at that the value is computed.
+   register.  A set of invariants we depend on is stored in INV_VARS.
If CAN_AUTOINC is nonnull, use it to record whether autoinc
-   addressing is likely.  */
+   addressing is likely.  If INV_EXPR is nonnull, record invariant
+   expr entry in it.  */
 
 static comp_cost
-get_computation_cost_at (struct ivopts_data *data,
-struct iv_use *use, struct iv_cand *cand,
-bool address_p, bitmap *inv_vars, gimple *at,
-bool *can_autoinc,
-iv_inv_expr_ent **inv_expr)
+get_computation_cost (struct ivopts_data *data, struct iv_use *use,
+ struct iv_cand *cand, bool address_p, bitmap *inv_vars,
+ bool *can_autoinc, iv_inv_expr_ent **inv_expr)
 {
+  gimple *at = use->stmt;
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase, cstep;
   tree utype = TREE_TYPE (ubase), ctype;
@@ -5027,7 +5017,7 @@ fallback:
 *can_autoinc = false;
 
   /* Just get the expression, expand it and measure the cost.  */
-  tree comp = get_computation_at (data->current_loop, use, cand, at);
+  tree comp = get_computation_at (data->current_loop, at, use, cand);
 
   if (!comp)
 return infinite_cost;
@@ -5040,24 +5030,6 @@ fallback:
   return get_scaled_computation_cost_at (data, at, cand, cost);
 }
 
-/* Determines the cost of the computation by that USE is expressed
-   from induction variable CAND.  If ADDRESS_P is true, we just need
-   to create an address from it, otherwise we want to get it into
-   register.  A set of invariants we depend on is stored in
-   INV_VARS.  If CAN_AUTOINC is nonnull, use it to record whether
-   autoinc addressing is likely.  */
-
-static comp_cost
-get_computation_cost (struct ivopts_data *data,
- struct iv_use *use, struct iv_cand *cand,
- bool address_p, bitmap *inv_vars,
- bool *can_autoinc, iv_inv_expr_ent **inv_expr)
-{
-  return get_computation_cost_at (data,
- use, cand, address_p, inv_vars, use->stmt,
- can_autoinc, inv_expr);
-}
-
 /* Determines cost of computing the use in GROUP with CAND in a generic
expression.  */
 
@@ -7186,7 +7158,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
}
 }
 
-  comp = get_computation (data->current_loop, use, cand);
+  comp = get_computation_at (data->current_loo

[PATCH GCC8][07/33]Offset validity check in address expression

2017-04-18 Thread Bin Cheng
Hi,
For now, we check the validity of an offset by computing the maximum
offset and then checking if the offset is smaller than that max offset.
This is inaccurate; for example, some targets may require the offset to
be aligned to a power of 2.  This patch introduces a new interface
checking the validity of an offset.  It also buffers the probe rtx among
different calls.
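As an illustration of the inaccuracy (the exact rules below are an
assumption, AArch64-like; the real answer always comes from the backend):

  /* For a 4-byte access with a scaled unsigned immediate, the offset
     must be a multiple of 4 in [0, 16380]:

        [base + 4092]   valid
        [base + 4094]   smaller than any computed maximum, yet invalid
                        (not 4-byte aligned)

     So addr_offset_valid_p () asks memory_address_addr_space_p () about
     the exact offset instead of comparing against a single bound.  */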

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (compute_max_addr_offset): Delete.
(addr_offset_valid_p): New function.
(split_address_groups): Check offset validity with above function.

From fe33bd3fe9a1dbf1a10ae50bfeecc5a9d0b6c759 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 28 Feb 2017 16:19:27 +
Subject: [PATCH 07/33] offset-check-in-group-split-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 103 +
 1 file changed, 38 insertions(+), 65 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 840bde4..5ab1d29 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2460,67 +2460,42 @@ find_interesting_uses_outside (struct ivopts_data *data, edge exit)
 }
 }
 
-/* Compute maximum offset of [base + offset] addressing mode
-   for memory reference represented by USE.  */
+/* Return TRUE if OFFSET is within the range of [base + offset] addressing
+   mode for memory reference represented by USE.  */
 
-static HOST_WIDE_INT
-compute_max_addr_offset (struct iv_use *use)
+static bool
+addr_offset_valid_p (struct iv_use *use, HOST_WIDE_INT offset)
 {
-  int width;
   rtx reg, addr;
-  HOST_WIDE_INT i, off;
   unsigned list_index, num;
   addr_space_t as;
   machine_mode mem_mode, addr_mode;
-  static vec<HOST_WIDE_INT> max_offset_list;
-
+  auto_vec<rtx> addr_list;
   as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
   mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));
 
-  num = max_offset_list.length ();
+  num = addr_list.length ();
   list_index = (unsigned) as * MAX_MACHINE_MODE + (unsigned) mem_mode;
   if (list_index >= num)
 {
-  max_offset_list.safe_grow (list_index + MAX_MACHINE_MODE);
-  for (; num < max_offset_list.length (); num++)
-   max_offset_list[num] = -1;
+  addr_list.safe_grow_cleared (list_index + MAX_MACHINE_MODE);
+  for (; num < addr_list.length (); num++)
+   addr_list[num] = NULL;
 }
 
-  off = max_offset_list[list_index];
-  if (off != -1)
-return off;
-
-  addr_mode = targetm.addr_space.address_mode (as);
-  reg = gen_raw_REG (addr_mode, LAST_VIRTUAL_REGISTER + 1);
-  addr = gen_rtx_fmt_ee (PLUS, addr_mode, reg, NULL_RTX);
-
-  width = GET_MODE_BITSIZE (addr_mode) - 1;
-  if (width > (HOST_BITS_PER_WIDE_INT - 1))
-width = HOST_BITS_PER_WIDE_INT - 1;
-
-  for (i = width; i > 0; i--)
+  addr = addr_list[list_index];
+  if (!addr)
 {
-  off = (HOST_WIDE_INT_1U << i) - 1;
-  XEXP (addr, 1) = gen_int_mode (off, addr_mode);
-  if (memory_address_addr_space_p (mem_mode, addr, as))
-   break;
-
-  /* For some strict-alignment targets, the offset must be naturally
-aligned.  Try an aligned offset if mem_mode is not QImode.  */
-  off = (HOST_WIDE_INT_1U << i);
-  if (off > GET_MODE_SIZE (mem_mode) && mem_mode != QImode)
-   {
- off -= GET_MODE_SIZE (mem_mode);
- XEXP (addr, 1) = gen_int_mode (off, addr_mode);
- if (memory_address_addr_space_p (mem_mode, addr, as))
-   break;
-   }
+  addr_mode = targetm.addr_space.address_mode (as);
+  reg = gen_raw_REG (addr_mode, LAST_VIRTUAL_REGISTER + 1);
+  addr = gen_rtx_fmt_ee (PLUS, addr_mode, reg, NULL_RTX);
+  addr_list[list_index] = addr;
 }
-  if (i == 0)
-off = 0;
+  else
+addr_mode = GET_MODE (addr);
 
-  max_offset_list[list_index] = off;
-  return off;
+  XEXP (addr, 1) = gen_int_mode (offset, addr_mode);
+  return (memory_address_addr_space_p (mem_mode, addr, as));
 }
 
 /* Comparison function to sort group in ascending order of addr_offset.  */
@@ -2599,14 +2574,12 @@ static void
 split_address_groups (struct ivopts_data *data)
 {
   unsigned int i, j;
-  HOST_WIDE_INT max_offset = -1;
-
-  /* Reset max offset to split all small groups.  */
-  if (split_small_address_groups_p (data))
-max_offset = 0;
+  /* Always split group.  */
+  bool split_p = split_small_address_groups_p (data);
 
   for (i = 0; i < data->vgroups.length (); i++)
 {
+  struct iv_group *new_group = NULL;
   struct iv_group *group = data->vgroups[i];
   struct iv_use *use = group->vuses[0];
 
@@ -2615,29 +2588,29 @@ split_address_groups (struct ivopts_data *data)
   if (group->vuses.length () == 1)
continue;
 
-  if (max_offset != 0)
-   max_offset = compute_max_addr_offset (use);
+  gcc_assert (group->type == USE_ADDRESS);
 
-  for (j = 1; j < group->vuses.length (); j++)
+  for (j = 1; j < group->vuses.length ();)
{
  struct iv_use *next = group->vuses[j];
+ HOST_WIDE_INT offset = ne

[PATCH GCC8][06/33]Simple refactor of function rewrite_use_address

2017-04-18 Thread Bin Cheng
Hi,
Simple refactor for function rewrite_use_address.

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (rewrite_use_address): Simple refactor.

From 1eea1956061dedd62bcbec119d9febb8a18d2a0d Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 28 Feb 2017 14:54:32 +
Subject: [PATCH 06/33] refactor-rewrite_use_address-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 409a35d..840bde4 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7361,9 +7361,6 @@ rewrite_use_address (struct ivopts_data *data,
 struct iv_use *use, struct iv_cand *cand)
 {
   aff_tree aff;
-  gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
-  tree base_hint = NULL_TREE;
-  tree ref, iv;
   bool ok;
 
   adjust_iv_update_pos (cand, use);
@@ -7382,17 +7379,18 @@ rewrite_use_address (struct ivopts_data *data,
  based on an object, the base of the reference is in some subexpression
  of the use -- but these will use pointer types, so they are recognized
  by the create_mem_ref heuristics anyway.  */
-  if (cand->iv->base_object)
-base_hint = var_at_stmt (data->current_loop, cand, use->stmt);
-
-  iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  tree iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  tree base_hint = (cand->iv->base_object) ? iv : NULL_TREE;
+  gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   tree type = TREE_TYPE (*use->op_p);
   unsigned int align = get_object_alignment (*use->op_p);
   if (align != TYPE_ALIGN (type))
 type = build_aligned_type (type, align);
-  ref = create_mem_ref (&bsi, type, &aff,
-   reference_alias_ptr_type (*use->op_p),
-   iv, base_hint, data->speed);
+
+  tree ref = create_mem_ref (&bsi, type, &aff,
+reference_alias_ptr_type (*use->op_p),
+iv, base_hint, data->speed);
+
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
 }
-- 
1.9.1



[PATCH GCC8][05/33]Count invariant and candidate separately

2017-04-18 Thread Bin Cheng
Hi,
Simple refactor counting invariants (both variables and expressions) and
induction variables separately.
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct iv_ca): Rename n_regs to n_invs.
(ivopts_global_cost_for_size): Rename parameter and update uses.
(iv_ca_recount_cost): Update uses.
(iv_ca_set_remove_invs, iv_ca_set_no_cp): Record invariants and
candidates separately in n_invs and n_cands.
(iv_ca_set_add_invs, iv_ca_set_cp, iv_ca_new): Ditto.

From 3918161fb0dd5092a9eca3b08907493607abbda2 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 28 Feb 2017 14:25:10 +
Subject: [PATCH 05/33] count-inv-cand-separately-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 3a5b1b9..409a35d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -612,8 +612,9 @@ struct iv_ca
   /* The number of candidates in the set.  */
   unsigned n_cands;
 
-  /* Total number of registers needed.  */
-  unsigned n_regs;
+  /* The number of invariants needed, including both invariant variants and
+ invariant expressions.  */
+  unsigned n_invs;
 
   /* Total cost of expressing uses.  */
   comp_cost cand_use_cost;
@@ -5989,15 +5990,17 @@ determine_iv_costs (struct ivopts_data *data)
 fprintf (dump_file, "\n");
 }
 
-/* Calculates cost for having SIZE induction variables.  */
+/* Calculates cost for having N_REGS registers.  This number includes
+   induction variables, invariant variables and invariant expressions.  */
 
 static unsigned
-ivopts_global_cost_for_size (struct ivopts_data *data, unsigned size)
+ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
 {
-  /* We add size to the cost, so that we prefer eliminating ivs
- if possible.  */
-  return size + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
-   data->body_includes_call);
+  unsigned cost = estimate_reg_pressure_cost (n_regs,
+ data->regs_used, data->speed,
+ data->body_includes_call);
+  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
+  return n_regs + cost;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
@@ -6100,9 +6103,7 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   comp_cost cost = ivs->cand_use_cost;
 
   cost += ivs->cand_cost;
-
-  cost += ivopts_global_cost_for_size (data, ivs->n_regs);
-
+  cost += ivopts_global_cost_for_size (data, ivs->n_invs + ivs->n_cands);
   ivs->cost = cost;
 }
 
@@ -6123,7 +6124,7 @@ iv_ca_set_remove_invs (struct iv_ca *ivs, bitmap invs, unsigned *n_inv_uses)
 {
   n_inv_uses[iid]--;
   if (n_inv_uses[iid] == 0)
-   ivs->n_regs--;
+   ivs->n_invs--;
 }
 }
 
@@ -6148,10 +6149,8 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
   if (ivs->n_cand_uses[cid] == 0)
 {
   bitmap_clear_bit (ivs->cands, cid);
-  ivs->n_regs--;
   ivs->n_cands--;
   ivs->cand_cost -= cp->cand->cost;
-
   iv_ca_set_remove_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
 }
 
@@ -6178,7 +6177,7 @@ iv_ca_set_add_invs (struct iv_ca *ivs, bitmap invs, unsigned *n_inv_uses)
 {
   n_inv_uses[iid]++;
   if (n_inv_uses[iid] == 1)
-   ivs->n_regs++;
+   ivs->n_invs++;
 }
 }
 
@@ -6206,10 +6205,8 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
   if (ivs->n_cand_uses[cid] == 1)
{
  bitmap_set_bit (ivs->cands, cid);
- ivs->n_regs++;
  ivs->n_cands++;
  ivs->cand_cost += cp->cand->cost;
-
  iv_ca_set_add_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
}
 
@@ -6421,7 +6418,7 @@ iv_ca_new (struct ivopts_data *data)
   nw->n_cand_uses = XCNEWVEC (unsigned, data->vcands.length ());
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
-  nw->n_regs = 0;
+  nw->n_invs = 0;
   nw->cand_use_cost = no_cost;
   nw->cand_cost = 0;
   nw->n_inv_var_uses = XCNEWVEC (unsigned, data->max_inv_var_id + 1);
-- 
1.9.1



[PATCH GCC8][04/33]Single interface finding invariant variables

2017-04-18 Thread Bin Cheng
Hi,
This patch refactors the interface for finding invariant variables.  Now
callers only need to call find_inv_vars, rather than setting the global
variable fd_ivopts_data and then calling walk_tree.
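The call-site change is mechanical; taking the candidate-step site from
the diff below:

  /* Before: global state plus an explicit tree walk at each site.  */
  fd_ivopts_data = data;
  walk_tree (&step, find_depends, &cand->inv_vars, NULL);

  /* After: one self-contained call.  */
  find_inv_vars (data, &step, &cand->inv_vars);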
Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct walk_tree_data): New.
(find_inv_vars_cb): New.
(find_depends): Renamed to ...
(find_inv_vars): ... this.
(add_candidate_1, force_var_cost): Call find_inv_vars.
(split_address_cost, determine_group_iv_cost_cond): Ditto.

From d3d0df5f794f83ab03edced03e268ff635b95ec9 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 28 Feb 2017 14:12:37 +
Subject: [PATCH 04/33] refactor-find-inv-variables-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 66 --
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index f9914e0..3a5b1b9 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2896,30 +2896,53 @@ generic_type_for (tree type)
   return unsigned_type_for (type);
 }
 
-/* Records invariants in *EXPR_P.  Callback for walk_tree.  DATA contains
-   the bitmap to that we should store it.  */
+/* Private data for walk_tree.  */
+
+struct walk_tree_data
+{
+  bitmap *inv_vars;
+  struct ivopts_data *idata;
+};
+
+/* Callback function for walk_tree, it records invariants and symbol
+   reference in *EXPR_P.  DATA is the structure storing result info.  */
 
-static struct ivopts_data *fd_ivopts_data;
 static tree
-find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
+find_inv_vars_cb (tree *expr_p, int *ws ATTRIBUTE_UNUSED, void *data)
 {
-  bitmap *inv_vars = (bitmap *) data;
+  struct walk_tree_data *wdata = (struct walk_tree_data*) data;
   struct version_info *info;
 
   if (TREE_CODE (*expr_p) != SSA_NAME)
 return NULL_TREE;
-  info = name_info (fd_ivopts_data, *expr_p);
 
+  info = name_info (wdata->idata, *expr_p);
   if (!info->inv_id || info->has_nonlin_use)
 return NULL_TREE;
 
-  if (!*inv_vars)
-*inv_vars = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (*inv_vars, info->inv_id);
+  if (!*wdata->inv_vars)
+*wdata->inv_vars = BITMAP_ALLOC (NULL);
+  bitmap_set_bit (*wdata->inv_vars, info->inv_id);
 
   return NULL_TREE;
 }
 
+/* Records invariants in *EXPR_P.  INV_VARS is the bitmap to that we should
+   store it.  */
+
+static inline void
+find_inv_vars (struct ivopts_data *data, tree *expr_p, bitmap *inv_vars)
+{
+  struct walk_tree_data wdata;
+
+  if (!inv_vars)
+return;
+
+  wdata.idata = data;
+  wdata.inv_vars = inv_vars;
+  walk_tree (expr_p, find_inv_vars_cb, &wdata, NULL);
+}
+
 /* Adds a candidate BASE + STEP * i.  Important field is set to IMPORTANT and
position to POS.  If USE is not NULL, the candidate is set as related to
it.  If both BASE and STEP are NULL, we add a pseudocandidate for the
@@ -2996,10 +3019,7 @@ add_candidate_1 (struct ivopts_data *data,
   data->vcands.safe_push (cand);
 
   if (TREE_CODE (step) != INTEGER_CST)
-   {
- fd_ivopts_data = data;
- walk_tree (&step, find_depends, &cand->inv_vars, NULL);
-   }
+   find_inv_vars (data, &step, &cand->inv_vars);
 
   if (pos == IP_AFTER_USE || pos == IP_BEFORE_USE)
cand->ainc_use = use;
@@ -4486,15 +4506,12 @@ force_expr_to_var_cost (tree expr, bool speed)
invariants the computation depends on.  */
 
 static comp_cost
-force_var_cost (struct ivopts_data *data,
-   tree expr, bitmap *inv_vars)
+force_var_cost (struct ivopts_data *data, tree expr, bitmap *inv_vars)
 {
-  if (inv_vars)
-{
-  fd_ivopts_data = data;
-  walk_tree (&expr, find_depends, inv_vars, NULL);
-}
+  if (!expr)
+return no_cost;
 
+  find_inv_vars (data, &expr, inv_vars);
   return force_expr_to_var_cost (expr, data->speed);
 }
 
@@ -4525,10 +4542,7 @@ split_address_cost (struct ivopts_data *data,
 {
   *symbol_present = false;
   *var_present = true;
-  fd_ivopts_data = data;
-  if (inv_vars)
-   walk_tree (&addr, find_depends, inv_vars, NULL);
-
+  find_inv_vars (data, &addr, inv_vars);
   return comp_cost (target_spill_cost[data->speed], 0);
 }
 
@@ -5624,8 +5638,8 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   express_cost = get_computation_cost (data, use, cand, false,
   &inv_vars_express, NULL,
   &inv_expr_express);
-  fd_ivopts_data = data;
-  walk_tree (&cmp_iv->base, find_depends, &inv_vars_express, NULL);
+  if (cmp_iv != NULL)
+find_inv_vars (data, &cmp_iv->base, &inv_vars_express);
 
   /* Count the cost of the original bound as well.  */
   bound_cost = force_var_cost (data, *bound_cst, NULL);
-- 
1.9.1



[PATCH GCC8][03/33]Refactor invariant variable/expression handling

2017-04-18 Thread Bin Cheng
Hi,
This patch refactors how invariant variables/expressions are handled.
Now they are recorded in the same kind of data structure and handled
similarly, which makes the code easier to understand.

Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct cost_pair): Rename depends_on to
inv_vars.  Add inv_exprs.
(struct iv_cand): Rename depends_on to inv_vars.
(struct ivopts_data): Rename max_inv_id/n_invariant_uses to
max_inv_var_id/n_inv_var_uses.  Move max_inv_expr_id around.
Refactor field used_inv_exprs from has_map to array n_inv_expr_uses.
(dump_cand): Dump inv_vars.
(tree_ssa_iv_optimize_init): Support inv_vars and inv_exprs.
(record_invariant, find_depends, add_candidate_1): Ditto.
(set_group_iv_cost, force_var_cost): Ditto.
(split_address_cost, ptr_difference_cost, difference_cost): Ditto.
(get_computation_cost_at, get_computation_cost): Ditto.
(determine_group_iv_cost_generic): Ditto.
(determine_group_iv_cost_address): Ditto.
(determine_group_iv_cost_cond, autoinc_possible_for_pair): Ditto.
(determine_group_iv_costs): Ditto.
(iv_ca_recount_cost): Update call to ivopts_global_cost_for_size.
(iv_ca_set_remove_invariants): Renamed to ...
(iv_ca_set_remove_invs): ... this.  Support inv_vars and inv_exprs.
(iv_ca_set_no_cp): Use iv_ca_set_remove_invs.
(iv_ca_set_add_invariants):  Renamed to ...
(iv_ca_set_add_invs): ... this.  Support inv_vars and inv_exprs.
(iv_ca_set_cp): Use iv_ca_set_add_invs.
(iv_ca_has_deps): Support inv_vars and inv_exprs.
(iv_ca_new, iv_ca_free, iv_ca_dump, free_loop_data): Ditto.
(create_new_ivs): Remove useless dump.

gcc/testsuite/ChangeLog
2017-04-11  Bin Cheng  

* g++.dg/tree-ssa/ivopts-3.C: Adjust test string.

From f9b58925e95869ef1fd22d06cb976db4caf818a3 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 28 Feb 2017 13:01:43 +
Subject: [PATCH 03/33] refactor-invariant-var-expr-20170225.txt

---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c   | 337 ---
 2 files changed, 176 insertions(+), 163 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index eb72581..07ff1b7 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9312849..f9914e0 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -347,8 +347,9 @@ struct cost_pair
   struct iv_cand *cand;/* The candidate.  */
   comp_cost cost;  /* The cost.  */
   enum tree_code comp; /* For iv elimination, the comparison.  */
-  bitmap depends_on;   /* The list of invariants that have to be
+  bitmap inv_vars; /* The list of invariants that have to be
   preserved.  */
+  bitmap inv_exprs;/* Loop invariant expressions.  */
   tree value;  /* For final value elimination, the expression for
   the final value of the iv.  For iv elimination,
   the new bound to compare with.  */
@@ -418,7 +419,7 @@ struct iv_cand
   unsigned cost_step;  /* Cost of the candidate's increment operation.  */
   struct iv_use *ainc_use; /* For IP_{BEFORE,AFTER}_USE candidates, the place
  where it is incremented.  */
-  bitmap depends_on;   /* The list of invariants that are used in step of the
+  bitmap inv_vars; /* The list of invariants that are used in step of the
   biv.  */
   struct iv *orig_iv;  /* The original iv if this cand is added from biv with
   smaller type.  */
@@ -542,9 +543,6 @@ struct ivopts_data
  by ivopt.  */
  hash_table<iv_inv_expr_hasher> *inv_expr_tab;
 
-  /* Loop invariant expression id.  */
-  int max_inv_expr_id;
-
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -566,8 +564,11 @@ struct ivopts_data
   /* The common candidates.  */
   vec iv_common_cands;
 
-  /* The maximum invariant id.  */
-  unsigned max_inv_id;
+  /* The maximum invariant variable id.  */
+  unsigned max_inv_var_id;
+
+  /* The maximum invariant expression id.  */
+  unsigned max_inv_expr_id;
 
   /* Number of no_overflow BIVs which are not used in memory address.  */
   unsigned bivs_not_used_in_addr

[PATCH GCC8][02/33]Remove code handling pseudo candidate

2017-04-18 Thread Bin Cheng
Hi,
We don't have pseudo candidates nowadays, so remove any related code.

Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  

* tree-ssa-loop-ivopts.c (get_computation_cost_at): Remove pseudo
iv_cand code.
(determine_group_iv_cost_cond, determine_iv_cost): Ditto.
(iv_ca_set_no_cp, create_new_iv): Ditto.

From a6aa9abdb16a5465431ef81c9d5dbc204c31c4c9 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 27 Feb 2017 17:22:51 +
Subject: [PATCH 02/33] remove-pseudo-iv_cand-20170220.txt

---
 gcc/tree-ssa-loop-ivopts.c | 23 ---
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4fc35fa..9312849 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4845,10 +4845,6 @@ get_computation_cost_at (struct ivopts_data *data,
   if (depends_on)
 *depends_on = NULL;
 
-  /* Only consider real candidates.  */
-  if (!cand->iv)
-return infinite_cost;
-
   cbase = cand->iv->base;
   cstep = cand->iv->step;
   ctype = TREE_TYPE (cbase);
@@ -5568,8 +5564,6 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   enum tree_code comp = ERROR_MARK;
   struct iv_use *use = group->vuses[0];
 
-  gcc_assert (cand->iv);
-
   /* Try iv elimination.  */
   if (may_eliminate_iv (data, use, cand, &bound, &comp))
 {
@@ -5898,11 +5892,7 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
   unsigned cost, cost_step;
   tree base;
 
-  if (!cand->iv)
-{
-  cand->cost = 0;
-  return;
-}
+  gcc_assert (cand->iv != NULL);
 
   /* There are two costs associated with the candidate -- its increment
  and its initialization.  The second is almost negligible for any loop
@@ -6123,9 +6113,7 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
   if (ivs->n_cand_uses[cid] == 0)
 {
   bitmap_clear_bit (ivs->cands, cid);
-  /* Do not count the pseudocandidates.  */
-  if (cp->cand->iv)
-   ivs->n_regs--;
+  ivs->n_regs--;
   ivs->n_cands--;
   ivs->cand_cost -= cp->cand->cost;
 
@@ -6189,9 +6177,7 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
   if (ivs->n_cand_uses[cid] == 1)
{
  bitmap_set_bit (ivs->cands, cid);
- /* Do not count the pseudocandidates.  */
- if (cp->cand->iv)
-   ivs->n_regs++;
+ ivs->n_regs++;
  ivs->n_cands++;
  ivs->cand_cost += cp->cand->cost;
 
@@ -7076,8 +7062,7 @@ create_new_iv (struct ivopts_data *data, struct iv_cand *cand)
   struct iv_group *group;
   bool after = false;
 
-  if (!cand->iv)
-return;
+  gcc_assert (cand->iv != NULL);
 
   switch (cand->pos)
 {
-- 
1.9.1



[PATCH GCC8][01/33]Handle TRUNCATE between tieable modes in rtx_cost

2017-04-18 Thread Bin Cheng
Hi,
This patch series rewrites parts of IVOPTs.  The change consists of the
parts described below:
  A) New cost computation model.  Currently, there is a big amount of code
     trying to understand tree expressions and estimate their computation
     cost.  The model was designed long ago for generic tree expressions.
     In order to process generic expressions (even address expressions of
     array/memory references), it has code for too many corner cases.  The
     problem is that it's somehow impossible to handle all complicated
     expressions, even with complicated logic in functions like
     get_computation_cost_at, difference_cost, ptr_difference_cost,
     get_address_cost and so on...  The second problem is that it's hard
     to keep the cost model consistent among special cases.  As special
     cases are added from time to time, the model is no longer unified.
     There are cases where a right cost results in bad code, or vice
     versa, a wrong cost results in good code.  Finally, it's also
     difficult to add code for new cases.
     This series introduces a new cost computation model using tree
     affine.  Tree exprs are lowered to aff_tree, which is usually a
     simple arithmetic operation.  Code handling special cases is no
     longer necessary, which brings us quite some simplicity.  It is also
     easier to compute consistent costs among different expressions using
     tree affine, which gives us a unified cost model.
     This change is implemented in [PATCH rewrite-cost-computation-*.txt].
  B) In rewriting both nonlinear iv_use and address iv_use, the current
     code does bad association by mixing computation of the invariant and
     induction parts.  This introduces inconsistency between cost
     computation and code generation, because costs of the invariant and
     induction parts are computed separately.  This also prevents loop
     invariants from being hoisted out of the loop.  This change fixes
     the issue by re-associating invariant and induction parts separately
     for both nonlinear and address iv_uses.
     This change is implemented in two patches:
     [PATCH nonlinear-iv_use-rewrite-*.txt]
     [PATCH address-iv_use-rewrite-*.txt]
  C) The current implementation shares the same register pressure
     computation with the RTL loop inv pass.  It has difficulty handling
     (especially large) loop nests, and quite often generates too many
     candidates (especially for outer loops).  This change introduces a
     new register pressure estimation.  The brief idea is to differentiate
     the (hot) innermost loop from outer loops: for the (possibly hot)
     innermost loop, more registers are allowed as long as the overall
     register pressure stays within the number of target available
     registers.
     This change is implemented in the patches below:
     [PATCH record-newly-used-inv_var-*.txt]
     [PATCH skip-non_int-phi-reg-pressure-*.txt]
     [PATCH ivopt-reg_pressure-model-*.txt]
  D) Other small refactors and improvements.  These will be described in
     each patch's review message.
  E) Patches that allow better induction variable optimizations for
     vectorized loops.  These patches are blocked at the moment because
     the current IVOPTs implementation can generate worse code on targets
     with limited addressing mode support.
     [PATCH range_info-for-vect_loop-niters-*.txt]
     [PATCH pr69710-*.txt]

As a bonus, issues like PR53090/PR71361 are now fixed with better code
generation than what the two PRs were expecting.

I collected spec2k6 data on my local AArch64 and X86_64 machines.
Overall, FP is improved by +1% on both machines, while INT mainly remains
neutral.  I think part of the improvement comes from IVOPTs itself, and
the rest comes from opportunities enabled as described in E).  It would
also be great if other targets could run some benchmarks with this patch
series in case of any performance breakage.

The patch series is bootstrapped and tested on X86_64 and AArch64 with no
real regression found, though some tests do need further adjustment.

As the start, this is the first patch of the series.  It simply handles
TRUNCATE between tieable modes in rtx_cost.  Since we don't need an
additional instruction for such a truncate, it simply returns 0 cost.
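A standalone illustration of why such a truncate needs no code on typical
64-bit targets (compile with -O2 and compare the assembly; the register
names in the comments assume x86_64):

/* Both functions compile to a single lea; the DImode-to-SImode
   truncation itself emits nothing, since the SImode value is just the
   low-part view (%eax) of the DImode register (%rax).  */
long full_width (long x) { return x * 3; }
int  low_part   (long x) { return (int) (x * 3); }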

Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  

* rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.

From d9b17e5d303d5fb1c75f489753b4578f8c907453 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 27 Feb 2017 14:51:56 +
Subject: [PATCH 01/33] no_cost-for-tieable-type-truncate-20170220.txt

---
 gcc/rtlanal.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index acb4230..6019c3e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4146,6 +4146,14 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
return COSTS_N_INSNS (2 + factor);
   break;
 
+case TRUNCATE:
+  /* If we can tie these modes, make this cheap.  */
+  if (MODES_TIEABLE_P (mode, GET_MODE (SUBREG_REG (x
+   {
+

Re: [PATCH], Fix PR/target 80099 (internal error with -mno-upper-regs-sf)

2017-04-18 Thread Segher Boessenkool
On Sat, Apr 15, 2017 at 01:43:18AM -0400, Michael Meissner wrote:
> You are right, and my patch was too complicated.  All I needed to do was 
> remove
> the upper register checks.  In looking at it, since the insn before being 
> split
> has both register and memory versions, if the register allocator can't 
> allocate
> a register, it will push the value on to the stack, and adjust the address 
> with
> the variable index and do a load.  Performance with the store and load, likely
> will not be ideal, but it should work.
> 
> Because of the interactions with the debug switches -mno-upper-regs-, I
> decided to add tests for all of the variable extract built-ins with each of 
> the
> no-upper regs switches.
> 
> I've tested this on a little endian power8 system and it bootstrapped and ran
> make check with no regressions.  Is it ok for the trunk?

Much simpler indeed :-)  Looks fine to me, please commit.  Thanks,


Segher


> 2017-04-15  Michael Meissner  
> 
>   PR target/80099
>   * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Eliminate
>   unneeded test for TARGET_UPPER_REGS_SF.
>   * config/rs6000/vsx.md (vsx_extract_v4sf_var): Likewise.
> 
> [gcc/testsuite]
> 2017-04-15  Michael Meissner  
> 
>   PR target/80099
>   * gcc.target/powerpc/pr80099-1.c: New test.
>   * gcc.target/powerpc/pr80099-2.c: Likewise.
>   * gcc.target/powerpc/pr80099-3.c: Likewise.
>   * gcc.target/powerpc/pr80099-4.c: Likewise.
>   * gcc.target/powerpc/pr80099-5.c: Likewise.


Re: [PATCH] IRA: Don't create new regs for debug insns (PR80429)

2017-04-18 Thread Jakub Jelinek
On Tue, Apr 18, 2017 at 09:30:19AM +, Segher Boessenkool wrote:
> In split_live_ranges_for_shrink_wrap IRA also splits regs that are
> only used in debug insns, leading to -fcompare-debug failures.
> 
> Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  This happens
> on at least GCC 5, so not a regression; but it is trivial and obvious,
> is it okay for trunk anyway?

I think the first question here is if it is beneficial to replace the
regs in DEBUG_INSNs or not.  If it is beneficial, then it of course
has to be changed such that it never does
newreg = ira_create_new_reg (dest);
just because it sees DEBUG_INSN uses, but otherwise it should do what it
does now.

So one option is e.g. to split that loop into two, one doing analysis,
i.e. that newreg = ira_create_new_reg (dest);, but not
validate_change (uin, DF_REF_REAL_LOC (use), newreg, true);
and this loop would also ignore DEBUG_INSN_P.
And the other loop, with the same content, just without that newreg = ...
and with the validate_change, as whole guarded with if (newreg),
replacing stuff on all insns.

Or another way to write this is have the loop as currently is, with
  if (DEBUG_INSN_P (uin) && (!newreg || debug_insns_seen_p))
{
  debug_insns_seen_p = true;
  continue;
}
and then another loop done only if (debug_insns_seen_p && newreg)
that would only adjust DEBUG_INSNs.
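
Spelled out, that second variant would look roughly like this (a schematic
on top of the use-chain walk in split_live_ranges_for_shrink_wrap, not
actual IRA code):

  bool debug_insns_seen_p = false;

  for (use = DF_REG_USE_CHAIN (REGNO (dest)); use; use = next)
    {
      rtx_insn *uin = DF_REF_INSN (use);
      next = DF_REF_NEXT_REG (use);

      /* Defer DEBUG_INSNs: on their own they must never trigger
         ira_create_new_reg.  */
      if (DEBUG_INSN_P (uin) && (!newreg || debug_insns_seen_p))
        {
          debug_insns_seen_p = true;
          continue;
        }
      /* ... existing analysis; may set newreg = ira_create_new_reg ... */
    }

  if (debug_insns_seen_p && newreg)
    {
      /* Second walk over the same chain, adjusting only DEBUG_INSNs via
         validate_change (uin, DF_REF_REAL_LOC (use), newreg, true).  */
    }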

Jakub


[PATCH] IRA: Don't create new regs for debug insns (PR80429)

2017-04-18 Thread Segher Boessenkool
In split_live_ranges_for_shrink_wrap IRA also splits regs that are
only used in debug insns, leading to -fcompare-debug failures.

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  This happens
on at least GCC 5, so not a regression; but it is trivial and obvious,
is it okay for trunk anyway?


Segher


2017-04-18  Segher Boessenkool  

PR rtl-optimization/80429
* ira.c (split_live_ranges_for_shrink_wrap): Skip DEBUG_INSNs.

---
 gcc/ira.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ira.c b/gcc/ira.c
index 7079573..1f760fe 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -4999,6 +4999,9 @@ split_live_ranges_for_shrink_wrap (void)
  rtx_insn *uin = DF_REF_INSN (use);
  next = DF_REF_NEXT_REG (use);
 
+ if (DEBUG_INSN_P (uin))
+   continue;
+
  basic_block ubb = BLOCK_FOR_INSN (uin);
  if (ubb == call_dom
  || dominated_by_p (CDI_DOMINATORS, ubb, call_dom))
-- 
1.9.3