Fix false negatives of ODR type checking on non-aggregates

2015-04-24 Thread Jan Hubicka
Hi,
this patch makes us correctly diagnose ODR violations such as:
$ cat tt.C  
enum test {val1};   
enum test a;
$ cat tt2.C 
enum test {val1, val2}; 
enum test b;
$ ./xgcc -B ./ -O2 -flto tt.C tt2.C 
tt.C:1:6: warning: type 'test' violates one definition rule [-Wodr] 
 enum test {val1};  
  ^ 
tt2.C:1:6: note: an enum with mismatching number of values is defined in 
another translation unit
 enum test {val1, val2};
  ^ 

OR

$ cat t.C
char a;
$ ./xgcc -B ./ -O2 t.C -o t1.o -fno-signed-char -c -flto
$ ./xgcc -B ./ -O2 t.C -o t2.o -fsigned-char -c -flto
$ ./xgcc -B ./ -O2 t1.o t2.o -flto -fno-signed-char  -flto
: warning: type ‘char’ violates one definition rule [-Wodr]
: note: a type with different signedness is defined in another 
translation unit
t.C:1:6: warning: type of ‘a’ does not match original declaration
 char a;
  ^
t.C:1:6: note: previously declared here
 char a;
  ^

The char signedness differences happen in Firefox.

Previously we would miss this because tree.c saves the mangled name only for
aggregates.  This was a useful optimization back when we used the mangled name
only for type inheritance graph construction.  I measured its effect at the
time Richard suggested it and the difference was under 1% on memory use/LTO
file sizes, which is hopefully acceptable (most named types are class types,
enums are quite rare, and integer types add just a constant number of new
manglings).

One case we still do not handle well is a mismatch in builtin
types, such as -fsigned-char/-fno-signed-char and/or -malign-double.  This is
caused by preloading these types, and I guess it comes from the bigger problem
that those types get merged despite their differences.

Bootstrapped/regtested x86_64-linux, will commit it later today.

Honza

* tree.c (need_assembler_name_p): Compute mangled name for
non-builtin types.
* ipa-devirt.c (odr_subtypes_equivalent_p): Non-aggregate types
may match unnamed to named.
Index: tree.c
===================================================================
--- tree.c  (revision 222391)
+++ tree.c  (working copy)
@@ -5138,7 +5138,17 @@ need_assembler_name_p (tree decl)
   && DECL_NAME (decl)
   && decl == TYPE_NAME (TREE_TYPE (decl))
   && !is_lang_specific (TREE_TYPE (decl))
-  && AGGREGATE_TYPE_P (TREE_TYPE (decl))
+  /* Save some work.  Names of builtin types are always derived from
+     properties of their main variant.  A special case are integer types
+     where the mangling does distinguish char/signed char/unsigned char
+     etc.  Storing names for these makes e.g.
+     -fno-signed-char/-fsigned-char mismatches be handled well.
+
+     See cp/mangle.c:write_builtin_type for details.  */
+  && (TREE_CODE (TREE_TYPE (decl)) != VOID_TYPE
+ && TREE_CODE (TREE_TYPE (decl)) != BOOLEAN_TYPE
+ && TREE_CODE (TREE_TYPE (decl)) != REAL_TYPE
+ && TREE_CODE (TREE_TYPE (decl)) != FIXED_POINT_TYPE)
   && !TYPE_ARTIFICIAL (TREE_TYPE (decl))
   && !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)
   && !type_in_anonymous_namespace_p (TREE_TYPE (decl)))
Index: ipa-devirt.c
===================================================================
--- ipa-devirt.c    (revision 222391)
+++ ipa-devirt.c    (working copy)
@@ -675,7 +675,8 @@ odr_subtypes_equivalent_p (tree t1, tree
  have to be compared structurally.  */
   if (TREE_CODE (t1) != TREE_CODE (t2))
 return false;
-  if ((TYPE_NAME (t1) == NULL_TREE) != (TYPE_NAME (t2) == NULL_TREE))
+  if (AGGREGATE_TYPE_P (t1)
+  && (TYPE_NAME (t1) == NULL_TREE) != (TYPE_NAME (t2) == NULL_TREE))
 return false;
 
   type_pair pair={t1,t2};


[ARM] Fix RTX cost for vector SET

2015-04-24 Thread Kugan
> 
> Thanks for the review. I have updated the patch based on the comments
> with some other minor changes. Bootstrapped and regression tested on
> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?
> 
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2015-04-24  Kugan Vivekanandarajah  
>   Jim Wilson  
> 
>   * config/arm/aarch-common-protos.h (struct mem_cost_table): Added
>   new fields loadv and storev.
>   * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
>   Initialize loadv and storev.
>   * config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
>   (cortexa53_extra_costs): Likewise.
>   (cortexa57_extra_costs): Likewise.
>   (xgene1_extra_costs): Likewise.
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
>   rtx_costs.
> 

Due to the struct mem_cost_table update for aarch64, arm cost tables
also need to be updated. Please find the patch that does this.
Regression tested on arm-none-linux-gnu with no new regressions. Is this
OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2015-04-25  Kugan Vivekanandarajah  

* config/arm/arm.c (cortexa9_extra_costs): Initialize loadv and
 storev.
(cortexa8_extra_costs): Likewise.
(cortexa5_extra_costs): Likewise.
(cortexa7_extra_costs): Likewise.
(cortexa12_extra_costs): Likewise.
(cortexa15_extra_costs): Likewise.
(v7m_extra_costs): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6826c78..d43239a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1027,7 +1027,9 @@ const struct cpu_cost_table cortexa9_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (1), /* storef.  */
 COSTS_N_INSNS (1), /* stored.  */
-COSTS_N_INSNS (1)  /* store_unaligned.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1128,7 +1130,9 @@ const struct cpu_cost_table cortexa8_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (1), /* storef.  */
 COSTS_N_INSNS (1), /* stored.  */
-COSTS_N_INSNS (1)  /* store_unaligned.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1230,7 +1234,9 @@ const struct cpu_cost_table cortexa5_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (2), /* storef.  */
 COSTS_N_INSNS (2), /* stored.  */
-COSTS_N_INSNS (1)  /* store_unaligned.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1333,7 +1339,9 @@ const struct cpu_cost_table cortexa7_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (2), /* storef.  */
 COSTS_N_INSNS (2), /* stored.  */
-COSTS_N_INSNS (1)  /* store_unaligned.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1434,7 +1442,9 @@ const struct cpu_cost_table cortexa12_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (2), /* storef.  */
 COSTS_N_INSNS (2), /* stored.  */
-0  /* store_unaligned.  */
+0, /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1535,7 +1545,9 @@ const struct cpu_cost_table cortexa15_extra_costs =
 2, /* stm_regs_per_insn_subsequent.  */
 0, /* storef.  */
 0, /* stored.  */
-0  /* store_unaligned.  */
+0, /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */
@@ -1636,7 +1648,9 @@ const struct cpu_cost_table v7m_extra_costs =
 1, /* stm_regs_per_insn_subsequent.  */
 COSTS_N_INSNS (2), /* storef.  */
 COSTS_N_INSNS (3), /* stored.  */
-COSTS_N_INSNS (1)  /* store_unaligned.  */
+COSTS_N_INSNS (1),  /* store_unaligned.  */
+COSTS_N_INSNS (1), /* loadv.  */
+COSTS_N_INSNS (1)  /* storev.  */
   },
   {
 /* FP SFmode */


Re: [debug-early] fix problem with C_TYPE_INCOMPLETE_VARS and TYPE_VFIELD overloading

2015-04-24 Thread Richard Henderson
On 04/24/2015 10:52 AM, Aldy Hernandez wrote:
> In the debug-early work we call dwarf2out early from 
> rest_of_decl_compilation. 
> Dwarf2out, via gen_struct_or_union_type_die(), will eventually look at
> TYPE_VFIELD, which is currently being overloaded by the C front-end to keep
> incomplete variables.
> 
> Nobody should be looking at the type too in depth if it's incomplete, but in
> this case, the type has just been laid out (layout_decl) so it's considered
> complete, just as we're about to iterate through C_TYPE_INCOMPLETE_VARS and 
> fix
> things up.
> 
> To fix my dwarf problem, I've just cached C_TYPE_INCOMPLETE_VARS and
> immediately clear it, as it was going to be cleared after anyhow.

The patch is ok.


r~

> 
> Attached is what I'm committing to the branch, but ideally y'all^Wyall
> front-end folks should use some language specific node.  Nothing was obvious 
> in
> tree-core.h, as most front-end specific things are flags (booleans), so I've
> left this as an exercise to the front-end groupie. That being said, if you
> violently oppose this solution, I'd be more than happy to entertain another
> (hopefully simple) approach.
> 
> Aldy
> 
> p.s. I wonder how many things are being overloaded by the front-end that are
> being looked at by dwarf2out incorrectly.  Well, nothing that triggers a gdb
> regression



Re: [AArch64][PR65375] Fix RTX cost for vector SET

2015-04-24 Thread Kugan

On 21/04/15 06:22, James Greenhalgh wrote:
> On Fri, Apr 17, 2015 at 12:19:14PM +0100, Kugan wrote:
 My point is that adding your patch while keeping the logic at the top
 which claims to catch ALL vector operations makes for less readable
 code.

 At the very least you'll need to update this comment:

   /* TODO: The cost infrastructure currently does not handle
  vector operations.  Assume that all vector operations
  are equally expensive.  */

 to make it clear that this doesn't catch vector set operations.

 But fixing the comment doesn't improve the messy code so I'd certainly
 prefer to see one of the other approaches which have been discussed.
>>>
>>> I see your point. Let me work on this based on your suggestions above.
>>
>> Hi James,
>>
>> Here is an attempt along this line. Is this what you have in mind?
>> Trying to keep functionality as before so that we can tune the
>> parameters later. Not fully tested yet.
> 
> Hi Kugan,
> 
> Sorry to have dropped out of the thread for a while, I'm currently
> travelling in the US.
> 
> This is along the lines of what I had in mind, thanks for digging through
> and doing it. It needs a little polishing, just neaten up the rough edges
> of comments and where they sit next to the new if conditionals, and of course,
> testing, and I have a few comments below.
> 
> Thanks,
> James
> 
>> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
>> b/gcc/config/aarch64/aarch64-cost-tables.h
>> index ae2b547..ed9432e 100644
>> --- a/gcc/config/aarch64/aarch64-cost-tables.h
>> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
>> @@ -121,7 +121,9 @@ const struct cpu_cost_table thunderx_extra_costs =
>>},
>>/* Vector */
>>{
>> -COSTS_N_INSNS (1)   /* Alu.  */
>> +COSTS_N_INSNS (1),  /* Alu.  */
>> +COSTS_N_INSNS (1),  /* Load.  */
>> +COSTS_N_INSNS (1)   /* Store.  */
>>}
>>  };
> 
> Can you push the Load/Stores in to the LD/ST section above and give
> them a name like loadv/storev.
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index cba3c1a..c2d4a53 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
> 
> 
> 
>> @@ -5570,6 +5569,7 @@ aarch64_rtx_costs (rtx x, int code, int outer 
>> ATTRIBUTE_UNUSED,
>>&& (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
>>>= INTVAL (XEXP (op0, 1
>>  op1 = XEXP (op1, 0);
>> +  gcc_assert (!VECTOR_MODE_P (mode));
> 
> As Kyrill asked, please drop this.



Thanks for the review. I have updated the patch based on the comments
with some other minor changes. Bootstrapped and regression tested on
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?


Thanks,
Kugan


gcc/ChangeLog:

2015-04-24  Kugan Vivekanandarajah  
Jim Wilson  

* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
new fields loadv and storev.
* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
Initialize loadv and storev.
* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
(cortexa53_extra_costs): Likewise.
(cortexa57_extra_costs): Likewise.
(xgene1_extra_costs): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
rtx_costs.
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index ae2b547..939125c 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -83,7 +83,9 @@ const struct cpu_cost_table thunderx_extra_costs =
 0, /* N/A: Stm_regs_per_insn_subsequent.  */
 0, /* Storef.  */
 0, /* Stored.  */
-COSTS_N_INSNS (1)  /* Store_unaligned.  */
+COSTS_N_INSNS (1), /* Store_unaligned.  */
+COSTS_N_INSNS (1), /* Loadv.  */
+COSTS_N_INSNS (1)  /* Storev.  */
   },
   {
 /* FP SFmode */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..13425fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5499,16 +5499,6 @@ aarch64_rtx_costs (rtx x, int code, int outer 
ATTRIBUTE_UNUSED,
  above this default.  */
   *cost = COSTS_N_INSNS (1);
 
-  /* TODO: The cost infrastructure currently does not handle
- vector operations.  Assume that all vector operations
- are equally expensive.  */
-  if (VECTOR_MODE_P (mode))
-{
-  if (speed)
-   *cost += extra_cost->vect.alu;
-  return true;
-}
-
   switch (code)
 {
 case SET:
@@ -5523,7 +5513,9 @@ aarch64_rtx_costs (rtx x, int code, int outer 
ATTRIBUTE_UNUSED,
  if (speed)
{
  rtx address = XEXP (op0, 0);
- if (GET_MODE_CLASS (mode) == MODE_INT)
+ if (VECTOR_MODE_P (mode))
+   *cost += extra_cost->ldst.storev;
+

Re: RFA (stor-layout): PATCH for c++/65734 (attribute aligned and templates)

2015-04-24 Thread Jakub Jelinek
On Fri, Apr 24, 2015 at 06:34:28PM -0400, Jason Merrill wrote:
> On 04/20/2015 10:35 AM, Jakub Jelinek wrote:
> >Wonder what will happen if finalize_type_size or fixup_attribute_variants
> >is called on a type variant with TYPE_USER_ALIGN before it is called
> >on the TYPE_MAIN_VARIANT; I'd guess that in that case all the variants
> >including the main variant would be marked as TYPE_USER_ALIGN and might have
> >incorrect TYPE_ALIGN.
> 
> Good point.  Changing layout_type to always work on the TYPE_MAIN_VARIANT
> passes testing:

LGTM.

> commit 0f520c7c862aa3c8850c3d3c024d19e4b8f1a757
> Author: Jason Merrill 
> Date:   Fri Apr 10 18:13:56 2015 -0400
> 
>   PR c++/65734
> gcc/
>   * stor-layout.c (layout_type): Layout the TYPE_MAIN_VARIANT.
>   (finalize_type_size): Respect TYPE_USER_ALIGN.
>   (layout_type) [ARRAY_TYPE]: Likewise.
> gcc/cp/
>   * class.c (fixup_attribute_variants): Respect TYPE_USER_ALIGN.

Jakub


Re: RFA (stor-layout): PATCH for c++/65734 (attribute aligned and templates)

2015-04-24 Thread Jason Merrill

On 04/20/2015 10:35 AM, Jakub Jelinek wrote:

Wonder what will happen if finalize_type_size or fixup_attribute_variants
is called on a type variant with TYPE_USER_ALIGN before it is called
on the TYPE_MAIN_VARIANT; I'd guess that in that case all the variants
including the main variant would be marked as TYPE_USER_ALIGN and might have
incorrect TYPE_ALIGN.


Good point.  Changing layout_type to always work on the 
TYPE_MAIN_VARIANT passes testing:




commit 0f520c7c862aa3c8850c3d3c024d19e4b8f1a757
Author: Jason Merrill 
Date:   Fri Apr 10 18:13:56 2015 -0400

	PR c++/65734
gcc/
	* stor-layout.c (layout_type): Layout the TYPE_MAIN_VARIANT.
	(finalize_type_size): Respect TYPE_USER_ALIGN.
	(layout_type) [ARRAY_TYPE]: Likewise.
gcc/cp/
	* class.c (fixup_attribute_variants): Respect TYPE_USER_ALIGN.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index d80d312e..8103e60 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1989,14 +1989,23 @@ fixup_attribute_variants (tree t)
   if (!t)
 return;
 
+  tree attrs = TYPE_ATTRIBUTES (t);
+  unsigned align = TYPE_ALIGN (t);
+  bool user_align = TYPE_USER_ALIGN (t);
+
   for (variants = TYPE_NEXT_VARIANT (t);
variants;
variants = TYPE_NEXT_VARIANT (variants))
 {
   /* These are the two fields that check_qualified_type looks at and
 	 are affected by attributes.  */
-  TYPE_ATTRIBUTES (variants) = TYPE_ATTRIBUTES (t);
-  TYPE_ALIGN (variants) = TYPE_ALIGN (t);
+  TYPE_ATTRIBUTES (variants) = attrs;
+  unsigned valign = align;
+  if (TYPE_USER_ALIGN (variants))
+	valign = MAX (valign, TYPE_ALIGN (variants));
+  else
+	TYPE_USER_ALIGN (variants) = user_align;
+  TYPE_ALIGN (variants) = valign;
 }
 }
 
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index f18f1ac..5bc8a29 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1831,9 +1831,13 @@ finalize_type_size (tree type)
 	{
 	  TYPE_SIZE (variant) = size;
 	  TYPE_SIZE_UNIT (variant) = size_unit;
-	  TYPE_ALIGN (variant) = align;
+	  unsigned valign = align;
+	  if (TYPE_USER_ALIGN (variant))
+	valign = MAX (valign, TYPE_ALIGN (variant));
+	  else
+	TYPE_USER_ALIGN (variant) = user_align;
+	  TYPE_ALIGN (variant) = valign;
 	  TYPE_PRECISION (variant) = precision;
-	  TYPE_USER_ALIGN (variant) = user_align;
 	  SET_TYPE_MODE (variant, mode);
 	}
 }
@@ -2154,6 +2158,10 @@ layout_type (tree type)
   if (type == error_mark_node)
 return;
 
+  /* We don't want finalize_type_size to copy an alignment attribute to
+ variants that don't have it.  */
+  type = TYPE_MAIN_VARIANT (type);
+
   /* Do nothing if type has been laid out before.  */
   if (TYPE_SIZE (type))
 return;
@@ -2350,13 +2358,17 @@ layout_type (tree type)
 	/* Now round the alignment and size,
 	   using machine-dependent criteria if any.  */
 
+	unsigned align = TYPE_ALIGN (element);
+	if (TYPE_USER_ALIGN (type))
+	  align = MAX (align, TYPE_ALIGN (type));
+	else
+	  TYPE_USER_ALIGN (type) = TYPE_USER_ALIGN (element);
 #ifdef ROUND_TYPE_ALIGN
-	TYPE_ALIGN (type)
-	  = ROUND_TYPE_ALIGN (type, TYPE_ALIGN (element), BITS_PER_UNIT);
+	align = ROUND_TYPE_ALIGN (type, align, BITS_PER_UNIT);
 #else
-	TYPE_ALIGN (type) = MAX (TYPE_ALIGN (element), BITS_PER_UNIT);
+	align = MAX (align, BITS_PER_UNIT);
 #endif
-	TYPE_USER_ALIGN (type) = TYPE_USER_ALIGN (element);
+	TYPE_ALIGN (type) = align;
 	SET_TYPE_MODE (type, BLKmode);
 	if (TYPE_SIZE (type) != 0
 	&& ! targetm.member_type_forces_blk (type, VOIDmode)
diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas1.C b/gcc/testsuite/g++.dg/cpp0x/alignas1.C
new file mode 100644
index 0000000..d73c875
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alignas1.C
@@ -0,0 +1,16 @@
+// PR c++/65734
+// { dg-do compile { target c++11 } }
+
+template <class T> struct A
+{
+  T t;
+};
+
+typedef A<int> T[4] alignas (2 * alignof (int));
+A<int> a[4];
+
+typedef A<int> T2[4] alignas (2 * alignof (int));
+
+#define SA(X) static_assert((X),#X)
+SA(alignof (T) == 2 * alignof(int));
+SA(alignof (T2) == 2 * alignof(int));
diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas2.C b/gcc/testsuite/g++.dg/cpp0x/alignas2.C
new file mode 100644
index 0000000..2e7d051
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alignas2.C
@@ -0,0 +1,20 @@
+// PR c++/65734
+// { dg-do compile { target c++11 } }
+
+template <class T>
+struct BVector
+{
+  T t;
+};
+BVector<int> m;
+
+template <template <class> class T>
+struct BV2
+{
+  typedef T<int> value_type alignas (16);
+  value_type v;
+};
+BV2<BVector> m2;
+
+#define SA(X) static_assert((X),#X)
+SA(alignof (BV2<BVector>::value_type) == 16);


libgo patch committed: Compile go-main with -fPIC

2015-04-24 Thread Ian Lance Taylor
PR 65616 points out that you can't use gccgo to build a PIE because
go-main.c is not compiled with -fPIC.  This patch fixes that.  I could
have used -fPIE, but -fPIC is essentially the same here and seems more
flexible.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and GCC 5 branch.
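For example (an illustrative invocation; hello.go stands for any trivial
Go program):

$ gccgo -pie -o hello hello.go
# Before this patch the link step could fail with non-PIC relocation
# errors, because go-main.o inside libgobegin.a was not compiled as
# position-independent code.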

Ian
diff -r c713e818f342 libgo/Makefile.am
--- a/libgo/Makefile.am Fri Apr 17 14:58:05 2015 -0700
+++ b/libgo/Makefile.am Fri Apr 24 09:40:23 2015 -0700
@@ -2032,6 +2032,10 @@
 libgobegin_llgo_a_SOURCES = \
runtime/go-main.c
 
+# Use -fPIC for libgobegin so that it can be put in a PIE.
+libgobegin_a_CFLAGS = $(AM_CFLAGS) -fPIC
+libgobegin_llgo_a_CFLAGS = $(AM_CFLAGS) -fPIC
+
 libnetgo_a_SOURCES = $(go_netgo_files)
 libnetgo_a_LIBADD = netgo.o
 


Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-04-24 Thread Martin Sebor

On 04/24/2015 03:31 PM, Martin Sebor wrote:

On 04/24/2015 10:27 AM, Marek Polacek wrote:

On Thu, Apr 23, 2015 at 09:11:39PM -0600, Martin Sebor wrote:

I wonder if the tests where the left shift operands are both
constants really do invoke undefined behavior in GCC. For
example, AFAICS, in (-1 << 0) and other constant expressions
gcc computes the shift in unsigned HOST_WIDE_INT which is well
defined.


Yeah, two INTEGER_CSTs are computed in wide-int using uhwi.  But -1 << 0
certainly is UB according to ISO C/C++ and I think we should follow the
standards.


It seems the warning would be more valuable (and less likely
dismissed as overly pedantic) if it was issued when the second
operand was not constant and the computation had to be done in
hardware (or even better, in hardware not known to use the
same instructions for positive and negative operands).


What I've tried to do in the patch was to mimic the other two Wshift-*
warnings.  While the new warning triggered a few times in the
testsuite, GCC
bootstrap itself was clean.  You raise a very good point though, we don't
want to be overly pedantic.

I suppose we could go with the patch as-is, and if users complain too
much,
warn only if the second operand is not constant or so.  But I hope that
left-shifting a negative value is not a common thing.


The warning would also be valuable in some sort of a portability
mode (with -pedantic?) where the code is intended to be portable
to implementations that don't provide well-defined semantics for
left shifts of negative values.


I really think -Wshift-negative-value should be kin to
-Wshift-count-negative
and -Wshift-count-overflow, those are enabled by default.


There's a significant difference between the reasons why
the behavior of the left shift is undefined when the left
operand is negative vs when the right operand is, and
between the results of such expressions computed by GCC
and common hardware.

When the right operand is negative the operation simply
has no meaning (some languages define the operation as
shifting right by the inverse number of bits but that's
by no means universal). In practice, the right operand
is sometimes limited by the hardware to a small non-
negative value (e.g., 32 for the i386 shll instruction)
so there's no way to shift a value by a negative number
of bits or by more than there are bits in the first
operand. As a result, GCC can compute a different result
than common hardware. For example, it substitutes 0 for
the result of 1 << -1 while x86 computes INT_MIN.

In contrast, shifting a negative value by a positive number
of bits does have a natural meaning (i.e., shifting the bit
pattern the same way as unsigned). The reason why it's
undefined in C and C++ is because some processors don't
shift the sign bit out and may raise an overflow when
a one bit is shifted into the sign position (typically
those that provide an arithmetic left shift). But most
processors implement a logical left shift and behave
the same way for signed operands as for unsigned.

The result of a left shift of a negative number computed
by GCC matches that of hardware that doesn't differentiate
between arithmetic and logical left shifts (which is all
the common CPUs, including ARM, MIPS, PowerPC, x86), so
the only value in diagnosing it is portability to rare
CPUs or to compilers that behave differently than GCC
(if there are any).
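To make that concrete, a minimal sketch (assuming a two's-complement
target with a logical left shift, as discussed above):

#include <stdio.h>

int
main (void)
{
  int s = -3;
  unsigned int u = (unsigned int) s;
  /* The first shift is formally undefined in ISO C, yet on the common
     CPUs named above both lines print fffffff4, the bit pattern of -12.  */
  printf ("%x\n", (unsigned int) (s << 2));
  printf ("%x\n", u << 2);
  return 0;
}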

I checked Clang to see what it does. It generates the
same code as GCC but only issues a warning for negative
left shifts. With -Wpedantic, it does give a similar
warning for left shifts of negative values.


Actually, I had misread the diagnostic. Clang only warns
for invalid shift counts and doesn't warn for negative
operands even with -Wpedantic.


I recommend
GCC to do the same.

Martin




[debug-early] make gen_typedef_die gracefully handle error marks

2015-04-24 Thread Aldy Hernandez
Now that we call dwarf2out earlier in the compilation process, we may 
get some error_mark_node's.  And of course, nobody likes an ICE...


Committed to branch.
commit 92437112d3d4966eecb31df59a6fa4a1014a198c
Author: Aldy Hernandez 
Date:   Fri Apr 24 14:24:09 2015 -0700

Make gen_typedef_die handle a type of error_mark_node.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 39046f5..11bbade 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -20617,6 +20617,9 @@ gen_typedef_die (tree decl, dw_die_ref context_die)
{
  type = DECL_ORIGINAL_TYPE (decl);
 
+ if (type == error_mark_node)
+   return;
+
  gcc_assert (type != TREE_TYPE (decl));
  equate_type_number_to_die (TREE_TYPE (decl), type_die);
}
@@ -20624,6 +20627,9 @@ gen_typedef_die (tree decl, dw_die_ref context_die)
{
  type = TREE_TYPE (decl);
 
+ if (type == error_mark_node)
+   return;
+
  if (is_naming_typedef_decl (TYPE_NAME (type)))
{
  /* Here, we are in the case of decl being a typedef naming


Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-04-24 Thread Martin Sebor

On 04/24/2015 10:27 AM, Marek Polacek wrote:

On Thu, Apr 23, 2015 at 09:11:39PM -0600, Martin Sebor wrote:

I wonder if the tests where the left shift operands are both
constants really do invoke undefined behavior in GCC. For
example, AFAICS, in (-1 << 0) and other constant expressions
gcc computes the shift in unsigned HOST_WIDE_INT which is well
defined.


Yeah, two INTEGER_CSTs are computed in wide-int using uhwi.  But -1 << 0
certainly is UB according to ISO C/C++ and I think we should follow the
standards.


It seems the warning would be more valuable (and less likely
dismissed as overly pedantic) if it was issued when the second
operand was not constant and the computation had to be done in
hardware (or even better, in hardware not known to use the
same instructions for positive and negative operands).


What I've tried to do in the patch was to mimic the other two Wshift-*
warnings.  While the new warning triggered a few times in the testsuite, GCC
bootstrap itself was clean.  You raise a very good point though, we don't
want to be overly pedantic.

I suppose we could go with the patch as-is, and if users complain too much,
warn only if the second operand is not constant or so.  But I hope that
left-shifting a negative value is not a common thing.


The warning would also be valuable in some sort of a portability
mode (with -pedantic?) where the code is intended to be portable
to implementations that don't provide well-defined semantics for
left shifts of negative values.


I really think -Wshift-negative-value should be kin to -Wshift-count-negative
and -Wshift-count-overflow, those are enabled by default.


There's a significant difference between the reasons why
the behavior of the left shift is undefined when the left
operand is negative vs when the right operand is, and
between the results of such expressions computed by GCC
and common hardware.

When the right operand is negative the operation simply
has no meaning (some languages define the operation as
shifting right by the inverse number of bits but that's
by no means universal). In practice, the right operand
is sometimes limited by the hardware to a small non-
negative value (e.g., 32 for the i386 shll instruction)
so there's no way to shift a value by a negative number
of bits or by more than there are bits in the first
operand. As a result, GCC can compute a different result
than common hardware. For example, it substitutes 0 for
the result of 1 << -1 while x86 computes INT_MIN.

In contrast, shifting a negative value by a positive number
of bits does have a natural meaning (i.e., shifting the bit
pattern the same way as unsigned). The reason why it's
undefined in C and C++ is because some processors don't
shift the sign bit out and may raise an overflow when
a one bit is shifted into the sign position (typically
those that provide an arithmetic left shift). But most
processors implement a logical left shift and behave
the same way for signed operands as for unsigned.

The result of a left shift of a negative number computed
by GCC matches that of hardware that doesn't differentiate
between arithmetic and logical left shifts (which is all
the common CPUs, including ARM, MIPS, PowerPC, x86), so
the only value in diagnosing it is portability to rare
CPUs or to compilers that behave differently than GCC
(if there are any).

I checked Clang to see what it does. It generates the
same code as GCC but only issues a warning for negative
left shifts. With -Wpedantic, it does give a similar
warning for left shifts of negative values. I recommend
GCC to do the same.

Martin


[debug-early] make DW_AT_inline of 0 equivalent to no inline in check_die_inline()

2015-04-24 Thread Aldy Hernandez

GCC regression hunting..  one more down.

Committed to branch.
commit fe086e62a450538efc06c064530bfd564496d6a6
Author: Aldy Hernandez 
Date:   Fri Apr 24 14:22:58 2015 -0700

Relax condition in check_die_inline so that DW_AT_inline of 0 is
equivalent to no inline.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index f4e6858..39046f5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -5737,7 +5737,7 @@ check_die_inline (dw_die_ref die)
   dw_attr_ref a;
   bool inline_found = false;
   FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
-if (a->dw_attr == DW_AT_inline)
+if (a->dw_attr == DW_AT_inline && a->dw_attr_val.v.val_unsigned)
   inline_found = true;
   if (inline_found)
 {


[debug-early] fix problem with C_TYPE_INCOMPLETE_VARS and TYPE_VFIELD overloading

2015-04-24 Thread Aldy Hernandez
In the debug-early work we call dwarf2out early from 
rest_of_decl_compilation.  Dwarf2out, via 
gen_struct_or_union_type_die(), will eventually look at TYPE_VFIELD, 
which is currently being overloaded by the C front-end to keep 
incomplete variables.


Nobody should be looking at the type too in depth if it's incomplete, 
but in this case, the type has just been laid out (layout_decl) so it's 
considered complete, just as we're about to iterate through 
C_TYPE_INCOMPLETE_VARS and fix things up.


To fix my dwarf problem, I've just cached C_TYPE_INCOMPLETE_VARS and 
immediately clear it, as it was going to be cleared after anyhow.


Attached is what I'm committing to the branch, but ideally y'all^Wyall 
front-end folks should use some language specific node.  Nothing was 
obvious in tree-core.h, as most front-end specific things are flags 
(booleans), so I've left this as an exercise to the front-end groupie. 
That being said, if you violently oppose this solution, I'd be more than 
happy to entertain another (hopefully simple) approach.


Aldy

p.s. I wonder how many things are being overloaded by the front-end that 
are being looked at by dwarf2out incorrectly.  Well, nothing that 
triggers a gdb regression
commit 0fa38f203619300ed1ea92418bc6dbabd1115ac9
Author: Aldy Hernandez 
Date:   Fri Apr 24 13:41:36 2015 -0700

Cache and clear C_TYPE_INCOMPLETE_VARS before iterating through its
members to avoid an overload problem within dwarf2out.

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 846e13b..688c055 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -7827,10 +7827,18 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
 }
 
   /* If this structure or union completes the type of any previous
- variable declaration, lay it out and output its rtl.  */
-  for (x = C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t));
-   x;
-   x = TREE_CHAIN (x))
+ variable declaration, lay it out and output its rtl.
+
+ Note: C_TYPE_INCOMPLETE_VARS overloads TYPE_VFIELD which is used
+ in dwarf2out via rest_of_decl_compilation below and means
+ something totally different.  Since we will be clearing
+ C_TYPE_INCOMPLETE_VARS shortly after we iterate through them,
+ clear it ahead of time and avoid problems in dwarf2out.  Ideally,
+ C_TYPE_INCOMPLETE_VARS should use some language specific
+ node.  */
+  tree incomplete_vars = C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t));
+  C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t)) = 0;
+  for (x = incomplete_vars; x; x = TREE_CHAIN (x))
 {
   tree decl = TREE_VALUE (x);
   if (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
@@ -7843,7 +7851,6 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
  rest_of_decl_compilation (decl, toplevel, 0);
}
 }
-  C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t)) = 0;
 
   /* Update type location to the one of the definition, instead of e.g.
  a forward declaration.  */


Re: [PATCH] Properly valueize values we value-number to

2015-04-24 Thread Jeff Law

On 02/17/2015 07:58 AM, Richard Biener wrote:
[ Restarting an old thread... ]


On a closer look the record_const_or_copy_1 hunk is redundant
(record_equality is really a bit obfuscated...).

Agreed.  I'm not entirely sure how it got to this point.


And record_equality is where the SSA_NAME_VALUEs might end up
forming chains?  At least because we might record a SSA_NAME_VALUE
for sth that might be the target of a SSA_NAME_VALUE, as in

  a_1 = b_2;  SSA_NAME_VALUE (a_1) == b_2;
  if (b_2 == i_3(D))
... temporarily SSA_NAME_VALUE (b_2) == i_3(D)

thus under if (b_2 == i_3(D)) SSA_NAME_VALUE (a_1) == b_2 and
SSA_NAME_VALUE (SSA_NAME_VALUE (a_1)) == i_3(D)?  I guess we
can't easily avoid that because we don't keep track of a
reverse mapping (nor would we want to update that).
Right.  In general, the fact that we're in SSA form means that we ought 
not get loops in the SSA_NAME_VALUE chain since there's a single static 
assignment and we'll visit it once.


That nice model breaks down in two ways.  First we derive equivalences 
from equality conditions like you've shown above.  Second, when 
threading we can thread through a loop latch and start reprocessing a 
block.  The interaction between those two can result in loops in the 
value chain.


What I'm hoping to do in GCC6 is eliminate all this stuff with a 
threader that is independent of the dominator walk (and optimizer). 
Instead of walking forward through the dominator tree recording key 
nuggets, then removing those nuggets as we pop back up the tree, we 
instead we start with the conditional and walk up the use-def chains, 
simplifying as we go, with the goal of simplifying to a constant, of course.


You can see the beginnings of that with the FSM bits from Sebastian.

Obviously until those bits are in place, we should try to clean up any 
warts in the current implementation.




Btw, in record_equality, the == part of the <= check for the loop
depth will just randomly swap x and y while we should prefer
IL canonical order.  If we can't rely on the input being in IL
canonical order we should prepend the whole function with

  if (tree_swap_operands_p (x, y, false))
std::swap (x, y);

and change <= to < for the loop-depth check.

Agreed.  Makes perfect sense.

jeff



[PATCH] Fix VRP update_value_range and caller (PR tree-optimization/65875)

2015-04-24 Thread Jakub Jelinek
Hi!

In vrp_visit_assignment_or_call we try to return SSA_PROP_VARYING
if update_value_range returned true and the new range is VR_VARYING,
but vrp_visit_phi_node fails to do that.
Another thing is that if update_value_range decides to
set_value_range_to_varying (old_vr);
it doesn't update new_vr, so the caller doesn't know the new vr
is varying (and calling get_value_range again sounds unnecessary).

The following patch fixes it, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk and 5.2?

2015-04-24  Jakub Jelinek  

PR tree-optimization/65875
* tree-vrp.c (update_value_range): If in is_new case setting
old_vr to VR_VARYING, also set new_vr to it.
(vrp_visit_phi_node): Return SSA_PROP_VARYING instead of
SSA_PROP_INTERESTING if update_value_range returned true,
but new range is VR_VARYING.

* gcc.c-torture/compile/pr65875.c: New test.

--- gcc/tree-vrp.c.jj   2015-04-20 14:35:39.0 +0200
+++ gcc/tree-vrp.c  2015-04-24 18:10:41.321367440 +0200
@@ -892,7 +892,12 @@ update_value_range (const_tree var, valu
 UNDEFINED or from VARYING.  */
   if (new_vr->type == VR_UNDEFINED
  || old_vr->type == VR_VARYING)
-   set_value_range_to_varying (old_vr);
+   {
+ BITMAP_FREE (new_vr->equiv);
+ set_value_range_to_varying (old_vr);
+ set_value_range_to_varying (new_vr);
+ return true;
+   }
   else
set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max,
 new_vr->equiv);
@@ -8941,6 +8946,9 @@ update_range:
  fprintf (dump_file, "\n");
}
 
+  if (vr_result.type == VR_VARYING)
+   return SSA_PROP_VARYING;
+
   return SSA_PROP_INTERESTING;
 }
 
--- gcc/testsuite/gcc.c-torture/compile/pr65875.c.jj2015-04-24 
18:20:47.650595581 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr65875.c   2015-04-24 
18:20:30.0 +0200
@@ -0,0 +1,24 @@
+/* PR tree-optimization/65875 */
+
+int a, b, c, d, e, f, g;
+
+void
+foo (void)
+{
+  long h = 0, i;
+  if (g < 0)
+i = -g;
+  for (; b;)
+for (; c;)
+  if (e)
+   h = 1;
+  for (; f;)
+if (a)
+  break;
+  if (h > i)
+while (h > i)
+  {
+   d = 0;
+   h--;
+  }
+}

Jakub


Re: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-04-24 Thread Jeff Law

On 02/09/2015 07:00 PM, Thomas Preud'homme wrote:

And this is part 2.


From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Eric Botcazou

Once this is done, the same thing needs to be applied to XEXP
(reg_equal, 0)
before it is sent to nonzero_bits.



- /* Don't call nonzero_bits if it cannot change anything.  */
- if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
-   rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode);
  num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x));
  if (rsp->sign_bit_copies == 0

  || rsp->sign_bit_copies > num)

rsp->sign_bit_copies = num;
+
+ /* Don't call nonzero_bits if it cannot change anything.  */
+ if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
+   update_rsp_from_reg_equal (rsp, insn, src, x);


Can't we improve on this?  rsp->sign_bit_copies is modified both here
and in
update_rsp_from_reg_equal, but rsp->nonzero_bits is modified only in
the
latter function.  There is no reason for this discrepancy, so they ought to
be
handled the same way, either entirely here or entirely in the function.


So I moved all the handling inside the new function and also added a check
before calling num_sign_bit_copies about whether it would give any more
information to be consistent with what is done for nonzero_bits.

ChangeLog entry is as follows:

2015-02-09  Thomas Preud'homme  

 * combine.c (set_nonzero_bits_and_sign_copies): Split code updating
 rsp->sign_bit_copies and rsp->nonzero_bits into ...
 (update_rsp_from_reg_equal): This.  Also use REG_EQUAL note on src if
 present to get more accurate information about the number of sign bit
 copies and non zero bits.
Do you have a testcase where this change can result in better generated
code?  If so, please add that testcase.  It's OK if it's ARM specific.


OK with a testcase.

jeff



Re: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits

2015-04-24 Thread Jeff Law

On 02/09/2015 06:51 PM, Thomas Preud'homme wrote:


ChangeLog entry for part 1 is as follows:

*** gcc/ChangeLog ***

2015-02-09  Thomas Preud'homme  

 * combine.c (sign_extend_short_imm): New.
 (set_nonzero_bits_and_sign_copies): Use above new function for sign
 extension of src short immediate.
 (reg_nonzero_bits_for_combine): Likewise for tem.

OK with a very minor nit.




diff --git a/gcc/combine.c b/gcc/combine.c
index ad3bed0..f2b26c2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1640,6 +1640,26 @@ setup_incoming_promotions (rtx_insn *first)
  }
  }

+#ifdef SHORT_IMMEDIATES_SIGN_EXTEND
+/* If MODE has a precision lower than PREC and SRC is a non-negative constant
+   that would appear negative in MODE, sign-extend SRC for use in nonzero_bits
+   because some machines (maybe most) will actually do the sign-extension and
+   this is the conservative approach.
+
+   ??? For 2.5, try to tighten up the MD files in this regard instead of this
+   kludge.  */
+
+static rtx
+sign_extend_short_imm (rtx src, machine_mode mode, unsigned int prec)
+{
+  if (GET_MODE_PRECISION (mode) < prec && CONST_INT_P (src)
+  && INTVAL (src) > 0 && val_signbit_known_set_p (mode, INTVAL (src)))
+src = GEN_INT (INTVAL (src) | ~GET_MODE_MASK (mode));
Can you go ahead and put each condition of the && on a separate line?
It uses more vertical space, but IMHO makes this easier to read.  As I
said, it was a nit :-)


OK with that fix.

jeff



Re: C PATCH for c/52085 (enum forward declarations and attribute packed)

2015-04-24 Thread Jeff Law

On 04/24/2015 05:36 AM, Marek Polacek wrote:

On Thu, Apr 23, 2015 at 08:25:51PM -0600, Jeff Law wrote:

What happens if we have used the enum inside an aggregate?  Can we just
blindly change the alignment/precision like that?


If you just forward declare an enum/struct, it has an incomplete type, so you
cannot use it inside an aggregate.  In fact, you can't do much with that until
it's completed, except creating a pointer to it.  E.g. applying sizeof to an
incomplete type is forbidden.  With my patch, we behave the same as clang.
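To make that concrete (a hypothetical example, not from the patch):

enum E;                          /* forward declaration: incomplete type */
enum E *p;                       /* OK: pointer to an incomplete type */
/* int n = sizeof (enum E);  -- error: sizeof applied to incomplete type */
/* struct S { enum E m; };   -- error: field has incomplete type */
enum E { A } __attribute__ ((packed));  /* completed; layout fixed here */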

Clearly a "duh" moment for me.

OK for the trunk.

jeff



Re: [PATCH] gcc/genrecog.c: Check matching constraint in MATCH_OPERAND.

2015-04-24 Thread Jeff Law

On 02/26/2015 08:26 AM, Chen Gang S wrote:

2015-02-26  Chen Gang


* genrecog.c (validate_pattern): Check matching constraint in
MATCH_OPERAND and use 'opnu' for all 'XINT (pattern, 0)'.
I've updated the ChangeLog and verified the x86_64 continues to build 
with this patch.  I also did a build using config-list.mk and while 
there were several failures, none of them were related to this patch AFAICT.


I also temporarily reverted the xtensa fix and verified that with this 
patch installed genrecog issued an appropriate error.


Installed on the trunk.

Thanks!

Jeff


Re: C++ delayed folding branch review

2015-04-24 Thread Jason Merrill

On 04/24/2015 09:46 AM, Kai Tietz wrote:

Sure, we can use *_fully_fold here instead, but at what cost?  In
general we need here just a simple one-level fold for simplifying
constant values and/or removing useless type conversions.


Well, here you're doing a two-level fold.  And in general fold relies on 
subexpressions being appropriately folded.  So for warning folding, I 
think the caching approach we were talking about in the call today is 
the way to go.



@@ -597,9 +597,9 @@ null_member_pointer_value_p (tree t)
  return false;
else if (TYPE_PTRMEMFUNC_P (type))
  return (TREE_CODE (t) == CONSTRUCTOR
-   && integer_zerop (CONSTRUCTOR_ELT (t, 0)->value));
+   && integer_zerop (fold (CONSTRUCTOR_ELT (t, 0)->value)));
else if (TYPE_PTRDATAMEM_P (type))
-return integer_all_onesp (t);
+return integer_all_onesp (fold (t));



Calling fold here is wrong; it doesn't handle constexpr, and we should have
folded before we got here.


See above: we need to make sure constant values get rid of useless
type conversions/negates/etc ...


Certainly a constant value from maybe_constant_value should not have a 
useless type conversion wrapped around it, as that makes it appear 
non-constant.



Well, fold_convert isn't necessarily the same as fold (convert ())
within C++, since convert handles special cases fold_convert doesn't.


Ah, true.


@@ -576,7 +576,6 @@ build_simple_base_path (tree expr, tree binfo)
 expr = build3 (COMPONENT_REF,
cp_build_qualified_type (type, type_quals),
expr, field, NULL_TREE);
-   expr = fold_if_not_in_template (expr);


I don't think we need to remove this fold, since it is part of compiler
internals rather than something the user wrote.  Really, we should represent
the base conversion with something like a CONVERT_EXPR and only call this
function when we want to fold it.  But that can wait for a later patch.


OK.  I removed this fold case by simply removing the
fold_if_not_in_template function.  Well, we could re-add a call to
fold when not in a template.


Let's try not checking for being in a template, see if it breaks.


That said, we should probably just remove this case and the next, as they
are obsolete.  I'll remove them on the trunk.


Done.


+static tree
+cp_fold (tree x, hash_map<tree, tree> *fold_hash)
+{




I still think we need a hybrid of this and the constexpr code: it isn't full
folding if we aren't doing constexpr evaluation.  But we can't just use
maybe_constant_value because that only folds C++ constant-expressions, and
we want to fold more things than that.  I suppose one simple approach for
now would be to call maybe_constant_value from cp_fold.


Well, cp_fold and maybe_constant_value (well, actually how
constexpr.c works) behave differently in cases of
non-constant results.


I think that's exactly what I was saying above: "we can't just use
maybe_constant_value because that only folds C++ constant-expressions, 
and we want to fold more things than that."


My point is that cp_fold should be a superset of maybe_constant_value, 
to fix bugs like 53792.  And the easiest way to get that would seem to 
be by calling maybe_constant_value from cp_fold.
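Roughly, a sketch of the idea (assuming the branch's cp_fold signature;
details elided):

static tree
cp_fold (tree x, hash_map<tree, tree> *fold_hash)
{
  /* Give constexpr evaluation the first try; it folds C++ constant
     expressions that the middle-end fold does not know about.  */
  tree r = maybe_constant_value (x);
  if (r != x && TREE_CONSTANT (r))
    return r;
  /* The existing cp_fold simplifications on X would follow here.  */
  return x;
}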



@@ -614,9 +614,13 @@ cp_fold_convert (tree type, tree expr)
  }
else
  {
-  conv = fold_convert (type, expr);
+  if (TREE_CODE (expr) == INTEGER_CST)
+conv = fold_convert (type, expr);
+  else
+conv = convert (type, expr);


Why?  If we're calling cp_fold_convert in a place where we don't want to
fold, we should stop calling it rather than change it.



See, we want to take care that a constant value is found here.
Otherwise we don't want anything folded.  Well, we could introduce
a special routine for this to abstract the intention here.


OK, that makes sense.  Say, a function called like "fold" that only 
folds conversions (and NEGATE_EXPR) of constants.  It might make sense 
to do that and otherwise continue to delay folding of conversions.  In 
that context I guess this change makes sense.
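Something along these lines, perhaps (a minimal sketch; the name and
exact placement are hypothetical):

/* Fold X only if it is a conversion or negation of a constant;
   leave everything else untouched.  */

static tree
fold_simple (tree x)
{
  switch (TREE_CODE (x))
    {
    CASE_CONVERT:
    case NEGATE_EXPR:
      if (CONSTANT_CLASS_P (TREE_OPERAND (x, 0)))
        return fold (x);
      break;
    default:
      break;
    }
  return x;
}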



@@ -8502,16 +8502,18 @@ compute_array_index_type (tree name, tree size,
tsubst_flags_t complain)
+  /* We need to do fully folding to determine if we have VLA, or not.  */
+  tree size_constant = maybe_constant_value (size);


Why is this needed?  We already call maybe_constant_value earlier in
compute_array_index_type.


Well, see above.  We might have a constant value that is not simplified.  So we
need a way to make sure we simplify in such a case, but if it is
non-constant, we don't want a modified expression.  So
maybe_constant_value does this ...


Yes, but we already called maybe_constant_value.  Calling it again 
shouldn't make any difference.



-  itype = fold (itype);
+  itype = maybe_constant_value (itype);
-   itype = variable_size (fold (newitype));
+   itype = variable_size (maybe_constant_value (newitype));


Maybe these should use cp_fully_f

Re: [PATCH] Tidy up locking for libgomp OpenACC entry points

2015-04-24 Thread Julian Brown
On Thu, 23 Apr 2015 18:41:34 +0200
Thomas Schwinge  wrote:

> Hi!
> 
> On Wed, 22 Apr 2015 19:42:43 +0100, Julian Brown
>  wrote:
> > This patch is an attempt to fix some potential race conditions with
> > accesses to shared data structures from multiple concurrent threads
> > in libgomp's OpenACC entry points. The main change is to move
> > locking out of lookup_host and lookup_dev in oacc-mem.c and into
> > their callers (which can then hold the locks for the whole
> > operation that they are performing).
> 
> Yeah, that makes sense I guess.
> 
> > Also missing locking has been added for gomp_acc_insert_pointer.
> > 
> > Tests look OK (with offloading to NVidia PTX).
> 
> How did you test to get some confidence in the locking being
> sufficient?

Merely by running the existing tests and via inspection, sadly. I'm not
sure how much value we'd get from implementing an exhaustive threading
testsuite at this stage: I guess testcases will be easier to come by in
the future if/when people start to use e.g. OpenMP and OpenACC together.

> Going further (separate patch?), a few more comments:
> 
> Is it OK that oacc-init.c:cached_base_dev is accessed without locking?
> 
> Generally, we have to keep in mind that the same device may be
> accessed in parallel through both OpenACC and OpenMP interfaces.  For
> this, for example, in oacc-init.c, even though acc_device_lock is
> held, is it OK to call gomp_init_device(D) without D->lock being
> locked?  (Compare to target.c code.)
> 
> Please document what exactly oacc-init.c:acc_device_lock is to guard.
> I'm not sure I'm understanding this correctly.

I've attached a follow-on patch that documents the purpose of
acc_device_lock -- and also fixes some places that should have been
holding the lock, but were not.

I've also added locking (with dev->lock) when calling gomp_init_device
and gomp_fini_device from the OpenACC initialisation/finalisation code.

> Should oacc-init.c:acc_shutdown_1 release goacc_thread_lock before any
> gomp_fatal calls?  (That seems to be the general policy in libgomp.)

I added this to the first patch.

> > --- a/libgomp/oacc-mem.c
> > +++ b/libgomp/oacc-mem.c
> 
> > @@ -120,25 +116,32 @@ acc_free (void *d)
> >  {
> >splay_tree_key k;
> >struct goacc_thread *thr = goacc_thread ();
> > +  struct gomp_device_descr *acc_dev = thr->dev;
> >  
> >if (!d)
> >  return;
> >  
> >assert (thr && thr->dev);
> >  
> > +  gomp_mutex_lock (&acc_dev->lock);
> > +
> >/* We don't have to call lazy open here, as the ptr value must
> > have been returned by acc_malloc.  It's not permitted to pass NULL
> > in (unless you got that null from acc_malloc).  */
> > -  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
> > -   {
> > - void *offset;
> > +  if ((k = lookup_dev (acc_dev->openacc.data_environ, d, 1)))
> > +{
> > +  void *offset;
> > +
> > +  offset = d - k->tgt->tgt_start + k->tgt_offset;
> >  
> > - offset = d - k->tgt->tgt_start + k->tgt_offset;
> > +  gomp_mutex_unlock (&acc_dev->lock);
> >  
> > - acc_unmap_data ((void *)(k->host_start + offset));
> > -   }
> > +  acc_unmap_data ((void *)(k->host_start + offset));
> > +}
> > +  else
> > +gomp_mutex_unlock (&acc_dev->lock);
> 
> Does it make sense to make the unlock unconditional, and move the
> acc_unmap_data after it, guarded by »if (k)«?

I've left this one -- just a stylistic tweak, but I think it's fine
as-is.

> > -  thr->dev->free_func (thr->dev->target_id, d);
> > +  acc_dev->free_func (acc_dev->target_id, d);
> >  }
> >  
> >  void
> > @@ -178,16 +181,24 @@ acc_deviceptr (void *h)
> >goacc_lazy_initialize ();
> >  
> >struct goacc_thread *thr = goacc_thread ();
> > +  struct gomp_device_descr *dev = thr->dev;
> > +
> > +  gomp_mutex_lock (&dev->lock);
> >  
> > -  n = lookup_host (thr->dev, h, 1);
> > +  n = lookup_host (dev, h, 1);
> >  
> >if (!n)
> > -return NULL;
> > +{
> > +  gomp_mutex_unlock (&dev->lock);
> > +  return NULL;
> > +}
> >  
> >offset = h - n->host_start;
> >  
> >d = n->tgt->tgt_start + n->tgt_offset + offset;
> >  
> > +  gomp_mutex_unlock (&dev->lock);
> > +
> >return d;
> >  }
> 
> Do we need to retain the lock while working with n?  If not, the
> unlock could be placed right after the lookup_host, unconditionally.
> I'm confused -- it's commonly being done (retained) in target.c code,
> but not in the tgt_fn lookup in target.c:GOMP_target.

I think the difference can be explained as follows: a given mapping
(splay_tree_key_s) is essentially immutable after it is created (apart
from the refcounts). Thus it can be safely accessed *so long as we know
it will not be deallocated*.

Now, in some parts of target.c, we have an "active" target_mem_desc,
corresponding to a set of host-target mappings. So long as we are
holding that target_mem_desc (e.g. as we are in GOMP_target_data), we
know that none of the associated mappings' refcounts will fall to zero:
so,

Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2015-04-24 Thread Jiong Wang

Jeff Law writes:

> On 04/21/2015 08:24 AM, Jiong Wang wrote:
>>
>> Jiong Wang writes:
>>
>>> 2015-04-14 18:24 GMT+01:00 Jeff Law :
 On 04/14/2015 10:48 AM, Steven Bosscher wrote:
>>
>> So I think this stage2/3 binary difference is acceptable?
>
>
> No, they should be identical. If there's a difference, then there's a
> bug - which, it seems, you've already found, too.

 RIght.  And so the natural question is how to fix.

 At first glance it would seem like having this new code ignore dependencies
 rising from debug insns would work.

 Which then begs the question, what happens to the debug insn -- it's
 certainly not going to be correct anymore if the transformation is made.
>>>
>>> Exactly.
>>>
>>> The debug_insn 2776 in my example is to record the base address of a
>>> local array. the new code is doing correctly here by not shuffling the
>>> operands of insn 2556 and 2557 as there is additional reference of
>>> reg:1473 from debug insn, although the code will still execute correctly
>>> if we do the transformation.
>>>
>>> my understanding to fix this:
>>>
>>>* delete the out-of-date mismatch debug_insn? as there is no guarantee
>>>  to generate accurate debug info under -O2.
>>>
>>>  IMO, this debug_insn may affect "DW_AT_location" field for variable
>>>  descrption of "classes" in .debug_info section, but it's omitted in
>>>  the final output already.
>>>
>>>  <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
>>>  <38a4f>   DW_AT_name : (indirect string, offset: 0x18db): classes
>>>  <38a53>   DW_AT_decl_file   : 1
>>>  <38a54>   DW_AT_decl_line   : 548
>>>  <38a56>   DW_AT_type: <0x38cb4>
>>>
>>>* update the debug_insn? if the following change is OK with dwarf 
>>> standard
>>>
>>> from
>>>
>>>   insn0: reg0 = fp + reg1
>>>   debug_insn: var_loc = reg0 + const_off
>>>   insn1: reg2 = reg0 + const_off
>>>
>>> to
>>>
>>>   insn0: reg0 = fp + const_off
>>>   debug_insn: var_loc = reg0 + reg1
>>>   insn1: reg2 = reg0 + reg1
>>>
>>> Thanks,
>>>
>>
>> And attachment is the new patch which will update debug_insn as
>> described in the second solution above.
>>
>> Now the stage2/3 binary differences on AArch64 gone away. Bootstrap OK.
>>
>> On AArch64, this patch give 600+ new rtl loop invariants found across
>> spec2k6 float. +4.5% perf improvement on 436.cactusADM because four new
>> invariants found in the critical function "regex_compile".
>>
>> The similar improvements may be achieved on other RISC backends like
>> powerpc/mips I guess.
>>
>> One thing to mention, for AArch64, one minor glitch in
>> aarch64_legitimize_address needs to be fixed to let this patch take
>> effect, I will send out that patch later as it's a seperate issue.
>> Powerpc/Mips don't have this glitch in LEGITIMIZE_ADDRESS hook, so
>> should be OK, and I verified the base address of local array in the
>> testcase given by Seb on pr62173 do hoisted on ppc64 now. I think
>> pr62173 is fixed on those 64bit arch by this patch.
>>
>> Thoughts?
>>
>> Thanks.
>>
>> 2015-04-21  Jiong Wang  
>>
>> gcc/
>>* loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
>>(vfp_const_iv): New hash table.
>>(expensive_addr_check_p): New boolean.
>>(init_inv_motion_data): Initialize new variables.>
>>(free_inv_motion_data): Release hash table.
>>(create_new_invariant): Set cheap_address to false for iv in
>>vfp_const_iv table.
>>(find_invariant_insn): Skip dependencies check for iv in vfp_const_iv
>>table.
>>(use_for_single_du): New function.
>>(reshuffle_insn_with_vfp): Likewise.
>>(find_invariants_bb): Call reshuffle_insn_with_vfp.
>>
>> gcc/testsuite/
>> * gcc.dg/pr62173.c: New testcase.
> So ultimately isn't this just a specialized version of reassociation of 
> pointer arithmetic?  And if so, don't we have all the usual problems 
> around introducing overflow?
>
>
> ISTM if this is going to go forward (ie, we somehow convince ourselves 
> that this kind of reassociation is OK), then should it be made to apply 
> on pointers in general rather than restricting to stack, frame, 
> virtual-frame?
>

Jeff,

  Thanks for the review.

  This transformation is not reassociation of pointer arithmetic.
  
  The idea of this patch is: given two instructions with variable value,
  we may get new instruction sequences with fixed value by reassociating
  their operands.  And currently GCC only generates such instruction
  sequences for local array accesses, as far as I know.

  for the statement "D.2707 = A[i]", GCC generates the following
  instruction sequence:

  (insn 92 91 93 6 (set (reg/f:DI 148)
(plus:DI (reg/f:DI 64 sfp)
(reg:DI 147 [ i ])))
 (expr_list:REG_DEAD (reg:DI 147 [ i ])
(nil)))

  (insn 93 92 94 6 (set (reg:SI 149 [ D.2707 ])
(zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148)
(const_in

Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-04-24 Thread Marek Polacek
On Thu, Apr 23, 2015 at 09:11:39PM -0600, Martin Sebor wrote:
> I wonder if the tests where the left shift operands are both
> constants really do invoke undefined behavior in GCC. For
> example, AFAICS, in (-1 << 0) and other constant expressions
> gcc computes the shift in unsigned HOST_WIDE_INT which is well
> defined.
 
Yeah, two INTEGER_CSTs are computed in wide-int using uhwi.  But -1 << 0
certainly is UB according to ISO C/C++ and I think we should follow the
standards.
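For concreteness, here is a hedged sketch of the cases involved
(illustrative only, not taken from the patch's testsuite):

int ok (int x)   { return 1 << x; }   /* defined for small non-negative x */
int bad1 (void)  { return -1 << 0; }  /* UB in ISO C/C++: negative left operand */
int bad2 (int x) { return -4 << x; }  /* UB for any x, constant or not */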

> It seems the warning would be more valuable (and less likely
> dismissed as overly pedantic) if it was issued when the second
> operand was not constant and the computation had to be done in
> hardware (or even better, in hardware not known to  use the
> same instructions for positive and negative operands).
 
What I've tried to do in the patch was to mimic the other two Wshift-*
warnings.  While the new warning triggered a few times in the testsuite, GCC
bootstrap itself was clean.  You raise a very good point though, we don't
want to be overly pedantic.

I suppose we could go with the patch as-is, and if users complain too much,
warn only if the second operand is not constant or so.  But I hope that
left-shifting a negative value is not a common thing.

> The warning would also be valuable in some sort of a portability
> mode (with -pedantic?) where the code is intended to be portable
> to implementations that don't provide well-defined semantics for
> left shifts of negative values.

I really think -Wshift-negative-value should be kin to -Wshift-count-negative
and -Wshift-count-overflow; those are enabled by default.

Thanks,

Marek


Re: [PATCH] [AArch32] Additional bics patterns.

2015-04-24 Thread Alex Velenko

Hi,

This patch adds RTL patterns to generate bics instructions with shift.

Added the predicable_short_it attribute since the last respin.

Did a full regression run on arm-none-eabi and arm-none-gnueabihf.
Bootstrapped on arm-none-gnueabihf.

Is this patch ok?

gcc/config

2015-04-24  Alex Velenko  

  * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern.
  (andsi_not_shiftsi_si_scc_no_reuse): New pattern.

gcc/testsuite

2015-04-24  Alex Velenko 

  * gcc.target/arm/bics_1.c: New testcase.
  * gcc.target/arm/bics_2.c: New testcase.
  * gcc.target/arm/bics_3.c: New testcase.
  * gcc.target/arm/bics_4.c: New testcase.
---
 gcc/config/arm/arm.md | 44 +++
 gcc/testsuite/gcc.target/arm/bics_1.c | 54 +
 gcc/testsuite/gcc.target/arm/bics_2.c | 57 +++
 gcc/testsuite/gcc.target/arm/bics_3.c | 41 +
 gcc/testsuite/gcc.target/arm/bics_4.c | 49 ++
 5 files changed, 245 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_4.c

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 164ac13..9e774c1 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2768,6 +2768,50 @@
  (const_string "logic_shift_reg")))]
 )

+(define_insn "andsi_not_shiftsi_si_scc_no_reuse"
+  [(set (reg:CC_NOOV CC_REGNUM)
+   (compare:CC_NOOV
+   (and:SI (not:SI (match_operator:SI 0 "shift_operator"
+   [(match_operand:SI 1 "s_register_operand" "r")
+(match_operand:SI 2 "arm_rhs_operand" "rM")]))
+   (match_operand:SI 3 "s_register_operand" "r"))
+   (const_int 0)))
+   (clobber (match_scratch:SI 4 "=r"))]
+  "TARGET_32BIT"
+  "bic%.%?\\t%4, %3, %1%S0"
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "conds" "set")
+   (set_attr "shift" "1")
+   (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
+ (const_string "logic_shift_imm")
+ (const_string "logic_shift_reg")))]
+)
+
+(define_insn "andsi_not_shiftsi_si_scc"
+  [(parallel [(set (reg:CC_NOOV CC_REGNUM)
+   (compare:CC_NOOV
+   (and:SI (not:SI (match_operator:SI 0 "shift_operator"
+   [(match_operand:SI 1 "s_register_operand" "r")
+(match_operand:SI 2 "arm_rhs_operand" "rM")]))
+   (match_operand:SI 3 "s_register_operand" "r"))
+   (const_int 0)))
+   (set (match_operand:SI 4 "s_register_operand" "=r")
+(and:SI (not:SI (match_op_dup 0
+[(match_dup 1)
+ (match_dup 2)]))
+(match_dup 3)))])]
+  "TARGET_32BIT"
+  "bic%.%?\\t%4, %3, %1%S0"
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "conds" "set")
+   (set_attr "shift" "1")
+   (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
+ (const_string "logic_shift_imm")
+ (const_string "logic_shift_reg")))]
+)
+
 (define_insn "*andsi_notsi_si_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
(compare:CC_NOOV
diff --git a/gcc/testsuite/gcc.target/arm/bics_1.c b/gcc/testsuite/gcc.target/arm/bics_1.c
new file mode 100644
index 000..173eb89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bics_1.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+/* { dg-require-effective-target arm32 } */
+
+extern void abort (void);
+
+int
+bics_si_test1 (int a, int b, int c)
+{
+  int d = a & ~b;
+
+  /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
+  if (d == 0)
+return a + c;
+  else
+return b + d + c;
+}
+
+int
+bics_si_test2 (int a, int b, int c)
+{
+  int d = a & ~(b << 3);
+
+  /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl \#3" 1 } } */
+  if (d == 0)
+return a + c;
+  else
+return b + d + c;
+}
+
+int
+main ()
+{
+  int x;
+
+  x = bics_si_test1 (29, ~4, 5);
+  if (x != ((29 & 4) + ~4 + 5))
+abort ();
+
+  x = bics_si_test1 (5, ~2, 20);
+  if (x != 25)
+abort ();
+
+  x = bics_si_test2 (35, ~4, 5);
+  if (x != ((35 & ~(~4 << 3)) + ~4 + 5))
+abort ();
+
+  x = bics_si_test2 (96, ~2, 20);
+  if (x != 116)
+    abort ();
+
+  return 0;
+}
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/bics_2.c b/gcc/testsuite/gcc.target/arm/bics_2.c
new file mode 100644
index 000..740d7c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bics_2.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+/* { dg-

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-24 Thread David Edelsohn
On Fri, Apr 24, 2015 at 11:34 AM, Segher Boessenkool
 wrote:
> On Thu, Apr 23, 2015 at 02:16:04PM -0500, Peter Bergner wrote:
>> Ok, I created a separate ttest define_insn that hard codes the operands.
>> I switched to using r1 rather than r0 as the second operand, for the
>> reason that there could be code that sets r0 directly before this insn
>> and I didn't want to create an unneeded read-after-write dependency.
>> I thought r1 should be safe to use, since it's only updated in the
>> prologue/epilogue and should be constant for the life of the function.
>
> GPR1 is very likely already read before as well :-)
>
>> > Maybe just { powerpc64 } works?
>>
>> powerpc64 doesn't work.  It tells us whether the target will execute
>> 64-bit instructions or not.
>
> Ah yes, it is more like a "powerpc64_hw".  Oh well.
>
> All looks great to me now, thanks for the changes,

Okay for me.

Thanks, David


Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-24 Thread Segher Boessenkool
On Thu, Apr 23, 2015 at 02:16:04PM -0500, Peter Bergner wrote:
> Ok, I created a separate ttest define_insn that hard codes the operands.
> I switched to using r1 rather than r0 as the second operand, for the
> reason that there could be code that sets r0 directly before this insn
> and I didn't want to create an unneeded read-after-write dependency.
> I thought r1 should be safe to use, since it's only updated in the
> prologue/epilogue and should be constant for the life of the function.

GPR1 is very likely already read before as well :-)

> > Maybe just { powerpc64 } works?
> 
> powerpc64 doesn't work.  It tells us whether the target will execute
> 64-bit instructions or not.

Ah yes, it is more like a "powerpc64_hw".  Oh well.

All looks great to me now, thanks for the changes,


Segher


RE: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro

2015-04-24 Thread Petar Jovanovic
-Original Message-
From: Maciej W. Rozycki [mailto:ma...@linux-mips.org] 
Sent: Thursday, April 23, 2015 4:55 PM
To: Petar Jovanovic
Cc: gcc-patches@gcc.gnu.org; catherine_mo...@mentor.com; Matthew Fortune
Subject: RE: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro

>  I hope you find this explanation satisfactory, however if you still find 
> anything unclear or have any other questions or concerns, then please 
> shout.

Hi Maciej,

Thank you for your response. I should have phrased my question better,
though, so you would not have needed to write it all. In the v4 patch, I have
skipped the test for bare-metal targets.

>  BTW, can you please change 0x0FF0 to 0x0FFF or suchlike to avoid 
> making the executable larger than absolutely necessary

Done.

> You didn't mention in your patch description why you picked up these
particular addresses.

There is a comment in the test which indicates that. I have put it as
part of the commit message now as well.

Regards,
Petar




[PATCH][ARM][committed] Use uppercase for code iterator names

2015-04-24 Thread Kyrill Tkachov

Hi all,

We usually use uppercase for code iterator names (and other iterators in 
general),
but these few were written in lowercase.
Encountering their use in arm.md and neon.md confused me for a bit
because I thought they were primitive RTL codes.
For the sake of consistency, this patch makes all code iterators
use uppercase so that they can be at a glance identified
as iterators.

Tested on arm-none-eabi. I guess the build would have broken had I messed
something up here.

Applied as obvious with r222416.

Thanks,
Kyrill

2015-04-24  Kyrylo Tkachov  

* config/arm/iterators.md (shiftable_ops): Rename to...
(SHIFTABLE_OPS): ... This.  Update use in comments.
(ior_xor): Rename to...
(IOR_XOR): ... This.
(vqh_ops): Rename to...
(VQH_OPS): ... This.
(vqhs_ops): Rename to...
(VQHS_OPS): ... This.
(rshifts): Rename to...
(RSHIFTS): ... This.
(returns): Rename to...
(RETURNS): ... This.
* config/arm/arm.md: Update uses of the above.
* config/arm/neon.md: Likewise.
commit 55e38483b5928f4c23c0fa902ea9a914e2d6d205
Author: Kyrylo Tkachov 
Date:   Wed Apr 8 09:10:01 2015 +0100

[ARM][trivial] Use uppercase for code iterator names

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ac94404..8b5036b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5175,7 +5175,7 @@ (define_split
 
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")
-	(ior_xor:SI (and:SI (ashift:SI
+	(IOR_XOR:SI (and:SI (ashift:SI
 			 (match_operand:SI 1 "s_register_operand" "")
 			 (match_operand:SI 2 "const_int_operand" ""))
 			(match_operand:SI 3 "const_int_operand" ""))
@@ -5187,7 +5187,7 @@ (define_split
== (GET_MODE_MASK (GET_MODE (operands[5]))
& (GET_MODE_MASK (GET_MODE (operands[5]))
 	  << (INTVAL (operands[2])"
-  [(set (match_dup 0) (ior_xor:SI (ashift:SI (match_dup 1) (match_dup 2))
+  [(set (match_dup 0) (IOR_XOR:SI (ashift:SI (match_dup 1) (match_dup 2))
   (match_dup 4)))
(set (match_dup 0) (zero_extend:SI (match_dup 5)))]
   "operands[5] = gen_lowpart (GET_MODE (operands[5]), operands[0]);"
@@ -7945,7 +7945,7 @@ (define_insn "*sibcall_value_insn"
 )
 
 (define_expand "return"
-  [(returns)]
+  [(RETURNS)]
   "(TARGET_ARM || (TARGET_THUMB2
&& ARM_FUNC_TYPE (arm_current_func_type ()) == ARM_FT_NORMAL
&& !IS_STACKALIGN (arm_current_func_type (
@@ -7983,7 +7983,7 @@ (define_insn "*cond_return"
   [(set (pc)
 (if_then_else (match_operator 0 "arm_comparison_operator"
 		   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (returns)
+  (RETURNS)
   (pc)))]
   "TARGET_ARM  "
   "*
@@ -8006,7 +8006,7 @@ (define_insn "*cond_return_inverted"
 (if_then_else (match_operator 0 "arm_comparison_operator"
 		   [(match_operand 1 "cc_register" "") (const_int 0)])
   (pc)
-		  (returns)))]
+		  (RETURNS)))]
   "TARGET_ARM "
   "*
   {
@@ -8340,7 +8340,7 @@ (define_insn "trap"
 
 (define_insn "*_multsi"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
-	(shiftable_ops:SI
+	(SHIFTABLE_OPS:SI
 	 (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
 		  (match_operand:SI 3 "power_of_two_operand" ""))
 	 (match_operand:SI 1 "s_register_operand" "rk,")))]
@@ -8354,7 +8354,7 @@ (define_insn "*_multsi"
 
 (define_insn "*_shiftsi"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
-	(shiftable_ops:SI
+	(SHIFTABLE_OPS:SI
 	 (match_operator:SI 2 "shift_nomul_operator"
 	  [(match_operand:SI 3 "s_register_operand" "r,r,r")
 	   (match_operand:SI 4 "shift_amount_operand" "M,M,r")])
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 66f3f4d..1e7f3f1 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -191,34 +191,34 @@ (define_code_iterator GTUGEU [gtu geu])
 (define_code_iterator COMPARISONS [eq gt ge le lt])
 
 ;; A list of ...
-(define_code_iterator ior_xor [ior xor])
+(define_code_iterator IOR_XOR [ior xor])
 
 ;; Operations on two halves of a quadword vector.
-(define_code_iterator vqh_ops [plus smin smax umin umax])
+(define_code_iterator VQH_OPS [plus smin smax umin umax])
 
 ;; Operations on two halves of a quadword vector,
 ;; without unsigned variants (for use with *SFmode pattern).
-(define_code_iterator vqhs_ops [plus smin smax])
+(define_code_iterator VQHS_OPS [plus smin smax])
 
 ;; A list of widening operators
 (define_code_iterator SE [sign_extend zero_extend])
 
 ;; Right shifts
-(define_code_iterator rshifts [ashiftrt lshiftrt])
+(define_code_iterator RSHIFTS [ashiftrt lshiftrt])
 
 ;; Iterator for integer conversions
 (define_code_iterator FIXUORS [fix unsigned_fix])
 
 ;; Binary operators whose second operand can be shifted.
-(define_code_iterator shiftable_ops [plus minus ior xor and])
+(define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
 
-;; plus an

Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo

2015-04-24 Thread Kyrill Tkachov


On 24/04/15 09:39, Marcus Shawcroft wrote:

On 22 April 2015 at 16:08, Kyrill Tkachov  wrote:

On 22/04/15 12:46, Kyrill Tkachov wrote:

[Sorry for resending twice. My mail client glitched]


+/* Native CPU detection for aarch64.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+


That should be 2015, otherwise OK.


Thanks, I've committed the patch with that change with r222415.
Venkat kindly gave it a bootstrap on an AMD Seattle board as well
with BOOT_CFLAGS=-O2 -mcpu=native and CFLAGS_FOR_TARGET=-O2 -mcpu=native.

If anyone finds that cpu detection doesn't work on their system
do let me know, together with the contents of your /proc/cpuinfo.

Cheers,
Kyrill



Cheers
/Marcus





[PATCH v4][MIPS] fix CRT_CALL_STATIC_FUNCTION macro

2015-04-24 Thread Petar Jovanovic
New patch, v4. PTAL.
(resending the patch in plan text format, sorry for the multiple emails, I
am going to switch back to git-send-email)

Regards,
Petar

gcc/ChangeLog:

2015-04-21  Petar Jovanovic  

* config/mips/mips.h (CRT_CALL_STATIC_FUNCTION): Fix the macro to
use
la/jalr instead of jal.

gcc/testsuite/ChangeLog:

2015-04-21  Petar Jovanovic  

* gcc.target/mips/call-from-init.c: New test.
* gcc.target/mips/mips.exp: Add section_start to mips_option_groups.

commit 1c54988c43bdc653a730b97ea7a774a83f944e56
Author: Petar Jovanovic 
Date:   Thu Jan 22 02:15:22 2015 +0100

[mips] fix CRT_CALL_STATIC_FUNCTION macro

jal cannot reach a target in a different region, so replace it with
the la/jalr variant.
A new test has been added. It uses -Wl,--section-start=.init= and
-Wl,--section-start=.text= to put the two sections in different 256-MB
regions, so the test fails (on Linux) if the change is not applied.
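As background, here is a sketch (32-bit form) of the standard MIPS
J-type target computation that creates the 256-MB limit; this is
architecture background, not code from the patch:

#include <stdint.h>

/* jal/j carry a 26-bit word index; the upper 4 bits of the target come
   from the PC of the delay slot, so the branch cannot leave the current
   256-MB region.  E.g. .init placed just below 0x10000000 cannot jal
   into .text placed at or above 0x10000000.  */
uint32_t
jal_target (uint32_t delay_slot_pc, uint32_t imm26)
{
  return (delay_slot_pc & 0xF0000000u) | (imm26 << 2);
}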

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index ec69ed5..4bd83f5 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3034,11 +3034,11 @@ while (0)
nop\n\
 1: .cpload $31\n\
.set reorder\n\
-   jal " USER_LABEL_PREFIX #FUNC "\n\
+   la $25, " USER_LABEL_PREFIX #FUNC "\n\
+   jalr $25\n\
.set pop\n\
" TEXT_SECTION_ASM_OP);
-#elif ((defined _ABIN32 && _MIPS_SIM == _ABIN32) \
-   || (defined _ABI64 && _MIPS_SIM == _ABI64))
+#elif (defined _ABIN32 && _MIPS_SIM == _ABIN32)
 #define CRT_CALL_STATIC_FUNCTION(SECTION_OP, FUNC) \
asm (SECTION_OP "\n\
.set push\n\
@@ -3048,7 +3048,22 @@ while (0)
nop\n\
 1: .set reorder\n\
.cpsetup $31, $2, 1b\n\
-   jal " USER_LABEL_PREFIX #FUNC "\n\
+   la $25, " USER_LABEL_PREFIX #FUNC "\n\
+   jalr $25\n\
+   .set pop\n\
+   " TEXT_SECTION_ASM_OP);
+#elif (defined _ABI64 && _MIPS_SIM == _ABI64)
+#define CRT_CALL_STATIC_FUNCTION(SECTION_OP, FUNC) \
+   asm (SECTION_OP "\n\
+   .set push\n\
+   .set nomips16\n\
+   .set noreorder\n\
+   bal 1f\n\
+   nop\n\
+1: .set reorder\n\
+   .cpsetup $31, $2, 1b\n\
+   dla $25, " USER_LABEL_PREFIX #FUNC "\n\
+   jalr $25\n\
.set pop\n\
" TEXT_SECTION_ASM_OP);
 #endif
diff --git a/gcc/testsuite/gcc.target/mips/call-from-init.c b/gcc/testsuite/gcc.target/mips/call-from-init.c
new file mode 100644
index 000..4fe5acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/call-from-init.c
@@ -0,0 +1,11 @@
+/* Check that __do_global_ctors_aux can be reached from .init section that
+   is in a different (256MB) region. */
+/* { dg-do run { target { "mips*-*-linux*" } } } */
+/* { dg-skip-if "" { "mips*-sde-elf mips*-mti-elf mips*-img-elf" } } */
+/* { dg-options "-Wl,--section-start=.init=0x0FFF" } */
+/* { dg-options "-Wl,--section-start=.text=0x1000" } */
+
+int
+main (void) {
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp
index a0980a9..1dd4173 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -254,6 +254,7 @@ set mips_option_groups {
 madd "HAS_MADD"
 maddps "HAS_MADDPS"
 lsa "(|!)HAS_LSA"
+section_start "-Wl,--section-start=.*"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {



C++ PATCH to remove obsolete cases from potential_constant_expression_1

2015-04-24 Thread Jason Merrill
potential_constant_expression_1 originally implemented an early version 
of the constexpr proposal, which included a concept of "potential 
constant expression" that no longer exists in the standard.  It's still 
useful for catching expressions that could never be constant, but these 
two are obsolete.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f7ac4fd6b541a7b07fc041e1300c7169672dc97e
Author: Jason Merrill 
Date:   Thu Apr 23 17:19:49 2015 -0400

	* constexpr.c (potential_constant_expression_1) [MINUS_EXPR]:
	Remove obsolete code.
	[NE_EXPR]: Likewise.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 2990519..6465677 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4156,15 +4156,6 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
   }
 
 case MINUS_EXPR:
-  /* -- a subtraction where both operands are pointers.   */
-  if (TYPE_PTR_P (TREE_OPERAND (t, 0))
-  && TYPE_PTR_P (TREE_OPERAND (t, 1)))
-{
-  if (flags & tf_error)
-error ("difference of two pointer expressions is not "
-   "a constant expression");
-  return false;
-}
   want_rval = true;
   goto binary;
 
@@ -4174,16 +4165,6 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
 case GE_EXPR:
 case EQ_EXPR:
 case NE_EXPR:
-  /* -- a relational or equality operator where at least
-one of the operands is a pointer.  */
-  if (TYPE_PTR_P (TREE_OPERAND (t, 0))
-  || TYPE_PTR_P (TREE_OPERAND (t, 1)))
-{
-  if (flags & tf_error)
-error ("pointer comparison expression is not a "
-   "constant expression");
-  return false;
-}
   want_rval = true;
   goto binary;
 


Re: [PATCH][PR65802] Mark ifn_va_arg with ECF_NOTHROW

2015-04-24 Thread Tom de Vries

On 24-04-15 05:25, Bin.Cheng wrote:

On Tue, Apr 21, 2015 at 3:10 PM, Tom de Vries  wrote:

Hi,

this patch fixes PR65802.

diff --git a/gcc/testsuite/g++.dg/pr65802.C b/gcc/testsuite/g++.dg/pr65802.C

new file mode 100644
index 000..26e5317
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr65802.C
@@ -0,0 +1,29 @@
+// { dg-do compile }
+// { dg-options "-O0" }
+
+typedef int tf ();
+
+struct S
+{
+  tf m_fn1;
+} a;
+
+void
+fn1 ()
+{
+  try
+{
+  __builtin_va_list c;
+  {
+ int *d = __builtin_va_arg (c, int *);
+ int **e = &d;
+ __asm__("" : "=d"(e));

Hi, thanks for fixing the issue.
But 'd' is a machine-specific constraint?  This case failed on all ARM
processors.


Hi,

I've rewritten the test-case for C, made the function a valid stdargs function, 
and removed the superfluous inline assembly.


Committed as attached.

Thanks,
- Tom
2015-04-24  Tom de Vries  

	PR tree-optimization/65802
	* g++.dg/pr65802.C: Move to ...
	* gcc.dg/pr65802.c: ... here.  Add -fexceptions to dg-options. Include
	stdarg.h.  Rewrite for C.
	(fn1): Use va_list and va_arg.  Make variable args function.  Add use of
	va_start and va_end.  Remove unnecessary inline asm.

diff --git a/gcc/testsuite/g++.dg/pr65802.C b/gcc/testsuite/g++.dg/pr65802.C
deleted file mode 100644
index 26e5317..000
--- a/gcc/testsuite/g++.dg/pr65802.C
+++ /dev/null
@@ -1,29 +0,0 @@
-// { dg-do compile }
-// { dg-options "-O0" }
-
-typedef int tf ();
-
-struct S
-{
-  tf m_fn1;
-} a;
-
-void
-fn1 ()
-{
-  try
-{
-  __builtin_va_list c;
-  {
-	int *d = __builtin_va_arg (c, int *);
-	int **e = &d;
-	__asm__("" : "=d"(e));
-	a.m_fn1 ();
-  }
-  a.m_fn1 ();
-}
-  catch (...)
-{
-
-}
-}
diff --git a/gcc/testsuite/gcc.dg/pr65802.c b/gcc/testsuite/gcc.dg/pr65802.c
new file mode 100644
index 000..fcec234
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65802.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -fexceptions" } */
+
+#include <stdarg.h>
+
+struct S
+{
+  int (*m_fn1) (void);
+} a;
+
+void
+fn1 (int d, ...)
+{
+  va_list c;
+  va_start (c, d);
+
+  {
+int *d = va_arg (c, int *);
+
+int **e = &d;
+
+a.m_fn1 ();
+  }
+
+  a.m_fn1 ();
+
+  va_end (c);
+}
-- 
1.9.1



Re: [PATCH] PR target/65849 -- Add additional options to powerpc #pragma/attribute target support

2015-04-24 Thread David Edelsohn
On Thu, Apr 23, 2015 at 3:25 PM, Michael Meissner
 wrote:
> Steve Munroe was tuning an application on PowerPC, and needed to set the
> -msave-toc-indirect option for only one function, and it wasn't available via
> the #pragma/attribute target options.  This patch adds support for the
> additional options that don't involve an ABI change to the list of options 
> that
> can be set via the #pragma GCC target or attribute((target(...))) support.
>
> I have bootstrapped the compiler on a power7 (big endian) and power8 (little
> endian) with no regressions.  Is this patch ok to install?  I would like to
> backport this patch to the current branches (5.x, 4.9, 4.8, maybe 4.7).  Is
> this ok?
>
> [gcc]
> 2015-04-23  Michael Meissner  
>
> PR target/65849
> * config/rs6000/rs6000.opt (-mvsx-align-128): Make options that
> save to independent variables use the Save attribute.  This will
> allow these options to be modified with the #pragma/attribute
> target support.
> (-mallow-movmisalign): Likewise.
> (-mallow-df-permute): Likewise.
> (-msched-groups): Likewise.
> (-malways-hint): Likewise.
> (-malign-branch-targets): Likewise.
> (-mvectorize-builtins): Likewise.
> (-msave-toc-indirect): Likewise.
>
> * config/rs6000/rs6000.c (rs6000_opt_masks): Add more options that
> can be set via the #pragma/attribute target support.
> (rs6000_opt_vars): Likewise.
> (rs6000_inner_target_options): If VSX was set, also set
> -mno-avoid-indexed-addresses.
>
> [gcc/testsuite]
> 2015-04-23  Michael Meissner  
>
> PR target/65849
> * gcc.target/powerpc/pr65849-1.c: New test to verify being able to
> set new options.
> * gcc.target/powerpc/pr65849-2.c: Likewise.

Okay.

Thanks, David
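For context, a hedged sketch of the usage this patch enables; rs6000
target attribute strings drop the leading "-m", and the exact spelling
below is my assumption, not taken from the patch:

/* Compile only this function as if -msave-toc-indirect were given.  */
__attribute__((target("save-toc-indirect")))
void
hot_function (void)
{
  /* ... */
}

/* Or, for a whole region of code:
   #pragma GCC target ("save-toc-indirect")  */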


Re: C++ delayed folding branch review

2015-04-24 Thread Kai Tietz
2015-04-24 6:22 GMT+02:00 Jason Merrill :
>> +  expr = fold (expr);
>>/* This may happen, because for LHS op= RHS we preevaluate
>>   RHS and create C_MAYBE_CONST_EXPR >, which
>>   means we could no longer see the code of the EXPR.  */
>>if (TREE_CODE (expr) == C_MAYBE_CONST_EXPR)
>>  expr = C_MAYBE_CONST_EXPR_EXPR (expr);
>>if (TREE_CODE (expr) == SAVE_EXPR)
>> -expr = TREE_OPERAND (expr, 0);
>> +expr = fold (TREE_OPERAND (expr, 0));
>
>
> How about moving the first fold after the SAVE_EXPR block, so that we only
> need to call fold once?

I will try.  It might be required to fold up front to strip off useless
type conversions ...

>> +case NEGATE_EXPR:
>> +case BIT_NOT_EXPR:
>> +case CONVERT_EXPR:
>> +case VIEW_CONVERT_EXPR:
>> +case NOP_EXPR:
>> +case FIX_TRUNC_EXPR:
>> +  {
>> +   tree op1 = TREE_OPERAND (expr, 0);
>> +   tree fop1 = fold (op1);
>> +   if (fop1 && op1 != fop1)
>> + fop1 = fold_build1_loc (loc, TREE_CODE (expr), TREE_TYPE (expr),
>> + fop1);

Well, AFAIR the idea here was to make sure we do constant-value folding ...

>
> Isn't this redundant with the call to fold above?  If not, it seems that the
> above call should be to *_fully_fold.  I suppose we want an entry point
> defined by both front ends that c-common code can call which does full
> folding of an expression.

Sure, we can use *_fully_fold here instead, but at what cost?  In
general we only need a simple one-level fold here to simplify
constant values and/or remove useless type conversions.  As convert
doesn't fold them away anymore, a NOP_EXPR can remain in front of
constant values, which causes us to fail in later value analysis.  So
sure, we can use *_fully_fold, but we don't actually need a full fold.
(This applies to the other places below where you mentioned the same
issue, too.)

>> @@ -597,9 +597,9 @@ null_member_pointer_value_p (tree t)
>>  return false;
>>else if (TYPE_PTRMEMFUNC_P (type))
>>  return (TREE_CODE (t) == CONSTRUCTOR
>> -   && integer_zerop (CONSTRUCTOR_ELT (t, 0)->value));
>> +   && integer_zerop (fold (CONSTRUCTOR_ELT (t, 0)->value)));
>>else if (TYPE_PTRDATAMEM_P (type))
>> -return integer_all_onesp (t);
>> +return integer_all_onesp (fold (t));
>
>
> Calling fold here is wrong; it doesn't handle constexpr, and we should have
> folded before we got here.

See above: we need to make sure constant values get rid of useless
type conversions/negates/etc. ...

>> warn_logical_operator (loc, code, boolean_type_node,
>> -  code_orig_arg1, arg1,
>> -  code_orig_arg2, arg2);
>> +  code_orig_arg1, fold (arg1),
>> +  code_orig_arg2, fold (arg2));
>
>
> I think warn_logical_operator should call back into *_fully_fold. Likewise
> for most similar added calls to fold.

Same as above.  It isn't necessary for further analysis to perform
fully_fold on arg1/arg2.  But of course we can introduce this overhead
here.

>> @@ -7356,8 +7354,13 @@ build_over_call (struct z_candidate *cand, int
>> flags, tsu
>> bst_flags_t complain)
>>
>>gcc_assert (j <= nargs);
>>nargs = j;
>> +  {
>> +tree *fargs = (!nargs ? argarray : (tree *) alloca (nargs * sizeof (tree)));
>> +for (j = 0; j < nargs; j++)
>> +  fargs[j] = fold_non_dependent_expr (argarray[j]);
>
>
> Similarly, this and build_cxx_call should use cp_fully_fold.

Well, cp_fully_fold wouldn't resolve constexpr ... at least not completely.

>> @@ -7602,7 +7614,6 @@ build_cxx_call (tree fn, int nargs, tree *argarray,
>>&& current_function_decl
>>&& DECL_DECLARED_CONSTEXPR_P (current_function_decl))
>>  optimize = 1;
>> -  fn = fold_if_not_in_template (fn);
>>optimize = optimize_sav;
>
>
> Since we're removing the fold, we can also remove the changes to "optimize".

Yeah, I missed that.  I removed superfluous code-path and committed it
on branch.

>> @@ -443,7 +443,7 @@ build_base_path (enum tree_code code,
>>
>>   t = TREE_TYPE (TYPE_VFIELD (current_class_type));
>>   t = build_pointer_type (t);
>> - v_offset = convert (t, current_vtt_parm);
>> + v_offset = fold (convert (t, current_vtt_parm));
>
> fold_convert should work here.

Well, fold_convert isn't necessarily the same as fold (convert ())
within C++, because convert handles special cases fold_convert doesn't.  At
least I ran into issues here with templates within templates, AFAIR.


>> @@ -576,7 +576,6 @@ build_simple_base_path (tree expr, tree binfo)
>> expr = build3 (COMPONENT_REF,
>>cp_build_qualified_type (type, type_quals),
>>expr, field, NULL_TREE);
>> -   expr = fold_if_not_in_template (expr);
>
>
> I don't think we need to remove this fold, since it is part of compil

Re: [PATCH] [RTL] Relax CSE check to set REG_EQUAL notes.

2015-04-24 Thread Alex Velenko



On 24/04/15 02:16, Jeff Law wrote:

On 04/10/2015 03:14 AM, Alex Velenko wrote:

On 09/03/15 17:40, Jeff Law wrote:

On 03/09/15 03:53, Steven Bosscher wrote:

On Wed, Mar 4, 2015 at 12:09 PM, Alex Velenko wrote:

For example, in arm testcase pr43920-2.c, CSE previously decided not
to put
an "obvious" note on insn 9, as set value was the same as note value.
At the same time, other insns set up as -1 were set up through a
register
and did get a note:


...which is the point of the REG_EQUAL notes. In insn 8 there is a
REG_EQUAL note to show that the value of r111 is known. In insn 9 the
known value is, well, known from SET_SRC so there is no need for a
REG_EQUAL note. Adding REG_EQUAL notes in such cases is just wasteful.

Right.  I'd rather look into why later passes aren't discovering
whatever equivalences are important rather than adding the redundant
notes.

Regardless, I think this is a gcc-6 issue, so I'm not likely to look at
it in the immediate future.

jeff



Hi Jeff,
I reworked the patch to satisfy your preference.

This patch enables cfgcleanup.c to use const_int rtxes as REG_EQUAL notes.
For example, this helps Jump2 find extra optimisation opportunities.
This patch fixes gcc.target/arm/pr43920-2.c for arm-none-eabi.

Bootstrapped on x86, ran full regression runs on arm-none-eabi and
aarch64-none-elf.

Is this patch ok?

gcc/

2015-03-17  Alex Velenko  

  * cfgcleanup.c (can_replace_by): Use const_int rtx of single set as
  REG_EQUAL note.

Now I finally see this in my queue.  I recalled the discussion around
whether or not to add the redundant notes, but hadn't had a chance to
look at the updated patch.

AFAICT, this is redundant with Shiva's patch, right?

jeff



Hi Jeff,
Yes, you are correct, this patch is now redundant.
Kind regards,
Alex



Re: C PATCH for c/52085 (enum forward declarations and attribute packed)

2015-04-24 Thread Marek Polacek
On Thu, Apr 23, 2015 at 08:25:51PM -0600, Jeff Law wrote:
> What happens if we have used the enum inside an aggregate?  Can we just
> blindly change the alignment/precision like that?
 
If you just forward declare an enum/struct, it has an incomplete type, so you
cannot use it inside an aggregate.  In fact, you can't do much with that until
it's completed, except creating a pointer to it.  E.g. applying sizeof to an
incomplete type is forbidden.  With my patch, we behave the same as clang.
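A hedged illustration of those rules (forward declarations of enums are
a GNU C extension; this is not from the testsuite):

enum E;                    /* incomplete type from here on       */
enum E *p;                 /* OK: pointer to an incomplete type  */
/* enum E v;                  error: needs a complete type       */
/* int s = sizeof (enum E);   error: sizeof an incomplete type   */
enum E { A, B };           /* complete from here on; PR c/52085  */
                           /* is about attributes such as packed */
                           /* on this completing definition      */
enum E v2;                 /* fine now                           */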

Marek


Re: [RFC] Adding unroller and DCE to early passes

2015-04-24 Thread Jan Hubicka
> > I added a cunrolle pass that differs from cunrolli by not allowing code size
> > growth even at -O3 (because we do not know what loops are hot yet).
> > We currently unroll tiny loops with 2 calls, which I think needs to be tamed
> > down; I can do that if the patch seems to make sense.
> 
> Please.

OK, will do that.
> 
> > I tried several options and ended up adding cunrolle before FRE and 
> > reordering
> > FRE and SRA: SRA needs constant propagation to happen after unrolling to 
> > work
> > and I think value numbering does work pretty well on non-SRAed 
> > datastructures.
> > I also added DCE just before unrolling. This increases number of unrolls by
> > about 60% on both tramp3d and eon. (basically we want to have DCE and cprop
> > done to make unroller metrics go resonably well)
> 
> I've re-ordered SRA and ealias for GCC5 because ealias benefits from SRA
> while SRA doesn't need PTA.  You undo that improvement.

OK, I see. I assumed that the PTA solutions are updated when SRA introduces new
scalars.
We could simply do limited CCP when unrolling happened, lifting the need for FRE
between cunrolle and SRA.  I dimly remember we even have code for that?
> 
> We should improve the unroller instead of requiring DCE before it.

If I get a loop with dead code in it, because of einline or gimple production or
whatever, what should the unroller do, short of doing its own DCE pass on the
whole function body (well, the mark, ignoring the sweep)?

Honza
> 
> I think that adding more passes to the early pipeline is bad.  Sure, it
> will help weird C++ code-bases.  But it will slow down the compile for
> all the rest.

I am not 100% convinced about this, especially with LTO, where we need to pickle
all the garbage we did not eliminate.
> 
> As we want early unrolling to get the secondary effects by performing
> better FRE/DCE/DSE I think trying to do a better job in value-numbering
> for the kind of loops we are talking about would be better.  For tramp3d
> we are talking about cases like
> 
>  for (i=0; i<3; ++i)
>   dim[i] = i;
> 
> or similar code which ends up storing to a single array.  There is
> the vn_reference_lookup_3 function in tree-ssa-sccvn.c which is
> the canonical place for value-numbering "tricks".  It "simply"
> needs to be taught how to "look through loops".
> 
> I'd like to not get into the game of huge pass order dependences in
> early-opts - that just shows our scalar cleanups are too weak.
> Ideally we'd just have a single pass in early-opts...

Another motivation for getting rid of loops is to make control & data flow more
explicit for IPA analysers.
If you have code on vectors of size 3 that does a series of operations, it is very
hard to reorder/fuse/merge loops as needed for the scalar optimizations...

Honza


[PATCH, i386]: Merge movlpd/movsd with movhpd to form movupd

2015-04-24 Thread Uros Bizjak
Hello!

This patch revives the old patch from Wei to merge movlpd/movsd with
movhpd into movupd. As evident from the patch, it merges only
instructions with simple memory references, which should IMO cover all
interesting cases (please see the included testcases).

I have played a bit with the load/store fusion infrastructure, but at the
end of the day decided that it isn't worth handling a relatively rare
transformation with such a costly approach.
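My guess at the shape of the included testcases: a low-half load (movsd)
plus a high-half load (movhpd) from adjacent addresses, which the new
peephole2 can merge into one movupd on targets where
TARGET_SSE_UNALIGNED_LOAD_OPTIMAL holds:

#include <emmintrin.h>

__m128d
load_pair (const double *p)
{
  /* p[0] -> low half (movsd), p[1] -> high half (movhpd);
     mergeable into a single movupd from p.  */
  return _mm_set_pd (p[1], p[0]);
}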

2015-04-24  Uros Bizjak  
Wei Mi  

* config/i386/i386-protos.h (ix86_operands_ok_for_move_multiple): New.
* config/i386/i386.c (extract_base_offset_in_addr): New function.
(ix86_operands_ok_for_move_multiple): Ditto.
* config/i386/sse.md (movsd/movhpd to movupd peephole2): New pattern.
(movlpd/movhpd to movupd peephole2): Ditto.

testsuite/ChangeLog:

2015-04-24  Uros Bizjak  
Wei Mi  

* gcc.target/i386/sse2-load-multi.c: New test.
* gcc.target/i386/sse2-store-multi.c: Ditto.

Tested on x86_64-linux-gnu {,-m32} and will commit it to mainline SVN soon.

Uros.
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 222384)
+++ config/i386/i386-protos.h   (working copy)
@@ -304,6 +304,8 @@
 #endif
 
 extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
+extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
+   enum machine_mode mode);
 
 #ifdef RTX_CODE
 /* Target data for multipass lookahead scheduling.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 222384)
+++ config/i386/i386.c  (working copy)
@@ -51726,6 +51726,92 @@
 }
 #endif
 
+/* If MEM is in the form of [base+offset], extract the two parts
+   of address and set to BASE and OFFSET, otherwise return false.  */
+
+static bool
+extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
+{
+  rtx addr;
+
+  gcc_assert (MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+  
+  if (GET_CODE (addr) == CONST)
+addr = XEXP (addr, 0);
+
+  if (REG_P (addr) || GET_CODE (addr) == SYMBOL_REF)
+{
+  *base = addr;
+  *offset = const0_rtx;
+  return true;
+}
+
+  if (GET_CODE (addr) == PLUS
+  && (REG_P (XEXP (addr, 0))
+ || GET_CODE (XEXP (addr, 0)) == SYMBOL_REF)
+  && CONST_INT_P (XEXP (addr, 1)))
+{
+  *base = XEXP (addr, 0);
+  *offset = XEXP (addr, 1);
+  return true;
+}
+
+  return false;
+}
+
+/* Given OPERANDS of consecutive load/store, check if we can merge
+   them into move multiple.  LOAD is true if they are load instructions.
+   MODE is the mode of memory operands.  */
+
+bool
+ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
+   enum machine_mode mode)
+{
+  HOST_WIDE_INT offval_1, offval_2, msize;
+  rtx mem_1, mem_2, reg_1, reg_2, base_1, base_2, offset_1, offset_2;
+
+  if (load)
+{
+  mem_1 = operands[1];
+  mem_2 = operands[3];
+  reg_1 = operands[0];
+  reg_2 = operands[2];
+}
+  else
+{
+  mem_1 = operands[0];
+  mem_2 = operands[2];
+  reg_1 = operands[1];
+  reg_2 = operands[3];
+}
+
+  gcc_assert (REG_P (reg_1) && REG_P (reg_2));
+
+  if (REGNO (reg_1) != REGNO (reg_2))
+return false;
+
+  /* Check if the addresses are in the form of [base+offset].  */
+  if (!extract_base_offset_in_addr (mem_1, &base_1, &offset_1))
+return false;
+  if (!extract_base_offset_in_addr (mem_2, &base_2, &offset_2))
+return false;
+
+  /* Check if the bases are the same.  */
+  if (!rtx_equal_p (base_1, base_2))
+return false;
+
+  offval_1 = INTVAL (offset_1);
+  offval_2 = INTVAL (offset_2);
+  msize = GET_MODE_SIZE (mode);
+  /* Check if mem_1 is adjacent to mem_2 and mem_1 has lower address.  */
+  if (offval_1 + msize != offval_2)
+return false;
+
+  return true;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 222384)
+++ config/i386/sse.md  (working copy)
@@ -1183,6 +1183,21 @@
  ]
  (const_string "")))])
 
+;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
+(define_peephole2
+  [(set (match_operand:V2DF 0 "register_operand")
+   (vec_concat:V2DF (match_operand:DF 1 "memory_operand")
+(match_operand:DF 4 "const0_operand")))
+   (set (match_operand:V2DF 2 "register_operand")
+   (vec_concat:V2DF (vec_select:DF (match_dup 0)
+   (parallel [(const_int 0)]))
+(match_operand:DF 3 "memory_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+   && ix86_operands_ok_for_move_multiple (operands, t

RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits

2015-04-24 Thread Thomas Preud'homme
Hi,

first of all, sorry for the delay. We quickly entered stage 4 and I thought
it was best waiting for stage 1 to update you on this.

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> Of course both approaches are not exclusive. I'll try to test with *both*
> rs6000 bootstrap and with a cross-compiler for one of these targets.

I did two experiments where I checked the impact of removing the code
guarded by SHORT_IMMEDIATES_SIGN_EXTEND. In the first one I removed
the code in both rtlanal.c and combine.c. In the second, I only removed the
code from combine.c (in both occurrences). In both cases powerpc
bootstrap succeeded.

I then proceeded to use these two resulting compilers to compile the same
GCC source (actually the source with all code guarded by the macro
removed). I compared the output of objdump on the resulting g++ and
found that in both cases the output was different from the one without
any modification. Both diffs look like:

Disassembly of section .init:
@@ -1359,7 +1359,7 @@ Disassembly of section .text:
 10003a94:  f8 21 ff 81 stdu    r1,-128(r1)
 10003a98:  eb e4 00 00 ld      r31,0(r4)
 10003a9c:  3c 82 ff f8 addis   r4,r2,-8
-10003aa0:  38 84 d7 60 addi    r4,r4,-10400
+10003aa0:  38 84 d7 70 addi    r4,r4,-10384
 10003aa4:  7f e3 fb 78 mr      r3,r31
 10003aa8:  4b ff f0 d9 bl      10002b80 <003d.plt_call.strcmp@@GLIBC_2.3+0>
 10003aac:  e8 41 00 28 ld      r2,40(r1)
@@ -1371,7 +1371,7 @@ Disassembly of section .text:
 10003ac4:  79 2a ff e3 rldicl. r10,r9,63,63
 10003ac8:  41 82 00 78 beq-    10003b40 <._ZL22sanitize_spec_functioniPPKc+0xc0>
 10003acc:  3c 62 ff f8 addis   r3,r2,-8
-10003ad0:  38 63 f5 70 addi    r3,r3,-2704
+10003ad0:  38 63 f5 b0 addi    r3,r3,-2640
 10003ad4:  38 21 00 80 addi    r1,r1,128
 10003ad8:  e8 01 00 10 ld      r0,16(r1)
 10003adc:  eb e1 ff f8 ld      r31,-8(r1)

(This diff compares g++ built by the compiler with partial removal of
the code guarded by the macro against g++ built by an unmodified
GCC.)

I may have made a mistake when doing the experiment, though, and can
do it again if you wish.

Best regards,

Thomas




Re: [PATCH] Improve targetm.binds_local_p for common symbols on s390*/arm/aarch64 (PR target/65780)

2015-04-24 Thread Marcus Shawcroft
On 23 April 2015 at 17:36, Jakub Jelinek  wrote:
> Hi!
>
> This patch undoes the PR65780 performance regressions on a few targets
> I have tested to work fine.
> This PR was about an access to uninitialized COMMON symbol defined in
> executable (or PIE) where there is a normal symbol definition in a shared
> library.  The PR65780 fix that got committed stopped treating
> such COMMONs as binding local on all architectures, except for i?86 (for
> normal execs only) and x86_64 (for normal execs and PIEs, but the latter
> only with recent linker).  As s390/arm/aarch64 seems to work fine
> (generate a COPY relocation and thus define symbol locally) in non-PIE
> executables, this patch changes those to a function that has been added for
> that behavior.  E.g. powerpc64{,le} on the other side isn't able to link
> that case neither in normal execs nor PIEs.

> * config/aarch64/aarch64-linux.h (TARGET_BINDS_LOCAL_P): Likewise.

AArch64 chunk is fine with me. Cheers /Marcus


RE: [Patch, MIPS] Minor cleanup in mips.md

2015-04-24 Thread Matthew Fortune
> 2015-04-23  Steve Ellcey  
> 
>   * config/mips/mips.md (*madd4): Remove accum_in attribute.
>   (*madd3): Ditto.
>   (*msub4): Ditto.
>   (*msub3): Ditto.
>   (*nmadd4): Ditto.
>   (*nmadd3): Ditto.
>   (*nmadd4_fastmath): Ditto.
>   (*nmadd3_fastmath): Ditto.
>   (*nmsub4): Ditto.
>   (*nmsub3): Ditto.
>   (*nmsub4_fastmath): Ditto.
>   (*nmsub3_fastmath): Ditto.

OK, thanks.

Matthew




Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo

2015-04-24 Thread Marcus Shawcroft
On 22 April 2015 at 16:08, Kyrill Tkachov  wrote:
>
> On 22/04/15 12:46, Kyrill Tkachov wrote:
>>
>> [Sorry for resending twice. My mail client glitched]


+/* Native CPU detection for aarch64.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+


That should be 2015, otherwise OK.

Cheers
/Marcus


RE: [Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

2015-04-24 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Sandiford [mailto:rdsandif...@googlemail.com] 
Sent: Friday, April 24, 2015 12:40 AM
To: Ajit Kumar Agarwal
Cc: vmaka...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch] OPT: Update heuristics for loop-invariant for address 
arithmetic.

Very delayed answer, sorry...

Ajit Kumar Agarwal  writes:
> Hello All:
>
> The patch updates the heuristics for loop invariants for address
> arithmetic at the RTL level.  The heuristics are updated to consider
> a single def and use in the register pressure calculation instead
> of ignoring it, and also to update the estimated register pressure
> cost along with the check of actual uses against address uses.
>
> With the above change, gains are seen in the geomean for MiBench/EEMBC
> benchmarks for the MicroBlaze target. No regression is seen in DejaGnu
> regression tests for MicroBlaze.

>>Since this patch is basically removing code, were you able to analyse why that
>>code was having a detrimental effect?  I assume it benefited some target
>>originally.

This patch modifies the estimated register pressure cost for non-IRA-based
register pressure (flag_ira_loop_pressure is not set).
The following change was made to the estimated register pressure cost.

size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
   regs_used, speed, call_p)
   - estimate_reg_pressure_cost (new_regs[0],
 regs_used, speed, call_p));

is changed to 

size_cost =  estimate_reg_pressure_cost (regs_needed[0],
   regs_used, speed, call_p);

This looks like a reasonable change to the estimate_reg_pressure_cost
calculation.  The other change I have made: for a single use of a given
def, the current code does not include such invariants in the register
pressure calculation; I have enabled including them.  Though the comment
in the code says that there won't be a new register for a single use after
moving, moving such invariants outside the loop will still affect register
pressure, as the span of the live range after moving out of the loop
differs from that in the original loop.  Since the live range spans vary,
such cases should surely affect register pressure.
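A hedged illustration of the live-range argument (not from the patch):

extern int a, b;

void
f (int *c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = (a + b) * i;   /* a + b is invariant and has a single use */
}

Before hoisting, a + b is live only within one statement per iteration;
after hoisting it into a temporary before the loop, that temporary is
live across the whole loop body and competes for a register with
everything else live in the loop.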

The above explanation looks reasonable, and the code that excluded such
invariants from register pressure is removed in the patch.

I don't see any background or past patches that explicitly introduced the
above check as part of a performance improvement or bug fix.

Thanks & Regards
Ajit

 



Thanks,
Richard


Re: [Patch, Fortran, PR58586, v2] ICE with derived type with allocatable component passed by value

2015-04-24 Thread Andre Vehreschild
Hi all,

Just to clear things up, with

> I have tested the code in the comments of pr61831 with v2 of this patch and
> got no issues.

I meant that I checked the code in comment #28 of pr61831. With only
this patch, there is still an invalid free() of unallocated memory.
With this patch plus Mikael's patch from comment 44, I see no issues anymore.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Ping, Patch, Fortran] Prevent segfault with dump-*-original for implicit class expressions.

2015-04-24 Thread Andre Vehreschild
Ping!

This ICE is still present in gfortran, and with it the patch is still current.

Bootstrapped and regtested OK on x86_64-linux-gnu/F21.

The patch has only changed so that it applies to current trunk without big
line shifts.

Ok for trunk?

- Andre

This crash is still current.
On Fri, 13 Mar 2015 11:33:39 +0100
Andre Vehreschild  wrote:

> Hi all,
> 
> this is another patch preventing a segfault. This time the segfault occurred
> when -fdump-(fortran|tree)-original was given and the program had an
> implicit class set. The issue is that the _data component is assumed to be
> present in a BT_CLASS without checking, when trying to access the unlimited
> polymorphic flag there. The patch fixes this by redirecting the access to the
> flag to the correct position, whether the _data component is present or not.
> 
> Building a testcase for this is difficult for me. Maybe I am just blocked in
> the head there. The issue occurred when trying to dump the
> (fortran|tree)-original of the testcase gfortran.dg/implicit_class_1.f90. So
> one could argue to add the flag to that testcase, but does it pay in contrast
> to the additional effort each time the testsuite is executed? I have added the
> test now, not being happy with it, but having no clue how to do it better.
> 
> Bootstraps and regtests ok on x86_64-linux-gnu/F20.
> 
> Ok, for trunk?
> 
> Regards,
>   Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


crashfix2_v1.clog
Description: Binary data
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 320eb01..71721dd 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -484,13 +484,24 @@ gfc_compare_types (gfc_typespec *ts1, gfc_typespec *ts2)
   if (ts1->type == BT_VOID || ts2->type == BT_VOID)
 return 1;
 
-  if (ts1->type == BT_CLASS
-  && ts1->u.derived->components->ts.u.derived->attr.unlimited_polymorphic)
+  /* The _data component is not always present, therefore check for its
+ presence before assuming, that its derived->attr is available.
+ When the _data component is not present, then nevertheless the
+ unlimited_polymorphic flag may be set in the derived type's attr.  */
+  if (ts1->type == BT_CLASS && ts1->u.derived->components
+  && ((strcmp (ts1->u.derived->components->name, "_data") == 0
+	   && ts1->u.derived->components->ts.u.derived->attr
+		  .unlimited_polymorphic)
+	  || ts1->u.derived->attr.unlimited_polymorphic))
 return 1;
 
   /* F2003: C717  */
   if (ts2->type == BT_CLASS && ts1->type == BT_DERIVED
-  && ts2->u.derived->components->ts.u.derived->attr.unlimited_polymorphic
+  && ts2->u.derived->components
+  && ((strcmp (ts2->u.derived->components->name, "_data") == 0
+	   && ts2->u.derived->components->ts.u.derived->attr
+		  .unlimited_polymorphic)
+	  || ts2->u.derived->attr.unlimited_polymorphic)
   && (ts1->u.derived->attr.sequence || ts1->u.derived->attr.is_bind_c))
 return 1;
 
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 44392e8..ac1a0b1 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -4567,7 +4567,10 @@ gfc_type_compatible (gfc_typespec *ts1, gfc_typespec *ts2)
 
   if (is_class1
   && ts1->u.derived->components
-  && ts1->u.derived->components->ts.u.derived->attr.unlimited_polymorphic)
+  && ((strcmp (ts1->u.derived->components->name, "_data") == 0
+	   && ts1->u.derived->components->ts.u.derived->attr
+			.unlimited_polymorphic)
+	  || ts1->u.derived->attr.unlimited_polymorphic))
 return 1;
 
   if (!is_derived1 && !is_derived2 && !is_class1 && !is_class2)
@@ -4576,13 +4579,14 @@ gfc_type_compatible (gfc_typespec *ts1, gfc_typespec *ts2)
   if (is_derived1 && is_derived2)
 return gfc_compare_derived_types (ts1->u.derived, ts2->u.derived);
 
-  if (is_derived1 && is_class2)
+  if (is_derived1 && is_class2 && ts2->u.derived->components)
 return gfc_compare_derived_types (ts1->u.derived,
   ts2->u.derived->components->ts.u.derived);
-  if (is_class1 && is_derived2)
+  if (is_class1 && is_derived2 && ts1->u.derived->components)
 return gfc_type_is_extension_of (ts1->u.derived->components->ts.u.derived,
  ts2->u.derived);
-  else if (is_class1 && is_class2)
+  else if (is_class1 && is_class2 && ts1->u.derived->components
+	   && ts2->u.derived->components)
 return gfc_type_is_extension_of (ts1->u.derived->components->ts.u.derived,
  ts2->u.derived->components->ts.u.derived);
   else
diff --git a/gcc/testsuite/gfortran.dg/implicit_class_1.f90 b/gcc/testsuite/gfortran.dg/implicit_class_1.f90
index 329f57a..fff1f2b 100644
--- a/gcc/testsuite/gfortran.dg/implicit_class_1.f90
+++ b/gcc/testsuite/gfortran.dg/implicit_class_1.f90
@@ -4,6 +4,10 @@
 !
 ! Contributed by Reinhold Bader 
 
+! Add dump-tree-original to check, if the patch preventing a gfortran
+! segfault is working correctly.
+! { dg-options "-fdump-tree-original" }
+
 program upimp
   implicit class(foo) (a-b)
   implicit class(*) (c)
@@ -33,

Re: [RFC] Adding unroller and DCE to early passes

2015-04-24 Thread Richard Biener
On Fri, 24 Apr 2015, Jan Hubicka wrote:

> Hi,
> I was looking into reordering the optimization queue of early passes. This is
> motivated by PR57249 and the fact that I ran into some super silly loops while
> looking into firefox dumps.  It indeed makes a lot of sense to me, as for
> code dealing with short arrays this enables more SRA/FRE/DSE.
> 
> I added a cunrolle pass that differs from cunrolli by not allowing code size
> growth even at -O3 (because we do not know what loops are hot yet).
> We currently unroll tiny loops with 2 calls, which I think needs to be tamed
> down; I can do that if the patch seems to make sense.

Please.

> I tried several options and ended up adding cunrolle before FRE and reordering
> FRE and SRA: SRA needs constant propagation to happen after unrolling to work
> and I think value numbering does work pretty well on non-SRAed datastructures.
> I also added DCE just before unrolling. This increases number of unrolls by
> about 60% on both tramp3d and eon. (basically we want to have DCE and cprop
> done to make unroller metrics go resonably well)

I've re-ordered SRA and ealias for GCC5 because ealias benefits from SRA
while SRA doesn't need PTA.  You undo that improvement.

We should improve the unroller instead of requiring DCE before it.

> On tramp3d there is not great code quality improvement (which is expected), 
> but
> we get stronger early opts. In particular the number of basic blocks at
> release_ssa time is 8% down, the inline size estimate at IPA time about 2%.
> 
> We do 124 unrollings early, 136 at cunrolli time and 132 in cunroll
> compared to 228 at cunrolli and 130 at cunroll without the patch.
>
> New early DCE pass does 8005 deletions, early cddce does 9307, first late dce
> does 2888.
> Without patch early cddce does 9510 (so the patch basically doubles statement 
> count
> we get rid of), first late dce does 8587 (almost 3 times as much ).
> 
> This seems like a significant decrease of garbage pushed through IPA pipeline
> (which in turn confuses inline metrics).
> 
> 
> On DealII we early unroll 477 loops (out of 11k), 421 at cunrolli and 122 at 
> cunroll
> without patch we unroll loops 1428 in cunrolli and 127 loop in cunroll
> 
> early dce1 removes 24133 and cddce 15599, late dce does 7698 statements.
> without patch we cddce1 33007 statements and first late dce does 8717 
> statements.
> 20% increase of # of statements we get rid of early and 13% decrease in late 
> DCE.
> 
> Number of basic blocks at release_ssa time drops from 270859 to 260485, by 21%
> number of statements by 4%.
> 
> If this looks resonable, I would suggest doing one change at a time, that
> is first adding extra dce pass, then reordering SRA and finally adding the
> cunrolle pass (after implementing the logic to not unroll calls)

I think that adding more passes to the early pipeline is bad.  Sure, it
will help weird C++ code-bases.  But it will slow down the compile for
all the rest.

As we want early unrolling to get the secondary effects by performing
better FRE/DCE/DSE I think trying to do a better job in value-numbering
for the kind of loops we are talking about would be better.  For tramp3d
we are talking about cases like

 for (i=0; i<3; ++i)
  dim[i] = i;

or similar code which ends up storing to a single array.  There is
the vn_reference_lookup_3 function in tree-ssa-sccvn.c which is
the canonical place for value-numbering "tricks".  It "simply"
needs to be taught how to "look through loops".
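A hedged sketch of what such "looking through loops" could enable
without any unrolling:

int
f (void)
{
  int dim[3], i;
  for (i = 0; i < 3; ++i)
    dim[i] = i;
  return dim[1];   /* ideally value-numbered to 1 directly */
}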

I'd like to not get into the game of huge pass order dependences in
early-opts - that just shows our scalar cleanups are too weak.
Ideally we'd just have a single pass in early-opts...

Richard.
 
> Honza
> 
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c   (revision 222391)
> +++ tree-ssa-loop-ivcanon.c   (working copy)
> @@ -1571,4 +1571,59 @@ make_pass_complete_unrolli (gcc::context
>return new pass_complete_unrolli (ctxt);
>  }
>  
> +/* Early complete unrolling pass; do only those internal loops where code
> +   size gets reduced.  */
>  
> +namespace {
> +
> +const pass_data pass_data_complete_unrolle =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cunrolle", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_COMPLETE_UNROLL, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +class pass_complete_unrolle : public gimple_opt_pass
> +{
> +public:
> +  pass_complete_unrolle (gcc::context *ctxt)
> +: gimple_opt_pass (pass_data_complete_unrolle, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return optimize >= 2; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_complete_unrolle
> +
> +unsigned int
> +pass_complete_unrolle::execute (function *fun)
> +{
> +  unsigned ret = 0;
> +
>

Re: RFA (tree.c): PATCH for may_alias vs. TYPE_CANONICAL, related to c++/50800

2015-04-24 Thread Richard Biener
On Thu, Apr 23, 2015 at 5:49 PM, Jason Merrill  wrote:
> In general, TYPE_CANONICAL of a type strips all attributes.  An exception to
> this seems to be that TYPE_REF_CAN_ALIAS_ALL remains set on the
> TYPE_CANONICAL of a pointer/reference type even though its TREE_TYPE no
> longer has the may_alias attribute, and is inconsistent with
> "affects_type_identity" being false for may_alias.  This seems to have been
> a mistake in the patch that first introduced TYPE_CANONICAL, rather than a
> deliberate choice.
>
> I'm also planning to fix 50800 in the front end, but this patch still seems
> like an improvement.
>
> Tested x86-64-pc-linux-gnu, OK for trunk?

Looks good to me.  We were eventually confused by alias.c's get_alias_set
dropping to TYPE_CANONICAL (TYPE_MAIN_VARIANT (T)) before determining
the alias set of a type T.  But of course ref-all is only relevant for
the alias set of type *T.
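
A concrete illustration (editorial, not from the patch) of why may_alias
only matters once the pointer is dereferenced:

  /* may_alias on the pointed-to type licenses the access *p; the
     attribute is irrelevant to the identity of the pointer type
     itself.  */
  typedef int __attribute__ ((may_alias)) aliasing_int;

  float f;

  int
  read_bits (void)
  {
    return *(aliasing_int *) &f;  /* no strict-aliasing violation */
  }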

Richard.


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-04-24 Thread Richard Biener
On Fri, 24 Apr 2015, Jakub Jelinek wrote:

> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
> > Hi,
> > 
> > I checked this patch into gcc-5-branch.
> 
> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme

Yes.  Please revert.

In the future, please don't make this kind of change as a non-RM without
approval.

Thanks,
Richard.

> > Index: ChangeLog
> > ===
> > --- ChangeLog   (revision 222386)
> > +++ ChangeLog   (working copy)
> > @@ -1,3 +1,7 @@
> > +2015-04-23  H.J. Lu  
> > +
> > +   * DEV-PHASE: Set to prerelease.
> > +
> >  2015-04-23  Bill Schmidt  
> >  
> > Backport from mainline r222349
> > Index: DEV-PHASE
> > ===
> > --- DEV-PHASE   (revision 222386)
> > +++ DEV-PHASE   (working copy)
> > @@ -0,0 +1 @@
> > +prerelease
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


[PATCH, i386]: Make mul operand commutative

2015-04-24 Thread Uros Bizjak
Hello!

2015-04-24  Uros Bizjak  

* config/i386/sse.md (*vec_widen_smult_even_v8si):
Mark operand1 commutative.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline and 5.0 branch.

Uros.

Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 222401)
+++ config/i386/sse.md  (working copy)
@@ -9523,7 +9523,7 @@
(mult:V4DI
  (sign_extend:V4DI
(vec_select:V4SI
- (match_operand:V8SI 1 "nonimmediate_operand" "v")
+ (match_operand:V8SI 1 "nonimmediate_operand" "%v")
  (parallel [(const_int 0) (const_int 2)
 (const_int 4) (const_int 6)])))
  (sign_extend:V4DI


RE: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo

2015-04-24 Thread Kumar, Venkataramanan
Hi Kyrill, 

I checked this patch on an AMD Seattle board and it bootstrapped cleanly with
BOOT_CFLAGS=-O2 -mcpu=native and CFLAGS_FOR_TARGET=-O2 -mcpu=native.

With -mcpu=cortex-a57, I get 

 .cpu cortex-a57+fp+simd+crc
options passed:  test.c -mcpu=cortex-a57 -mlittle-endian -mabi=lp64 
-auxbase-strip

With -mcpu=native, I get 

.cpu cortex-a57+fp+simd+crypto+crc
 options passed:  test.c -mlittle-endian -mabi=lp64 
-mcpu=cortex-a57+fp+simd+crypto+crc  -auxbase-strip

It matches the features.  evtstrm is not relevant to the compiler, I believe.
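
For the curious, a toy C sketch of the kind of feature-to-extension
mapping involved — my own illustration, not GCC's actual driver code;
the table and names below are assumptions:

  #include <stdio.h>
  #include <string.h>

  static const struct { const char *feat, *ext; } feat_map[] = {
    { "fp",    "+fp"     },
    { "asimd", "+simd"   },
    { "aes",   "+crypto" },  /* aes/pmull/sha1/sha2 => +crypto */
    { "crc32", "+crc"    },
    /* "evtstrm" intentionally absent: no compiler option maps to it. */
  };

  int main (void)
  {
    const char *features = "fp asimd evtstrm aes pmull sha1 sha2 crc32";
    char opt[64] = "-mcpu=cortex-a57";
    for (unsigned i = 0; i < sizeof feat_map / sizeof feat_map[0]; i++)
      if (strstr (features, feat_map[i].feat))
        strncat (opt, feat_map[i].ext, sizeof opt - strlen (opt) - 1);
    printf ("%s\n", opt);  /* -mcpu=cortex-a57+fp+simd+crypto+crc */
    return 0;
  }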

Another thing is detecting cache sizes automatically, but I think as of now
we have not yet included cache parameters in the aarch64 tuning tables.


Regards,
Venkat.

  -Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Kyrill Tkachov
Sent: Wednesday, April 22, 2015 8:38 PM
To: GCC Patches
Cc: Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Evandro Menezes; 
Andrew Pinski
Subject: Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only 
/proc/cpuinfo


On 22/04/15 12:46, Kyrill Tkachov wrote:
> [Sorry for resending twice. My mail client glitched]
>
> On 20/04/15 16:47, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This is an attempt to add native CPU detection to AArch64 GNU/Linux targets.
>> Similar to other ports, we use SPEC rewriting to rewrite 
>> -m{cpu,tune,arch}=native options into the appropriate 
>> CPU/architecture and the architecture extension options when appropriate 
>> (i.e. +crypto/+crc etc).
>>
>> For CPU/architecture detection it gets a bit involved, especially 
>> when running on a big.LITTLE system. My proposed approach is to look 
>> at /proc/cpuinfo and search for the implementer id and part number 
>> fields that uniquely identify each core (appropriate identifying 
>> information is added to aarch64-cores.def). If we find two types of 
>> core we have a big.LITTLE system, so search through the core 
>> definitions extracted from aarch64-cores.def to find if we support such a 
>> combination (currently only cortex-a57.cortex-a53 and cortex-a72.cortex-a53) 
>> and make sure that the implementer id field matches up.
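
A toy sketch of the detection idea just described — an editorial
illustration, not Kyrill's patch: it merely collects the distinct
CPU part values, where more than one distinct part suggests a
big.LITTLE system.

  #include <stdio.h>
  #include <string.h>

  int main (void)
  {
    FILE *f = fopen ("/proc/cpuinfo", "r");
    char line[256];
    unsigned parts[8];
    int nparts = 0;

    if (!f)
      return 1;
    while (fgets (line, sizeof line, f))
      if (strncmp (line, "CPU part", 8) == 0)
        {
          char *colon = strchr (line, ':');
          unsigned part;
          if (colon && sscanf (colon + 1, " %x", &part) == 1)
            {
              int seen = 0;
              for (int i = 0; i < nparts; i++)
                if (parts[i] == part)
                  seen = 1;
              if (!seen && nparts < 8)
                parts[nparts++] = part;
            }
        }
    fclose (f);
    printf ("%d distinct core type(s)%s\n", nparts,
            nparts > 1 ? " - looks like big.LITTLE" : "");
    return 0;
  }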
>>
>> I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux 
>> system.
>> There are two formats for /proc/cpuinfo that I'm aware of. The first (old) 
>> one has the format:
>> --
>> processor: 0
>> processor: 1
>> processor: 2
>> processor: 3
>> processor: 4
>> processor: 5
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer: 0x41
>> CPU architecture: AArch64
>> CPU variant: 0x0
>> CPU part: 0xd03
>> --
>>
>> In this format it lists the 6 cores, but the CPU part it reports is 
>> only the one for the core from which /proc/cpuinfo was read (!), in 
>> this case one of the Cortex-A53 cores.
>> This means we detect a different CPU depending on which core GCC was 
>> invoked on. Not ideal really, but there's no more information that we can 
>> extract.
>> Given the /proc/cpuinfo above, this patch will rewrite -mcpu=native 
>> into -mcpu=cortex-a53+fp+simd+crypto+crc
>>
>> The newer /proc/cpuinfo format proposed at 
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commi
>> t/?id=44b82b7700d05a52cd983799d3ecde1a976b3bed
>> looks like this:
>>
>> --
>> processor   : 0
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd03
>> CPU revision: 0
>>
>> processor   : 1
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd03
>> CPU revision: 0
>>
>> processor   : 2
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd03
>> CPU revision: 0
>>
>> processor   : 3
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd03
>> CPU revision: 0
>>
>> processor   : 4
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd07
>> CPU revision: 0
>>
>> processor   : 5
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant : 0x0
>> CPU part: 0xd07
>> CPU revision: 0
>> --
>>
>> The Features field is used to detect the architectural features that 
>> we map to GCC option extensions i.e. +fp,+crypto,+simd,+crc etc.
>>
>> Similarly, -ma