[PATCH] [doc] Update plugin doc

2014-01-08 Thread Joey Ye
Update the plugin documentation after the switch to C++, and make it
friendlier to cross-builds.

ChangeLog:
2014-01-08  Joey Ye  joey...@arm.com
doc/plugin.texi (Building GCC plugins): Update to C++.

OK to trunk?

diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
index fc2d754..e668de6 100644
--- a/gcc/doc/plugins.texi
+++ b/gcc/doc/plugins.texi
@@ -465,18 +465,18 @@ integer numbers, so a plugin could ensure it is built for GCC 4.7 with
 The following GNU Makefile excerpt shows how to build a simple plugin:
 
 @smallexample
-GCC=gcc
-PLUGIN_SOURCE_FILES= plugin1.c plugin2.c
-PLUGIN_OBJECT_FILES= $(patsubst %.c,%.o,$(PLUGIN_SOURCE_FILES))
-GCCPLUGINS_DIR:= $(shell $(GCC) -print-file-name=plugin)
-CFLAGS+= -I$(GCCPLUGINS_DIR)/include -fPIC -O2
-
-plugin.so: $(PLUGIN_OBJECT_FILES)
-   $(GCC) -shared $^ -o $@@
+HOST_GCC=g++
+TARGET_GCC=gcc
+PLUGIN_SOURCE_FILES= plugin1.c plugin2.cc
+GCCPLUGINS_DIR:= $(shell $(TARGET_GCC) -print-file-name=plugin)
+CXXFLAGS+= -I$(GCCPLUGINS_DIR)/include -fPIC -fno-rtti -O2
+
+plugin.so: $(PLUGIN_SOURCE_FILES)
+   $(HOST_GCC) -shared $(CXXFLAGS) $^ -o $@@
 @end smallexample
 
-A single source file plugin may be built with @code{gcc -I`gcc
--print-file-name=plugin`/include -fPIC -shared -O2 plugin.c -o
+A single source file plugin may be built with @code{g++ -I`gcc
+-print-file-name=plugin`/include -fPIC -shared -fno-rtti -O2 plugin.c -o
 plugin.so}, using backquote shell syntax to query the @file{plugin}
 directory.








Re: [RFA][PATCH][middle-end/53623] Improve extension elimination

2014-01-08 Thread Eric Botcazou
 Committed after private email approval from Jakub.  I made one
 additional trivial change (missing whitespace in a comment).

This breaks bootstrap with RTL checking enabled:

/home/eric/svn/gcc/libgcc/config/libbid/bid64_noncomp.c:119:1: internal 
compiler error: RTL check: expected code 'set' or 'clobber', have 'parallel' 
in combine_reaching_defs, at ree.c:711
 }
 ^
0x9c5fcf rtl_check_failed_code2(rtx_def const*, rtx_code, rtx_code, char 
const*, int, char const*)
/home/eric/svn/gcc/gcc/rtl.c:783
0x14626da combine_reaching_defs
/home/eric/svn/gcc/gcc/ree.c:711
0x1464ae9 find_and_remove_re
/home/eric/svn/gcc/gcc/ree.c:957
0x1464ae9 rest_of_handle_ree
/home/eric/svn/gcc/gcc/ree.c:1019
0x1464ae9 execute
/home/eric/svn/gcc/gcc/ree.c:1058
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
make[3]: *** [bid64_noncomp.o] Error 1
make[3]: *** Waiting for unfinished jobs

-- 
Eric Botcazou


Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.

2014-01-08 Thread Yangfei (Felix)
Hi Bernd,

The patch looks OK to me. But do we need reorder_loops for the c6x backend?
I mean we could set the do_reorder parameter to FALSE to save compile
time, since the c6x backend only chooses hw-doloops whose body contains only
one basic block.

Cheers,
Felix

 
 On 01/05/2014 05:10 PM, Teresa Johnson wrote:
  On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt ber...@codesourcery.com
 wrote:
  I have a different patch which I'll submit next week after some more
  testing. The assert in cfgrtl is unnecessarily broad and really only
  needs to trigger if -freorder-blocks-and-partition; there's nothing
  wrong with entering cfglayout after normal bb-reorder.
 
  Currently -freorder-blocks-and-partition is the default for x86. I
  assume that hw-doloop is not enabled for any i386 targets, which is
  why we haven't seen this?
 
 Precisely.
 
  And will this mean that -freorder-blocks-and-partition cannot be used
  for the targets that use hw-doloop? If so, should
  -freorder-blocks-and-partition be prevented with a warning for those
  targets?
 
 If someone explicitly chooses that option we can turn off the reordering in
 hw-doloop. That should happen sufficiently rarely that it isn't a problem.
 That's what the patch below does - bootstrapped on x86_64-linux, tested there
 and with bfin-elf. Ok?
 
  I've also tested that Blackfin still benefits from the hw-doloop
  reordering code and generates more hardware loops if it's enabled. So
  we want to be able to run it at -O2.
 
  I looked at hw-doloop briefly and since it seems to be doing some
  manual bb reordering I guess it can't simply be moved before bbro. It
  seems like a better long-term solution would be to make bbro
  hw-doloop-aware as Felix suggested earlier.
 
 Maybe. It could be argued that the code in hw-doloop is relevant only for a
 small class of targets so it should only be enabled for them. In any case,
 that's not stage 3 material and two ports are broken...
 
 
 Bernd



Re: [PATCH] Fix PR59471

2014-01-08 Thread Richard Biener
On Tue, 7 Jan 2014, Jakub Jelinek wrote:

 On Tue, Jan 07, 2014 at 04:12:57PM +0100, Richard Biener wrote:
   What about if something post gimplification creates VCE(BFR(VCE())) or
   similar and tries to force_gimple_operand_gsi or similar, then without
   making the above invalid in the predicates we'd still not try to gimplify
   it at all (because it would pass the predicate), and then hit the
   verification ICE.
  
  I don't think it passes any predicate, certainly not is_gimple_val,
  so we enter gimplification anyway.  Or am I missing something?
 
 It isn't is_gimple_val, sure, I was thinking about whatever predicate we
 have for say the RHS of a load, is that is_gimple_addressable?  There is
 is_gimple_lvalue too.  Apparently we are calling force_gimple_operand_1*
 with just is_gimple_condexpr if it is not is_gimple_val or
 is_gimple_reg_rhs, so perhaps we are fine.

I think it's fine.  We do

tree
force_gimple_operand_1 (tree expr, gimple_seq *stmts,
gimple_predicate gimple_test_f, tree var)
{
...
  /* gimple_test_f might be more strict than is_gimple_val, make
 sure we pass both.  Just checking gimple_test_f doesn't work
 because most gimple predicates do not work recursively.  */
  if (is_gimple_val (expr)
      && (*gimple_test_f) (expr))
    return expr;

which of course is kind of pointless, but the gimplifier predicates
are designed to only work post-gimplification, not pre-gimplification
(if you test for the predicate at the start of gimplify_expr you'll
see lots of failures).

I have now committed the patch.

Richard.


[PATCH] Fix PR49718 : allow no_instrument_function attribute in class member definition/declaration

2014-01-08 Thread Laurent Alfonsi

All,

I was looking at PR49718. I have enclosed a simple fix for this bug report.

2014-01-07  Laurent Alfonsi laurent.alfo...@st.com

* c-family/c-common.c (handle_no_instrument_function_attribute): Allow
  no_instrument_function attribute in class member
  definition/declaration.



Looking at the implementation of the function attributes, I see no 
reason anymore to keep this error message.

Let me know if I missed something.
I have also added a testcase in the enclosed patch.

2014-01-07  Laurent Alfonsi laurent.alfo...@st.com

PR c++/49718
* g++.dg/pr49718.C: New


gcc/g++/libstdc++ testsuites are ok on x86-64. Ok for trunk ?

Regards,
Laurent

From 141d2bcfeab5e0635c7f4e362387fd5b1b9494e6 Mon Sep 17 00:00:00 2001
From: Laurent ALFONSI laurent.alfo...@st.com
Date: Tue, 7 Jan 2014 16:26:04 +0100
Subject: [PATCH] Fix PR49718 : allow no_instrument_function attribute in class
 member definition/declaration

---
 gcc/c-family/c-common.c|  6 --
 gcc/testsuite/g++.dg/pr49718.C | 41 +
 2 files changed, 41 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr49718.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 8ecb70c..17fcb0d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7929,12 +7929,6 @@ handle_no_instrument_function_attribute (tree *node, tree name,
 		"%qE attribute applies only to functions", name);
   *no_add_attrs = true;
 }
-  else if (DECL_INITIAL (decl))
-{
-  error_at (DECL_SOURCE_LOCATION (decl),
-		"can%'t set %qE attribute after definition", name);
-  *no_add_attrs = true;
-}
   else
 DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl) = 1;
 
diff --git a/gcc/testsuite/g++.dg/pr49718.C b/gcc/testsuite/g++.dg/pr49718.C
new file mode 100644
index 000..07cac8c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr49718.C
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -finstrument-functions" } */
+/* { dg-final { scan-assembler-times "__cyg_profile_func_enter" 1} } */
+
+#define NOINSTR __attribute__((no_instrument_function))
+
+struct t
+{
+   public:
+   /* Function code should be instrumented */
+   __attribute__((noinline)) t() {}
+   
+   /* Function t::a() should not be instrumented */
+   NOINSTR void a(){
+   }
+   /* Function t::b() should not be instrumented */
+   void NOINSTR b(){
+   }
+   /* Function t::c() should not be instrumented */
+   void c() NOINSTR {
+   }
+   /* Function t::d() should not be instrumented */
+   void d() NOINSTR;
+};
+
+void t::d()
+{
+}
+
+/* Function call_all_functions() should not be instrumented */
+struct t call_all_functions() __attribute__((no_instrument_function));
+struct t call_all_functions() 
+{
+   struct t a; /* Constructor not inlined */
+   a.a();	   /* Inlined t::a() should not be instrumented */
+   a.b();	   /* Inlined t::b() should not be instrumented */
+   a.c();	   /* Inlined t::c() should not be instrumented */
+   a.d();	   /* Inlined t::d() should not be instrumented */
+   return a;
+}
+
-- 
1.8.4.1



Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Jonathan Wakely
On 7 January 2014 19:36, Tim Shen wrote:

 I didn't notice that it's so time consuming. I think reducing the
 compile time is possible (by templating several member functions
 instead of whole _Compiler class).

Ouch! Yes, that's quite a bit slower, and this code is already very
slow to compile.

I haven't looked at the code recently, but another option that
sometimes helps is to have a base class that implements the common
functionality and then derive four classes from it, but minimise the
amount of code in the derived classes.
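
A minimal sketch of that base-class approach, assuming a single boolean flag
where the real code has more. All names here (BracketMatcherBase,
BracketMatcher, matches_after_add) are hypothetical and are not taken from
the libstdc++ regex sources. The shared storage and lookup are compiled once
in an ordinary class, and only the flag-dependent translation is
instantiated per combination:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical base class: everything that does not depend on the
// template flags lives here and is compiled only once.
class BracketMatcherBase
{
public:
  std::size_t size () const { return chars_.size (); }

protected:
  void store (char c) { chars_ += c; }
  bool contains (char c) const
  { return chars_.find (c) != std::string::npos; }

private:
  std::string chars_;
};

// Thin derived template: only the flag-dependent translation step is
// instantiated for each flag value.
template<bool Icase>
class BracketMatcher : public BracketMatcherBase
{
public:
  void add (char c) { store (translate (c)); }
  bool matches (char c) const { return contains (translate (c)); }

private:
  static char translate (char c)
  {
    if (Icase)
      return static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
    return c;
  }
};

// Helper for trying out one instantiation.
template<bool Icase>
bool matches_after_add (char added, char probe)
{
  BracketMatcher<Icase> m;
  m.add (added);
  return m.matches (probe);
}

// Usage: the case-insensitive variant matches 'a' after adding 'A',
// the case-sensitive one does not.
inline void bracket_matcher_demo ()
{
  assert (matches_after_add<true> ('A', 'a'));
  assert (!matches_after_add<false> ('A', 'a'));
}
```

With two boolean flags the same layout would give four small derived
instantiations over one shared base.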

Thanks for the patches!


[PATCH] Fix get_mode_bounds for BImode (PR rtl-optimization/59649)

2014-01-08 Thread Jakub Jelinek
Hi!

The recent change to get_mode_bounds for partial mode, where
GET_MODE_PRECISION instead of GET_MODE_SIZE is now used, has broken
ia64 bootstrap.  The problem is that BImode is special cased in various
places, e.g. trunc_int_for_mode, so the two values of the mode are
0 and STORE_FLAG_VALUE (which is sometimes -1, sometimes (ia64 case) 1).

Now, two of the 3 get_mode_bounds callers use the same mode == target_mode
and when called with BImode, true, BImode, ... min_val is -1 and max_val is 0, 
but given the weirdo trunc_int_for_mode behavior which returns
STORE_FLAG_VALUE for value with low bit set and 0 otherwise, get_mode_bounds
actually returns min_rtx (const_int 1) and max_rtx (const_int 0).
This confuses the callers (in this case simplify-rtx.c) which then compares
the trueop1 value against the bounds to miscompile the code.

This patch fixes this by special casing BImode, so that we get the bounds
in the right order for BImode, ?, BImode and even for the case where
target_mode is wider ignores sign and returns 0, STORE_FLAG_VALUE or vice
versa in the right order.
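
To make the ordering problem concrete, here is a small standalone model. It
is only a sketch: trunc_for_bimode, bounds_before and bounds_after are
hypothetical stand-ins that mimic the behaviour described above, not the
real functions in GCC, and the mode and target_mode arguments are left out.

```cpp
#include <cassert>
#include <utility>

// Toy model of the BImode special case in trunc_int_for_mode:
// low bit set gives STORE_FLAG_VALUE, anything else gives 0.
inline long trunc_for_bimode (long value, long store_flag_value)
{
  assert (store_flag_value == 1 || store_flag_value == -1);
  return (value & 1) ? store_flag_value : 0;
}

// Modeled behaviour before the patch: the signed 1-bit bounds -1 and 0
// are pushed through the truncation, which may swap their order.
inline std::pair<long, long> bounds_before (long store_flag_value)
{
  return { trunc_for_bimode (-1, store_flag_value),
           trunc_for_bimode (0, store_flag_value) };
}

// Modeled behaviour after the patch: return 0 and STORE_FLAG_VALUE,
// smaller value first.
inline std::pair<long, long> bounds_after (long store_flag_value)
{
  if (store_flag_value < 0)
    return { store_flag_value, 0 };
  return { 0, store_flag_value };
}
```

In this model, with STORE_FLAG_VALUE == 1 (the ia64 case) bounds_before
yields (1, 0), i.e. min above max, while bounds_after yields (0, 1). With
STORE_FLAG_VALUE == -1 both variants yield (-1, 0).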

Eric has kindly tested this on ia64.  Ok for trunk?

2014-01-08  Jakub Jelinek  ja...@redhat.com

PR rtl-optimization/59649
* stor-layout.c (get_mode_bounds): For BImode return
0 and STORE_FLAG_VALUE.

--- gcc/stor-layout.c.jj2014-01-03 11:40:57.0 +0100
+++ gcc/stor-layout.c   2014-01-07 18:59:39.056846684 +0100
@@ -2821,7 +2821,21 @@ get_mode_bounds (enum machine_mode mode,
 
   gcc_assert (size <= HOST_BITS_PER_WIDE_INT);
 
-  if (sign)
+  /* Special case BImode, which has values 0 and STORE_FLAG_VALUE.  */
+  if (mode == BImode)
+{
+  if (STORE_FLAG_VALUE < 0)
+   {
+ min_val = STORE_FLAG_VALUE;
+ max_val = 0;
+   }
+  else
+   {
+ min_val = 0;
+ max_val = STORE_FLAG_VALUE;
+   }
+}
+  else if (sign)
 {
   min_val = -((unsigned HOST_WIDE_INT) 1 << (size - 1));
   max_val = ((unsigned HOST_WIDE_INT) 1 << (size - 1)) - 1;

Jakub


Re: [PATCH] [doc] Update plugin doc

2014-01-08 Thread Gerald Pfeifer
Joey Ye joey...@arm.com wrote:
ChangeLog:
2014-01-08  Joey Ye  joey...@arm.com
doc/plugin.texi (Building GCC plugins): Update to C++.

OK to trunk?

Okay unless anyone raises concrete issues in the next couple of days (or 
approves directly, of course).

Thanks,
Gerald



Re: [PATCH] Fix get_mode_bounds for BImode (PR rtl-optimization/59649)

2014-01-08 Thread Richard Biener
On Wed, 8 Jan 2014, Jakub Jelinek wrote:

 Hi!
 
 The recent change to get_mode_bounds for partial mode, where
 GET_MODE_PRECISION instead of GET_MODE_SIZE is now used, has broken
 ia64 bootstrap.  The problem is that BImode is special cased in various
 places, e.g. trunc_int_for_mode, so the two values of the mode are
 0 and STORE_FLAG_VALUE (which is sometimes -1, sometimes (ia64 case) 1).
 
 Now, two of the 3 get_mode_bounds callers use the same mode == target_mode
 and when called with BImode, true, BImode, ... min_val is -1 and max_val is 
 0, 
 but given the weirdo trunc_int_for_mode behavior which returns
 STORE_FLAG_VALUE for value with low bit set and 0 otherwise, get_mode_bounds
 actually returns min_rtx (const_int 1) and max_rtx (const_int 0).
 This confuses the callers (in this case simplify-rtx.c) which then compares
 the trueop1 value against the bounds to miscompile the code.
 
 This patch fixes this by special casing BImode, so that we get the bounds
 in the right order for BImode, ?, BImode and even for the case where
 target_mode is wider ignores sign and returns 0, STORE_FLAG_VALUE or vice
 versa in the right order.
 
 Eric has kindly tested this on ia64.  Ok for trunk?

Ok.

Thanks,
Richard.

 2014-01-08  Jakub Jelinek  ja...@redhat.com
 
   PR rtl-optimization/59649
   * stor-layout.c (get_mode_bounds): For BImode return
   0 and STORE_FLAG_VALUE.
 
 --- gcc/stor-layout.c.jj  2014-01-03 11:40:57.0 +0100
 +++ gcc/stor-layout.c 2014-01-07 18:59:39.056846684 +0100
 @@ -2821,7 +2821,21 @@ get_mode_bounds (enum machine_mode mode,
  
gcc_assert (size <= HOST_BITS_PER_WIDE_INT);
  
 -  if (sign)
 +  /* Special case BImode, which has values 0 and STORE_FLAG_VALUE.  */
 +  if (mode == BImode)
 +{
 +  if (STORE_FLAG_VALUE < 0)
 + {
 +   min_val = STORE_FLAG_VALUE;
 +   max_val = 0;
 + }
 +  else
 + {
 +   min_val = 0;
 +   max_val = STORE_FLAG_VALUE;
 + }
 +}
 +  else if (sign)
  {
min_val = -((unsigned HOST_WIDE_INT) 1 << (size - 1));
max_val = ((unsigned HOST_WIDE_INT) 1 << (size - 1)) - 1;
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: std::vector move assign patch

2014-01-08 Thread Jonathan Wakely
On 27 December 2013 18:27, François Dumont wrote:
 Hi

 Here is a patch to fix an issue in normal mode during the move
 assignment. The destination vector allocator instance is moved too during
 the assignment which is wrong.

Thanks for your patience, the normal-mode fix is definitely correct,
and I've finished reviewing the other parts and they look good too.

 As I discovered this problem while working on issues with management of
 safe iterators during move operations, this patch also fixes those issues in
 the debug mode for the vector container. Fixes for other containers in debug
 mode will come later.

OK, great.

In the new test you have:

+  VERIFY( it == v1.begin() ); // Error, it singular

Please change this to "Error, it is singular"

 2013-12-27  François Dumont fdum...@gcc.gnu.org

 * include/bits/stl_vector.h (std::vector::_M_move_assign): Pass
 *this allocator instance when building temporary vector instance
 so that *this allocator do not get moved.

Please change this to "does not get moved"


 * include/debug/safe_base.h
 (_Safe_sequence_base(_Safe_sequence_base&&)): New.
 * include/debug/vector (__gnu_debug::vector(vector&&)): Use
 latter.

I don't think "latter" is clear here, please say something like "Use
new move constructor for base class" or "... for _Safe_sequence_base".

This is OK for trunk, thanks very much.

We might also want to fix just the normal-mode part on the 4.8 branch,
I'll think about that.


Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Paolo Carlini

Hi,

On 01/08/2014 10:24 AM, Jonathan Wakely wrote:

On 7 January 2014 19:36, Tim Shen wrote:

I didn't notice that it's so time consuming. I think reducing the
compile time is possible (by templating several member functions
instead of whole _Compiler class).

Ouch! Yes, that's quite a bit slower, and this code is already very
slow to compile.

I haven't looked at the code recently, but another option that
sometimes helps is to have a base class that implements the common
functionality and then derive four classes from it, but minimise the
amount of code in the derived classes.

I only want to add that, besides keeping compile-time under control for 
4.9.0 - please investigate a bit more along the mentioned lines - we 
should also start experimenting with exporting the instantiations. I 
don't know what the other implementations are doing, but in general it 
definitely makes sense, for compile-time performance too. I think we 
already said that some time ago, but the issue seems more important now. 
Maybe it's really unavoidable if we need template complexity for first 
class run-time performance.
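
For reference, a minimal sketch of what exporting an instantiation looks
like in C++11, shown in a single file for brevity. Scanner is a hypothetical
class template used only for illustration; it is not part of the regex code.
The extern template declaration would normally sit in the header and the
explicit instantiation definition in one library source file:

```cpp
#include <cassert>

// Hypothetical class template standing in for a heavy component.
template<typename CharT>
struct Scanner
{
  // Counts occurrences of C in the null-terminated string S.
  int count (const CharT *s, CharT c) const;
};

template<typename CharT>
int Scanner<CharT>::count (const CharT *s, CharT c) const
{
  assert (s != nullptr);
  int n = 0;
  for (; *s != CharT (); ++s)
    if (*s == c)
      ++n;
  return n;
}

// Header side: explicit instantiation declaration.  Users of the header
// do not instantiate Scanner<char> implicitly in their own objects.
extern template struct Scanner<char>;

// Library side (exactly one source file): explicit instantiation
// definition, so the code is generated once and exported.
template struct Scanner<char>;
```

User code then calls Scanner<char>().count("banana", 'a') as usual; the
member function is resolved against the single exported instantiation
rather than being compiled again in each translation unit.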


Paolo.


[patch] [plugin] Fix PR 59335 plugin build

2014-01-08 Thread Joey Ye
Fix the trunk plugin build by adding missing headers and removing headers
that no longer exist.

Test passed:
- arm-none-eabi build --enable-plugins
- build test plugin 
- x86_64 bootstrap --enable-plugins

OK to trunk?

ChangeLog.gcc
2013-11-19  Joey Ye  joey...@arm.com

PR plugin/59335
* Makefile.in (tree-cfg.h, tree-into-ssa.h, fold-const.h,
gimple-ssa.h,
gimple-iterator.h, varasm.h, context.h): Add missing headers for
plugin.
(tree-flow.h, tree-flow-inline.h): Remove as they no longer exist.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 459b1ba..55f1ace 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -882,13 +882,14 @@ TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
$(VEC_H) treestruct.def $(HASHTAB_H) \
double-int.h alias.h $(SYMTAB_H) $(FLAGS_H) \
$(REAL_H) $(FIXED_VALUE_H)
-TREE_H = tree.h $(TREE_CORE_H)  tree-check.h
+TREE_H = tree.h $(TREE_CORE_H)  tree-check.h tree-cfg.h tree-into-ssa.h
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \
cfg-flags.def cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
$(GGC_H) $(BASIC_BLOCK_H) $(TREE_H) tree-ssa-operands.h \
-   tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H) is-a.h
+   tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H) is-a.h \
+   gimple-ssa.h gimple-iterator.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 RECOG_H = recog.h
 EMIT_RTL_H = emit-rtl.h
@@ -929,7 +930,7 @@ CPP_ID_DATA_H = $(CPPLIB_H) $(srcdir)/../libcpp/include/cpp-id-data.h
 CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h $(CPP_ID_DATA_H)
 TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
 TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
-TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \
+TREE_FLOW_H = tree-ssa-operands.h \
$(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
$(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \
tree-ssa-alias.h
@@ -3119,7 +3120,7 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   cppdefault.h flags.h $(MD5_H) params.def params.h prefix.h tree-inline.h \
   $(GIMPLE_PRETTY_PRINT_H) realmpfr.h \
   $(IPA_PROP_H) $(TARGET_H) $(RTL_H) $(TM_P_H) $(CFGLOOP_H) $(EMIT_RTL_H) \
-  version.h stringpool.h
+  version.h stringpool.h varasm.h fold-const.h $(CONTEXT_H)
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile


Re: reload autoinc fix

2014-01-08 Thread Richard Earnshaw
On 07/01/14 21:06, Andrew Pinski wrote:
 On Tue, Jan 7, 2014 at 12:55 PM, Jeff Law l...@redhat.com wrote:
 On 01/07/14 09:16, Bernd Schmidt wrote:

 This is PR56791. The address inside of an autoinc is reloaded, and the
 autoinc is reloaded, but the reload insns are emitted in the wrong order.

 As far as I can tell, this is because find_reloads_address_1 has two
 methods of pushing a reload for an autoinc, one of them using the
 previously identified type, and the other (better one) using
 RELOAD_OTHER. If we previously reloaded an inner part of the address,
 the use of RELOAD_OTHER is mismatched and leads to the wrong order of
 insns.

 This patch just remembers if we've pushed a reload, and forces the
 optimization to be skipped in that case. Bootstrapped and tested on
 x86_64-linux (with lra_p disabled but still somewhat pointlessly); John
 Anglin said in the PR that it tests ok on PA. Will commit in a few days
 if no objections.

 No objections to the substance of the patch, though I think the comment
 could be clearer.
 
 Though my question is for which target this matters, since ARM has moved
 away from reload and other targets should do the same?
 

There's still the chance we will have to move back for this release when
building Thumb1.  Only if we can iron out enough of the bugs/size
regressions will we stick with LRA for that permutation.

R.




Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2014-01-08 Thread Richard Biener
On Wed, 8 Jan 2014, Kugan wrote:

 
 On 07/01/14 23:23, Richard Biener wrote:
  On Tue, 7 Jan 2014, Kugan wrote:
 
 [snip]
 
 
  Note that VIEW_CONVERT_EXPR is wrong here.  I think you are
  handling this wrong still.  From a quick look you want to avoid
  the actual promotion for
  
reg_1 = 
  
  when reg_1 is promoted and thus the target is (subreg:XX N).
  The RHS has been expanded in XXmode.  Dependent on the value-range
  of reg_1 you want to set N to a paradoxical subreg of the expanded
  result.  You can always do that if the reg is zero-extended
  and else if the MSB is not set for any of the values of reg_1.
 
 Thanks Richard for the explanation. I just want to double confirm I
 understand you correctly before I attempt to fix it. So let me try this
 for the following example,
 
 for a gimple stmt of the following from:
 unsigned short _5;
 short int _6;
 _6 = (short int)_5;
 
 ;; _6 = (short int) _5;
 target = (subreg/s/u:HI (reg:SI 110 [ D.4144 ]) 0)
 temp = (subreg:HI (reg:SI 118) 0)
 
 So, I must generate the following if it satisfies the other conditions.
 (set (reg:SI 110 [ D.4144 ]) (subreg:SI temp ))
 
 Is my understanding correct?

I'm no RTL expert in this particular area but yes, I think so.  Not
sure what paradoxical subregs are valid, so somebody else should
comment here.  You could even generate

  (set (reg:SI 110) (reg:SI 118))

iff temp is a SUBREG of a promoted var, as you require that for the
destination as well.

 
  I don't see how is_assigned_exp_fit_type reflects this in any way.
 
 
 
 What I tried doing with the patch is:
 
 (insn 13 12 0 (set (reg:SI 110 [ D.4144 ])
 (zero_extend:SI (subreg:HI (reg:SI 118) 0))) c5.c:8 -1
  (nil))
 
 If the values in register (reg:SI 118) fits HI mode (without
 overflowing), I assume that it is not necessary to just drop the higher
 bits and zero_extend as done above and generate the following instead.
 
 (insn 13 12 0 (set (reg:SI 110 [ D.4144 ])
 (((reg:SI 118) 0))) c5.c:8 -1
  (nil))
 
 is_assigned_exp_fit_type just checks if the range fits (in the above
 case, the value in reg:SI 118 fits HI mode) and the checks before
 emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp)); check that the
 modes match.
 
 Is this wrong  or am I missing the whole point?

is_assigned_exp_fit_type is weird - it looks at the value-range of _5,
but as you want to elide the extension from _6 to SImode you want
to look at the value-range of _6.  So, breaking it down and applying
the promotion to GIMPLE it would look like

   unsigned short _5;
   short int _6;
   _6 = (short int)_5;
   _6_7 = (int) _6;

where you want to remove the last line representing the
assignment to (subreg:HI (reg:SI 110)).  Whether you can
do that depends on the value-range of _6, not on the
value-range of _5.  It's also completely independent
of the operation performed on the RHS.
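
As a plain arithmetic illustration of why the value-range of the narrow
value decides this (a sketch only, not RTL and not the patch's code): for a
16-bit value widened to 32 bits, sign extension and zero extension give the
same result exactly when bit 15 is clear, so if the range of _6 guarantees
that, the extension carries no information. The helper names below are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 16-bit pattern to 32 bits in the two possible ways.
inline std::uint32_t zero_extend16 (std::uint16_t bits)
{
  return bits;
}

inline std::uint32_t sign_extend16 (std::uint16_t bits)
{
  // Replicate bit 15 into the upper half by hand.
  return (bits & 0x8000u) ? (0xFFFF0000u | bits) : bits;
}

// Hypothetical range check: true when no value in [lo, hi] has the
// most significant bit (bit 15) set.
inline bool msb_never_set (std::uint16_t lo, std::uint16_t hi)
{
  assert (lo <= hi);
  return hi < 0x8000u;
}

// Brute-force confirmation that both extensions agree on [lo, hi].
inline bool extensions_agree (std::uint16_t lo, std::uint16_t hi)
{
  assert (lo <= hi);
  for (std::uint32_t v = lo; v <= hi; ++v)
    {
      std::uint16_t bits = static_cast<std::uint16_t> (v);
      if (zero_extend16 (bits) != sign_extend16 (bits))
        return false;
    }
  return true;
}
```

For the range [0, 0x7FFF] both msb_never_set and extensions_agree hold; as
soon as 0x8000 is in the range the two extensions differ (0x00008000 versus
0xFFFF8000).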

Well.  As far as I understand at least.

Richard.


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Richard Biener
On Tue, 7 Jan 2014, Jakub Jelinek wrote:

 Hi!
 
 On Fri, Jan 03, 2014 at 11:33:50AM +0100, Jakub Jelinek wrote:
  On Fri, Jan 03, 2014 at 11:24:53AM +0100, Richard Biener wrote:
 
 Anyway, back to the original patch, so do you prefer something like
 this instead?  I.e. handle only __builtin_unreachable and
 __cxa_pure_virtual specially, and not devirt for fold_stmt_inplace?
 
 2014-01-07  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/59622
   * gimple-fold.c (gimple_fold_call): Fix a typo in message.  Handle
   __cxa_pure_virtual similarly to __builtin_unreachable.  Don't
   devirtualize for inplace at all.
 
   * g++.dg/opt/pr59622-2.C: New test.
 
 --- gcc/gimple-fold.c.jj  2014-01-03 11:40:57.247320424 +0100
 +++ gcc/gimple-fold.c 2014-01-07 18:15:00.352601812 +0100
 @@ -1167,7 +1167,7 @@ gimple_fold_call (gimple_stmt_iterator *
   (OBJ_TYPE_REF_EXPR 
 (callee)
   {
 fprintf (dump_file,
 -"Type inheritnace inconsistent devirtualization of ");
 +"Type inheritance inconsistent devirtualization of ");
 print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 fprintf (dump_file, " to ");
 print_generic_expr (dump_file, callee, TDF_SLIM);
 @@ -1177,26 +1177,35 @@ gimple_fold_call (gimple_stmt_iterator *
 gimple_call_set_fn (stmt, OBJ_TYPE_REF_EXPR (callee));
 changed = true;
   }
 -  else if (flag_devirtualize && virtual_method_call_p (callee))
 +  else if (flag_devirtualize && !inplace && virtual_method_call_p (callee))
   {
 bool final;
 vec <cgraph_node *>targets
   = possible_polymorphic_call_targets (callee, &final);
 if (final && targets.length () <= 1)
   {
 +   tree fndecl;
 if (targets.length () == 1)
 + fndecl = targets[0]->decl;
 +   else
 + fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
 +
 +   /* If fndecl (like __builtin_unreachable or
 +  __cxa_pure_virtual) takes no arguments, doesn't have
 +  return value and is noreturn, just add the call before
 +  stmt and DCE will do it's job later on.  */
 +   if (TREE_THIS_VOLATILE (fndecl)
 +&& VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
 +&& TYPE_ARG_TYPES (TREE_TYPE (fndecl)) == void_list_node)
   {
 -   gimple_call_set_fndecl (stmt, targets[0]->decl);
 -   changed = true;
 - }
 -   else if (!inplace)
 - {
 -   tree fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
 gimple new_stmt = gimple_build_call (fndecl, 0);
 gimple_set_location (new_stmt, gimple_location (stmt));
 gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
 return true;
   }
 +
 +   gimple_call_set_fndecl (stmt, fndecl);

I prefer to always do this, not do the fancy insertion-before.  That
would do repeated folding for

   fold_stmt (gsi);
   fold_stmt (gsi);
   fold_stmt (gsi);

where the last two should be a no-op.
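
One reading of this objection, as a toy model (strings stand in for
statements; Stmts, fold_by_inserting and fold_in_place are hypothetical and
have nothing to do with the real gimple API): a fold that inserts a new
statement reports a change on every call, while a fold that rewrites the
call in place changes something only the first time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy statement list; the strings stand in for GIMPLE statements.
using Stmts = std::vector<std::string>;

// Variant 1: insert a new call before the statement at INDEX on every
// invocation.  Always reports "changed".
inline bool fold_by_inserting (Stmts &stmts, std::size_t &index)
{
  assert (index < stmts.size ());
  stmts.insert (stmts.begin () + static_cast<std::ptrdiff_t> (index),
                "unreachable ()");
  ++index;  // keep pointing at the original statement
  return true;
}

// Variant 2: rewrite the statement in place.  A second call finds
// nothing left to do and reports "unchanged".
inline bool fold_in_place (Stmts &stmts, std::size_t index)
{
  assert (index < stmts.size ());
  if (stmts[index] == "unreachable ()")
    return false;
  stmts[index] = "unreachable ()";
  return true;
}

// Fold the same single statement TIMES times and report how many
// statements exist afterwards.
inline std::size_t size_after_folds (bool in_place, int times)
{
  Stmts stmts{"virtual_call ()"};
  std::size_t index = 0;
  for (int i = 0; i < times; ++i)
    {
      if (in_place)
        fold_in_place (stmts, index);
      else
        fold_by_inserting (stmts, index);
    }
  return stmts.size ();
}
```

In this model three in-place folds leave one statement, whereas three
inserting folds leave four.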

Richard.


 +   changed = true;
   }
   }
  }
 --- gcc/testsuite/g++.dg/opt/pr59622-2.C.jj   2014-01-07 18:10:45.435904909 +0100
 +++ gcc/testsuite/g++.dg/opt/pr59622-2.C  2014-01-07 18:10:45.435904909 +0100
 @@ -0,0 +1,21 @@
 +// PR tree-optimization/59622
 +// { dg-do compile }
 +// { dg-options "-O2" }
 +
 +namespace
 +{
 +  struct A
 +  {
 +A () {}
 +virtual A *bar (int) = 0;
 +A *baz (int x) { return bar (x); }
 +  };
 +}
 +
 +A *a;
 +
 +void
 +foo ()
 +{
 +  a->baz (0);
 +}
 
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [Patch AArch64] Implement Vector Permute Support

2014-01-08 Thread James Greenhalgh
On Wed, Jan 08, 2014 at 12:10:13AM +, Andrew Pinski wrote:
 On Tue, Jan 7, 2014 at 4:05 PM, Marcus Shawcroft
 marcus.shawcr...@arm.com wrote:
 
  Andrew, We know that there are numerous issues with aarch64 BE advsimd 
  support in GCC.  The aarch64_be support is very much a work in progress.  
  Tejas sorted out a number of fundamentals with a series of patches in 
  November, notably in PCS conformance.  There is more to come.  However, 
  aarch64_be-* support in gcc 4.9 is not going to match the level of quality 
  for the aarch64-* port.
 
 
 Yes, but it should not introduce an ICE while GCC is in stage3.  This was
 working before due to not having a vec_perm.  I am going to
 request this to be reverted soon if it is not fixed (the GCC rules are
 clear here).

Hi Andrew,

I am confused, are you also proposing to revert this patch on the 4.8
branch? The code has been sitting with that assert in place on trunk
for well over a year (note that December 2012 was during 4.8's
stage 3, not 4.9), so there is no regression here.

But, that doesn't absolve me of the fact that this is broken in
a stupid way for big-endian AArch64.

The band-aid, which I can prepare, would be to turn off
vec_perm for BYTES_BIG_ENDIAN targets on the 4.9 and
4.8 branches. This is the most sensible thing to do in the short
term. Naturally, you will lose vectorization of permute operations,
but at least you won't get the ICE or wrong code generation. This
is what the ARM back-end (from which I ported the vec_perm code)
does.

In the longer term you would want to audit the lane-numbering
discrepancies between GCC and our architectural lane-numbers.
We are some way towards that after Tejas' PCS conformance fix,
but as Marcus has said, there is more to come. I should imagine
that in this case you will need to provide a run-time transformation
between the permute mask and an appropriate mask for tbl.

To reiterate, this does not need to be reverted; we'll get a fix out
disabling vec_perm for BYTES_BIG_ENDIAN on the 4.8 branch and 4.9.

Thanks,
James



Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Paolo Carlini

On 01/07/2014 08:36 PM, Tim Shen wrote:

On Tue, Jan 7, 2014 at 4:02 AM, Paolo Carlini paolo.carl...@oracle.com wrote:

Ideally, I would suggest committing first the improvements in your previous
patch (by the way, thanks for the numbers!) + the pure bug fixes and
separate the further performance improvements which have compile-time
performance implications (how big?), see if, eg, Jon has something to
recommend. Can we do that?

First patch committed. I later found that the second patch b.diff is
based on the committed version (the attach, which fixed the 
problem);
Not sure I'm following all the past and present tenses ;) but in my old 
message I proposed to commit now the *correctness* fixes too, which, I 
suppose, are fixes which don't have compile-time performance implications.


Paolo.


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote:
 I prefer to always do this, not do the fancy insertion-before.  That
 would do repeated folding for
 
fold_stmt (gsi);
fold_stmt (gsi);
fold_stmt (gsi);
 
 where the last two should be a no-op.

I don't see how that is possible, at least for the __builtin_unreachable
case, because by just setting the fndecl to __builtin_unreachable and
keeping the incompatible fntype and bogus arguments for it all the
predicates whether it is a valid/suitable builtin call will fail and we
don't have a __builtin_unreachable function you could call.
So at least for builtin we want to make sure it has the right parameters.
If the lhs is something we can just initialize to zero, we can replace the
call with zeroing the lhs, but that is not always the case.

For __cxa_pure_virtual we could just keep the code as is (just with the
!inplace addition and spelling fix?), but would need to fix up whatever ICEs
during checking on it to honor fntype rather than decl's type.

Jakub


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Richard Biener
On Wed, 8 Jan 2014, Jakub Jelinek wrote:

 On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote:
  I prefer to always do this, not do the fancy insertion-before.  That
  would do repeated folding for
  
 fold_stmt (gsi);
 fold_stmt (gsi);
 fold_stmt (gsi);
  
  where the last two should be a no-op.
 
 I don't see how that is possible, at least for the __builtin_unreachable
 case, because by just setting the fndecl to __builtin_unreachable and
 keeping the incompatible fntype and bogus arguments for it all the
 predicates whether it is a valid/suitable builtin call will fail and we
 don't have a __builtin_unreachable function you could call.

Well, that just means we need two sets of predicates to check for
a builtin call.  The __builtin_unreachable code wants to know what
the callee is, not if that's a valid call to it.  But yeah - this
starts to get confusing :/

 So at least for builtin we want to make sure it has the right parameters.
 If the lhs is something we can just initialize to zero, we can replace the
 call with zeroing the lhs, but that is not always the case.

I'm starting to think this is too complex a transform for stmt folding ...

 For __cxa_pure_virtual we could just keep the code as is (just with the
 !inplace addition and spelling fix?), but would need to fix up whatever ICEs
 during checking on it to honor fntype rather than decl's type.

Yes.

So a patch just keeping the targets.length () == 1 case in folding
with just replacing the fndecl of the call is ok.

Thanks,
Richard.


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Richard Biener
On Wed, 8 Jan 2014, Richard Biener wrote:

 On Wed, 8 Jan 2014, Jakub Jelinek wrote:
 
  On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote:
   I prefer to always do this, not do the fancy insertion-before.  That
   would do repeated folding for
   
  fold_stmt (gsi);
  fold_stmt (gsi);
  fold_stmt (gsi);
   
   where the last two should be a no-op.
  
  I don't see how that is possible, at least for the __builtin_unreachable
  case, because by just setting the fndecl to __builtin_unreachable and
  keeping the incompatible fntype and bogus arguments for it all the
  predicates whether it is a valid/suitable builtin call will fail and we
  don't have a __builtin_unreachable function you could call.
 
 Well, that just means we need two sets of predicates to check for
 a builtin call.  The __builtin_unreachable code wants to know what
 the callee is, not if that's a valid call to it.  But yeah - this
 starts to get confusing :/
 
  So at least for builtin we want to make sure it has the right parameters.
  If the lhs is something we can just initialize to zero, we can replace the
  call with zeroing the lhs, but that is not always the case.
 
 I'm starting to think this is too complex a transform for stmt folding ...

Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def 
(cfun, create_tmp_var (TREE_TYPE (lhs.

  For __cxa_pure_virtual we could just keep the code as is (just with the
  !inplace addition and spelling fix?), but would need to fix up whatever ICEs
  during checking on it to honor fntype rather than decl's type.
 
 Yes.
 
 So a patch just keeping the targets.length () == 1 case in folding
 with just replacing the fndecl of the call is ok.
 
 Thanks,
 Richard.
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 12:15:40PM +0100, Richard Biener wrote:
  I'm starting to think this is too complex a transform for stmt folding ...
 
 Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def 
 (cfun, create_tmp_var (TREE_TYPE (lhs.

The lhs might not be is_gimple_reg_type though.  What to do in that case?

Jakub


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)

2014-01-08 Thread Richard Biener
On Wed, 8 Jan 2014, Jakub Jelinek wrote:

 On Wed, Jan 08, 2014 at 12:15:40PM +0100, Richard Biener wrote:
   I'm starting to think this is too complex a transform for stmt folding ...
  
  Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def 
  (cfun, create_tmp_var (TREE_TYPE (lhs.
 
 The lhs might not be is_gimple_reg_type though.  What to do in that case?

In that case you can remove the stmt.

Richard.


Re: [PING] [REPOST] Invalid Code when reading from unaligned zero-sized array

2014-01-08 Thread Richard Biener
On Tue, Jan 7, 2014 at 5:31 PM, Bernd Edlinger
bernd.edlin...@hotmail.de wrote:
 Hello,

 Ping...

 We still need a decision how to fix this.

 There are two alternative patches:

 1. My latest proposal: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01675.html

 2. Eric's latest proposal: 
 http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01667.html

Let's go with 1., your patch adjusting how we recurse in expand.  That
seems safer to eventually backport.

For 4.10 we should re-visit this and fix all the backends for those ABI
issues with modes ...

Richard.


 Thanks
 Bernd.


Re: [PATCH, 4.8, PR 59610] More optimize guards in ipa-prop.c

2014-01-08 Thread Richard Biener
On Tue, Jan 7, 2014 at 7:27 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 I forgot to put the optimize test to the ipa_compute_jump_functions
 when fixing PR 57358 which is where it is most necessary.  This patch
 adds it there and to parm_preserved_before_stmt_p which is also
 reachable through ipa_load_from_parm_agg_1 that is also called from
 outside of jump function computations.
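
A minimal sketch of the guard pattern described above: when not optimizing,
skip the analysis and conservatively assume the parameter was modified. The
types and helpers (toy_parm_info, toy_walk_finds_store,
toy_parm_preserved_p) are hypothetical stand-ins and not the ipa-prop.c
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-parameter cache of the analysis result.  */
struct toy_parm_info { bool modified; };

/* Hypothetical stand-in for the alias walk: returns true if any recorded
   store targets PARM.  */
static bool toy_walk_finds_store(const int *stores, size_t n, int parm)
{
    for (size_t i = 0; i < n; i++)
        if (stores[i] == parm)
            return true;
    return false;
}

/* Returns true if PARM is known to be unmodified.  When OPTIMIZING is
   false the walk is skipped and a modification is assumed.  */
static bool toy_parm_preserved_p(bool optimizing, struct toy_parm_info *info,
                                 const int *stores, size_t n, int parm)
{
    bool modified;

    if (info && info->modified)
        return false;

    if (optimizing)
        modified = toy_walk_finds_store(stores, n, parm);
    else
        modified = true;  /* analysis not run: assume the worst */

    if (info && modified)
        info->modified = true;
    return !modified;
}
```

The conservative answer costs only missed optimization in functions that
are not being optimized anyway, which is why it is a safe default here.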

 I'm currently bootstrapping and testing the following on x86_64-linux.
 OK for the branch if it passes?  And the testcase for trunk?

Ok.

Thanks,
Richard.

 Thanks,

 Martin


 2014-01-07  Martin Jambor  mjam...@suse.cz

 PR ipa/59610
 * ipa-prop.c (ipa_compute_jump_functions): Bail out if not optimizing.
 (parm_preserved_before_stmt_p): Assume modification present when not
 optimizing.

 testsuite/
 * gcc.dg/ipa/pr59610.c: New test.

 diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
 index 47d487d..3788a11 100644
 --- a/gcc/ipa-prop.c
 +++ b/gcc/ipa-prop.c
 @@ -623,16 +623,22 @@ parm_preserved_before_stmt_p (struct param_analysis_info *parm_ainfo,
    if (parm_ainfo && parm_ainfo->parm_modified)
      return false;

 -  gcc_checking_assert (gimple_vuse (stmt) != NULL_TREE);
 -  ao_ref_init (&refd, parm_load);
 -  /* We can cache visited statements only when parm_ainfo is available and when
 -     we are looking at a naked load of the whole parameter.  */
 -  if (!parm_ainfo || TREE_CODE (parm_load) != PARM_DECL)
 -    visited_stmts = NULL;
 +  if (optimize)
 +    {
 +      gcc_checking_assert (gimple_vuse (stmt) != NULL_TREE);
 +      ao_ref_init (&refd, parm_load);
 +      /* We can cache visited statements only when parm_ainfo is available and
 +         when we are looking at a naked load of the whole parameter.  */
 +      if (!parm_ainfo || TREE_CODE (parm_load) != PARM_DECL)
 +        visited_stmts = NULL;
 +      else
 +        visited_stmts = &parm_ainfo->parm_visited_statements;
 +      walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified, &modified,
 +                          visited_stmts);
 +    }
    else
 -    visited_stmts = &parm_ainfo->parm_visited_statements;
 -  walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified, &modified,
 -                      visited_stmts);
 +    modified = true;
 +
    if (parm_ainfo && modified)
      parm_ainfo->parm_modified = true;
    return !modified;
 @@ -1466,6 +1472,9 @@ ipa_compute_jump_functions (struct cgraph_node *node,
  {
    struct cgraph_edge *cs;

 +  if (!optimize)
 +    return;
 +
    for (cs = node->callees; cs; cs = cs->next_callee)
      {
        struct cgraph_node *callee = cgraph_function_or_thunk_node (cs->callee,
 diff --git a/gcc/testsuite/gcc.dg/ipa/pr59610.c b/gcc/testsuite/gcc.dg/ipa/pr59610.c
 new file mode 100644
 index 000..fc09334
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/ipa/pr59610.c
 @@ -0,0 +1,11 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2" } */
 +
 +struct A { int a; };
 +extern void *y;
 +
 +__attribute__((optimize (0))) void
 +foo (void *p, struct A x)
 +{
 +  foo (y, x);
 +}


Re: [PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115)

2014-01-08 Thread Richard Biener
On Tue, Jan 7, 2014 at 8:39 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Jan 06, 2014 at 10:27:06AM +, Richard Sandiford wrote:
 Of course, IMO, the cleanest fix would be to use switchable targets
 for i386...

 The following patch does that, bootstrapped/regtested on x86_64-linux and
 i686-linux.  The only problem with the patch is PCH,
 +FAIL: 17_intro/headers/c++200x/stdc++.cc (test for excess errors)
 +FAIL: 17_intro/headers/c++200x/stdc++_multiple_inclusion.cc (test for excess errors)
 (both 32-bit and 64-bit regtests), where it ICEs.  I guess the problem is
 that the target globals are allocated partly in GC, partly in heap and
 even if they were allocated completely in GC and GTY(()) marked fully all
 the individual pointed structures, we IMNSHO still don't want it to be
 saved during PCH and restored later, what we have is basically just a cache
 of the target globals.

 Dunno what is the best way to handle that though.
 Either before writing PCH c-common.c could call some tree.c routine that
 would traverse the cl_option_hash_table hash table and for every
 TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS.
 Or perhaps some gengtype extension to run some routine before PCH saving
 on the tree_target_option structs and clear the globals field in there.
 Or use GTY((user)) on tree_target_option, but then dunno how we'd handle the
 marking of the embedded opts field (and common).
 Any ideas?

Well, a GTY((skip_pch)) would probably work.  Or move the thing
out of GC land (thus make cl_option_hash_table persistent) and
simply GTY((skip)) the pointer completely.  Not sure if we ever
collect from it.

Richard.

 2014-01-07  Jakub Jelinek  ja...@redhat.com

 PR target/58115
 * tree-core.h (struct target_globals): New forward declaration.
 (struct tree_target_option): Add globals field.
 * tree.h (TREE_TARGET_GLOBALS): Define.
 * target-globals.h (struct target_globals): Define even if
 !SWITCHABLE_TARGET.
 * config/i386/i386.h (SWITCHABLE_TARGET): Define.
 * config/i386/i386.c: Include target-globals.h.
 (ix86_set_current_function): Instead of doing target_reinit
 unconditionally, use save_target_globals_default_opts and
 restore_target_globals.

 --- gcc/tree-core.h.jj  2014-01-07 08:47:24.0 +0100
 +++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100
 @@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option
struct target_optabs *GTY ((skip)) base_optabs;
  };

 +/* Forward declaration, defined in target-globals.h.  */
 +
 +struct GTY(()) target_globals;
 +
  /* Target options used by a function.  */

  struct GTY(()) tree_target_option {
struct tree_common common;

 +  /* Target globals for the corresponding target option.  */
 +  struct target_globals *globals;
 +
/* The optimization options used by the user.  */
struct cl_target_option opts;
  };
 --- gcc/tree.h.jj   2014-01-03 11:40:33.0 +0100
 +++ gcc/tree.h  2014-01-07 12:55:39.137295100 +0100
 @@ -2695,6 +2695,9 @@ extern tree build_optimization_node (str
  #define TREE_TARGET_OPTION(NODE) \
    (TARGET_OPTION_NODE_CHECK (NODE)->target_option.opts)

 +#define TREE_TARGET_GLOBALS(NODE) \
 +  (TARGET_OPTION_NODE_CHECK (NODE)->target_option.globals)
 +
  /* Return a tree node that encapsulates the target options in OPTS.  */
  extern tree build_target_option_node (struct gcc_options *opts);

 --- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100
 +++ gcc/target-globals.h2014-01-07 17:08:51.113880947 +0100
 @@ -37,6 +37,7 @@ extern struct target_builtins *this_targ
  extern struct target_gcse *this_target_gcse;
  extern struct target_bb_reorder *this_target_bb_reorder;
  extern struct target_lower_subreg *this_target_lower_subreg;
 +#endif

  struct GTY(()) target_globals {
struct target_flag_state *GTY((skip)) flag_state;
 @@ -57,6 +58,7 @@ struct GTY(()) target_globals {
struct target_lower_subreg *GTY((skip)) lower_subreg;
  };

 +#if SWITCHABLE_TARGET
  extern struct target_globals default_target_globals;

  extern struct target_globals *save_target_globals (void);
 --- gcc/config/i386/i386.h.jj   2014-01-06 22:37:19.0 +0100
 +++ gcc/config/i386/i386.h  2014-01-07 12:13:06.480486755 +0100
 @@ -2510,6 +2510,9 @@ extern void debug_dispatch_window (int);
  #define IX86_HLE_ACQUIRE (1 << 16)
  #define IX86_HLE_RELEASE (1 << 17)

 +/* For switching between functions with different target attributes.  */
 +#define SWITCHABLE_TARGET 1
 +
  /*
  Local variables:
  version-control: t
 --- gcc/config/i386/i386.c.jj   2014-01-06 22:37:19.0 +0100
 +++ gcc/config/i386/i386.c  2014-01-07 16:52:32.597904760 +0100
 @@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.
  #include "tree-pass.h"
  #include "context.h"
  #include "pass_manager.h"
 +#include "target-globals.h"

  static rtx legitimize_dllimport_symbol (rtx, bool);
  static rtx 

Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2014-01-08 Thread Vidya Praveen
On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote:
 On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote:
  bind_pic_locally is broken for targets that don't pass -fPIC/-fpic by
  default [1][2].
 
 Let's give Jakub 2 days to weigh in?  If no objections, Ok, though, do see 
 about adding documentation for it.  

Sure. I didn't respin the patch with documentation since I wanted to know
if the solution is acceptable. If this patch is OK, I'll respin with the
documentation for bind_pic_locally_ok. 

 I kinda would like a simpler interface for these two, but... that can be
 follow on work, if someone has a bright idea and some time to implement it.
 

Could you explain what you mean by a simpler interface here?

Cheers
VP.




Re: Rb tree node recycling patch

2014-01-08 Thread Jonathan Wakely
On 27 December 2013 18:30, François Dumont wrote:
 Hi

 Here is a patch to add recycling of Rb tree nodes when possible.

The change looks good, but it is not a bug fix, so I don't think it's
suitable for Stage 3.  Please re-submit this after 4.9 is released
when we are in Stage 1 again, thanks.


Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 11:49:08AM +, Vidya Praveen wrote:
 On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote:
  On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote:
   bind_pic_locally is broken for targets that don't pass -fPIC/-fpic by
   default [1][2].
  
  Let's give Jakub 2 days to weigh in?  If no objections, Ok, though, do see 
  about adding documentation for it.  
 
 Sure. I didn't respin the patch with documentation since I wanted to know
 if the solution is acceptable. If this patch is OK, I'll respin with the
 documentation for bind_pic_locally_ok. 
 
  I kinda would like a simpler interface for these two, but... that can be
  follow on work, if someone has a bright idea and some time to implement it.
  
 
 Could you explain what you mean by a simpler interface here?

The simpler interface, as I said earlier, would be just to make sure
/* { dg-add-options bind_pic_locally } */
does the right thing, I really don't believe you've tried hard enough.

It is true that dejagnu's default_target_compile has:
if {[board_info $dest exists multilib_flags]} {
append add_flags " [board_info $dest multilib_flags]"
}
last, just before adding -o $destfile (multilib_flags is where the
-fpic/-fPIC comes in, right?).  But suppose dg-add-options bind_pic_locally
adds the necessary options not to dg-extra-tool-flags but to some
other variable, and gcc_target_compile (and g++_target_compile)
temporarily appends that other variable (if not empty) to board_info's
multilib_flags around the [target_compile ...] invocation and removes it
afterwards.  I don't see why that wouldn't work.
Tcl is quite flexible in this.

Jakub


[PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115, take 2)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 12:32:59PM +0100, Richard Biener wrote:
  Either before writing PCH c-common.c could call some tree.c routine that
  would traverse the cl_option_hash_table hash table and for every
  TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS.
  Or perhaps some gengtype extension to run some routine before PCH saving
  on the tree_target_option structs and clear the globals field in there.
  Or use GTY((user)) on tree_target_option, but then dunno how we'd handle the
  marking of the embedded opts field (and common).
  Any ideas?
 
 Well, a GTY((skip_pch)) would probably work.  Or move the thing
 out of GC land (thus make cl_option_hash_table persistent) and
 simply GTY((skip)) the pointer completely.  Not sure if we ever
 collect from it.

Even if the pointer was out of GC land and GTY((skip)), we'd need to clear
it somewhere during PCH saving, as the containing structure is GC allocated.

In the meantime I've already implemented the variant with the
htab_traverse; all still reachable TARGET_OPTION_NODE trees should be in that
hash table.
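
A minimal sketch of the traverse-and-clear idea, using a toy fixed-size
table instead of the real hash table. Everything here (toy_option_node,
toy_table, toy_traverse, toy_clear_globals, toy_prepare_for_pch) is a
hypothetical illustration; the callback convention of returning nonzero to
continue is an assumption borrowed for the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_globals { int dummy; };   /* hypothetical cached target state */

struct toy_option_node {
    int opts;                        /* stands in for the saved options */
    struct toy_globals *globals;     /* cache only; must not be serialized */
};

#define TOY_TABLE_SIZE 8
static struct toy_option_node *toy_table[TOY_TABLE_SIZE];

typedef int (*toy_trav_fn)(struct toy_option_node **slot, void *info);

/* Call FN on every occupied slot; stop early if FN returns 0.  */
static void toy_traverse(toy_trav_fn fn, void *info)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        if (toy_table[i] && !fn(&toy_table[i], info))
            break;
}

/* Reset the cache pointer of one node.  The sketch ignores who owns the
   cached object; real code would have to decide that.  */
static int toy_clear_globals(struct toy_option_node **slot, void *info)
{
    int *cleared = info;

    if ((*slot)->globals) {
        (*slot)->globals = NULL;
        ++*cleared;
    }
    return 1;  /* keep traversing */
}

/* Run before serializing the nodes; returns how many caches were reset.  */
static int toy_prepare_for_pch(void)
{
    int cleared = 0;

    toy_traverse(toy_clear_globals, &cleared);
    return cleared;
}
```

Since the cleared field is only a cache, a later lookup can rebuild it on
demand, so nothing observable is lost by resetting it before the save.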

Bootstrapped/regtested on x86_64-linux and i686-linux (in both cases with
--enable-checking=yes,rtl and --enable-checking=release; for the
i686-linux/release checking I had to fix an unrelated compare-debug issue,
which I'll post once I manage to reduce the testcase).

I'd like to get rid of all the XCNEW calls in target-globals.c as a
follow-up.

As for performance, a very rough check of make -j48 bootstrap and
make -j48 check times with --enable-checking=release shows the patch to be
compile-time neutral.  On e.g. the declare-simd-1.C testcase, though, g++ is
more than twice as fast with the patch (~ 0.8 sec without the patch,
~ 0.3 sec with it, both for x86_64 and i686).

Ok for trunk?

2014-01-07  Jakub Jelinek  ja...@redhat.com

PR target/58115
* tree-core.h (struct target_globals): New forward declaration.
(struct tree_target_option): Add globals field.
* tree.h (TREE_TARGET_GLOBALS): Define.
(prepare_target_option_nodes_for_pch): New prototype.
* target-globals.h (struct target_globals): Define even if
!SWITCHABLE_TARGET.
* tree.c (prepare_target_option_node_for_pch,
prepare_target_option_nodes_for_pch): New functions.
* config/i386/i386.h (SWITCHABLE_TARGET): Define.
* config/i386/i386.c: Include target-globals.h.
(ix86_set_current_function): Instead of doing target_reinit
unconditionally, use save_target_globals_default_opts and
restore_target_globals.
c-family/
* c-pch.c (c_common_write_pch): Call
prepare_target_option_nodes_for_pch.

--- gcc/tree-core.h.jj  2014-01-07 08:47:24.0 +0100
+++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100
@@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option
   struct target_optabs *GTY ((skip)) base_optabs;
 };
 
+/* Forward declaration, defined in target-globals.h.  */
+
+struct GTY(()) target_globals;
+
 /* Target options used by a function.  */
 
 struct GTY(()) tree_target_option {
   struct tree_common common;
 
+  /* Target globals for the corresponding target option.  */
+  struct target_globals *globals;
+
   /* The optimization options used by the user.  */
   struct cl_target_option opts;
 };
--- gcc/tree.h.jj   2014-01-03 11:40:33.0 +0100
+++ gcc/tree.h  2014-01-07 21:28:15.038061120 +0100
@@ -2695,9 +2695,14 @@ extern tree build_optimization_node (str
 #define TREE_TARGET_OPTION(NODE) \
   (TARGET_OPTION_NODE_CHECK (NODE)->target_option.opts)
 
+#define TREE_TARGET_GLOBALS(NODE) \
+  (TARGET_OPTION_NODE_CHECK (NODE)->target_option.globals)
+
 /* Return a tree node that encapsulates the target options in OPTS.  */
 extern tree build_target_option_node (struct gcc_options *opts);
 
+extern void prepare_target_option_nodes_for_pch (void);
+
 #if defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 2007)
 
 inline tree
--- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100
+++ gcc/target-globals.h2014-01-07 17:08:51.113880947 +0100
@@ -37,6 +37,7 @@ extern struct target_builtins *this_targ
 extern struct target_gcse *this_target_gcse;
 extern struct target_bb_reorder *this_target_bb_reorder;
 extern struct target_lower_subreg *this_target_lower_subreg;
+#endif
 
 struct GTY(()) target_globals {
   struct target_flag_state *GTY((skip)) flag_state;
@@ -57,6 +58,7 @@ struct GTY(()) target_globals {
   struct target_lower_subreg *GTY((skip)) lower_subreg;
 };
 
+#if SWITCHABLE_TARGET
 extern struct target_globals default_target_globals;
 
 extern struct target_globals *save_target_globals (void);
--- gcc/tree.c.jj   2014-01-03 11:40:33.0 +0100
+++ gcc/tree.c  2014-01-07 21:27:35.590268195 +0100
@@ -11527,6 +11527,28 @@ build_target_option_node (struct gcc_opt
   return t;
 }
 
+/* Reset TREE_TARGET_GLOBALS cache for TARGET_OPTION_NODE.
+   Called through htab_traverse.  */
+
+static int
+prepare_target_option_node_for_pch (void **slot, 

Re: Rb tree node recycling patch

2014-01-08 Thread Paolo Carlini

Hi,

On 12/27/2013 07:30 PM, François Dumont wrote:
Note that this patch contains also a cleanup of a useless template 
parameter _Is_pod_comparator on _Rb_tree_impl.
The useless parameter is a remnant of an attempt at exploiting the EBO 
for _Rb_tree_impl. At some point Benjamin got a patch from a contributor 
but then had to quickly revert it just in time for the ABI freeze 
because it didn't work. Everything is recorded in the mailing list. 
Anyway, whatever we do now (more exactly, post 4.9) let's make sure we 
don't break the ABI inadvertently, or, if we actually decide to do that, we 
should reconsider the EBO.


About the node recycling idea itself, we got a closely related Bugzilla. 
Is it *exactly* the same issue, or not? Please double check.


Paolo.


Re: Rb tree node recycling patch

2014-01-08 Thread Paolo Carlini

On 01/08/2014 02:34 PM, Paolo Carlini wrote:

Hi,

On 12/27/2013 07:30 PM, François Dumont wrote:
Note that this patch contains also a cleanup of a useless template 
parameter _Is_pod_comparator on _Rb_tree_impl.
The useless parameter is a remnant of an attempt at exploiting the EBO 
for _Rb_tree_impl. At some point Benjamin got a patch from a 
contributor but then had to quickly revert it just in time for the ABI 
freeze because it didn't work. Everything is recorded in the mailing 
list. Anyway, whatever we do now (more exactly, post 4.9) let's make 
sure we don't break the ABI inadvertently, or, if we actually decide 
to do that, we should reconsider the EBO.


This ChangeLog entry:

2004-03-25  Dhruv Matani  dhruvb...@gmx.net

   * include/bits/stl_tree.h: Introduced a new class _Rb_tree_impl, ...

has the original EBO idea, which in fact we didn't deliver.

Paolo.





Re: [RFA][PATCH][middle-end/53623] Improve extension elimination

2014-01-08 Thread Jeff Law

On 01/08/14 01:14, Eric Botcazou wrote:

Committed after private email approval from Jakub.  I made one
additional trivial change (missing whitespace in a comment).


This breaks bootstrap with RTL checking enabled:

[ ... ]
Thanks.  I'm on it.
jeff



Re: [Patch] libgcov.c re-factoring

2014-01-08 Thread Teresa Johnson
On Mon, Jan 6, 2014 at 9:49 AM, Teresa Johnson tejohn...@google.com wrote:
 On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka hubi...@ucw.cz wrote:
 2014-01-03  Rong Xu  x...@google.com

 * gcc/gcov-io.c (gcov_var): Move from gcov-io.h.
 (gcov_position): Ditto.
 (gcov_is_error): Ditto.
 (gcov_rewrite): Ditto.
 * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.c, and libgcov
 only part to libgcc/libgcov.h.
 * libgcc/libgcov-driver.c: Use libgcov.h.
 (buffer_fn_data): Use xmalloc instead of malloc.
 (gcov_exit_merge_gcda): Ditto.
 * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto.
 * libgcc/libgcov.h: New common header files for libgcov-*.h.
 * libgcc/libgcov-interface.c: Use libgcov.h
 * libgcc/libgcov-merge.c: Ditto.
 * libgcc/libgcov-profiler.c: Ditto.
 * libgcc/Makefile.in: Add dependence to libgcov.h

 OK, with the licence changes and...

 Index: gcc/gcov-io.c
 ===
 --- gcc/gcov-io.c   (revision 206100)
 +++ gcc/gcov-io.c   (working copy)
 @@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns
  static void gcov_allocate (unsigned);
  #endif

 +/* Optimum number of gcov_unsigned_t's read from or written to disk.  */
 +#define GCOV_BLOCK_SIZE (1 << 10)
 +
 +GCOV_LINKAGE struct gcov_var
 +{
 +  FILE *file;
 +  gcov_position_t start;   /* Position of first byte of block */
 +  unsigned offset; /* Read/write position within the block.  */
 +  unsigned length; /* Read limit in the block.  */
 +  unsigned overread;   /* Number of words overread.  */
 +  int error;   /* < 0 overflow, > 0 disk error.  */
 +  int mode;    /* < 0 writing, > 0 reading */
 +#if IN_LIBGCOV
 +  /* Holds one block plus 4 bytes, thus all coverage reads & writes
 + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
 + to and from the disk. libgcov never backtracks and only writes 4
 + or 8 byte objects.  */
 +  gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1];
 +#else
 +  int endian;  /* Swap endianness.  */
 +  /* Holds a variable length block, as the compiler can write
 + strings and needs to backtrack.  */
 +  size_t alloc;
 +  gcov_unsigned_t *buffer;
 +#endif
 +} gcov_var;
 +
 +/* Save the current position in the gcov file.  */
 +static inline gcov_position_t
 +gcov_position (void)
 +{
 +  gcc_assert (gcov_var.mode > 0);
 +  return gcov_var.start + gcov_var.offset;
 +}
 +
 +/* Return nonzero if the error flag is set.  */
 +static inline int
 +gcov_is_error (void)
 +{
 +  return gcov_var.file ? gcov_var.error : 1;
 +}
 +
 +#if IN_LIBGCOV
 +/* Move to beginning of file and initialize for writing.  */
 +GCOV_LINKAGE inline void
 +gcov_rewrite (void)
 +{
 +  gcc_assert (gcov_var.mode > 0);

 I would turn those two asserts into checking asserts so they do not
 bloat the runtime lib.

 Ok, but note that there are a number of other gcc_assert already in
 gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should
 I go ahead and change all of them in gcov-io.c?

Actually, I tried changing these two, but gcc_checking_assert is
undefined in libgcov.a. Ok to commit without this change?
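
A minimal sketch of the distinction being discussed: an assert that is
always compiled in versus a checking assert that costs nothing when
checking is disabled. The toy_* macros and functions are hypothetical and
only illustrate the idea; they are not the definitions GCC uses, and a
runtime library would need its own equivalent if the compiler's macro is
not visible there.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef TOY_ENABLE_CHECKING
#define TOY_ENABLE_CHECKING 0   /* assumed default: release-style build */
#endif

/* Always-on assert: evaluates EXPR and aborts if it is false.  */
#define toy_assert(expr) ((void)((expr) ? 0 : (abort(), 0)))

#if TOY_ENABLE_CHECKING
#define toy_checking_assert(expr) toy_assert(expr)
#else
/* EXPR is still parsed and type-checked but never evaluated, so it adds
   no code to the runtime library.  */
#define toy_checking_assert(expr) ((void)(0 && (expr)))
#endif

static int toy_mode = 1;
static unsigned toy_start = 8, toy_offset = 4;

/* Uses the checking variant: EVALS stays untouched when checking is off.  */
static unsigned toy_position(int *evals)
{
    toy_checking_assert(((*evals)++, toy_mode > 0));
    return toy_start + toy_offset;
}

/* Uses the always-on variant: EVALS is incremented on every call.  */
static unsigned toy_position_always(int *evals)
{
    toy_assert(((*evals)++, toy_mode > 0));
    return toy_start + toy_offset;
}
```

The evals counter exists only to make the difference observable; in real
code the asserted expression should have no side effects.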

Teresa


 Thanks,
 Teresa


 Thanks,
 Honza



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Workaround PR59584 on 4.8 Fix use of stack-pointer-register as a temporary for CRIS

2014-01-08 Thread Hans-Peter Nilsson
 From: Hans-Peter Nilsson h...@axis.com
 Date: Mon, 23 Dec 2013 23:34:02 +0100

Just as previously done on trunk, I'm going to cover up PR59584
(which was fixed and then exposed on the 4.8 branch) by applying
commit r206187 from trunk below.  Again, the PR bug is an ICE
caused by the combination of expr.c:find_args_size_adjust and
expr.c:fixup_args_size_notes not being able to handle a define_split
matching for the stack-adjustment assignment instruction emitted
by __builtin_stack_restore (the insn that gets the REG_ARGS_SIZE
note).

*This* bug is slightly different but the fix happens to cover up
that bug by not matching the splitter for the stack-pointer; the
destination is used as a temporary, so sp is set to something
unusable as a stack-pointer, ungood.
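
A minimal sketch of the predicate idea in plain C: a destination is
accepted by the splitter only if it is a register and not the stack
pointer. The types, the register number and the function names are
hypothetical; the real predicate lives in the machine description and is
not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_SP_REGNUM = 14 };  /* hypothetical stack-pointer register number */

struct toy_operand {
    bool is_reg;  /* true for a register operand */
    int regno;    /* meaningful only when is_reg is true */
};

/* Stand-in for a generic "any register" predicate.  */
static bool toy_register_operand(const struct toy_operand *op)
{
    return op->is_reg;
}

/* Narrower predicate: a register, but never the stack pointer, so a split
   that uses its destination as a temporary cannot clobber sp.  */
static bool toy_nonsp_register_operand(const struct toy_operand *op)
{
    return toy_register_operand(op) && op->regno != TOY_SP_REGNUM;
}
```

With the narrower predicate on the destination, a pattern whose output is
the stack pointer simply does not match the split and is left alone.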

Tested cris-elf, makes gcc.dg/pr50251.c pass again, will commit
to the 4.8 branch.

   PR middle-end/59584
   * config/cris/predicates.md (cris_nonsp_register_operand):
   New define_predicate.
   * config/cris/cris.md: Replace register_operand with
   cris_nonsp_register_operand for destinations in all
   define_splits where a register is set more than once.
 
 Index: gcc/config/cris/cris.md
 ===
 --- gcc/config/cris/cris.md   (revision 206176)
 +++ gcc/config/cris/cris.md   (working copy)
 @@ -758,7 +758,7 @@ (define_split
 (match_operand:SI 1 "const_int_operand" ""))
(match_operand:SI 2 "register_operand" ""))])
 (match_operand 3 "register_operand" ""))
 - (set (match_operand:SI 4 "register_operand" "")
 + (set (match_operand:SI 4 "cris_nonsp_register_operand" "")
 (plus:SI (mult:SI (match_dup 0)
   (match_dup 1))
  (match_dup 2)))])]
 @@ -859,7 +859,7 @@ (define_split
(match_operand:SI 0 "cris_bdap_operand" "")
(match_operand:SI 1 "cris_bdap_operand" ""))])
 (match_operand 2 "register_operand" ""))
 - (set (match_operand:SI 3 "register_operand" "")
 + (set (match_operand:SI 3 "cris_nonsp_register_operand" "")
 (plus:SI (match_dup 0) (match_dup 1)))])]
"reload_completed && reg_overlap_mentioned_p (operands[3], operands[2])"
[(set (match_dup 4) (match_dup 2))
 @@ -3960,7 +3960,7 @@ (define_expand "casesi"
  ;; up.
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
4 "cris_operand_extend_operator"
[(match_operand 1 "register_operand" "")
 @@ -3990,7 +3990,7 @@ (define_split
  ;; Call this op-extend-split-rx=rz
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
4 "cris_plus_or_bound_operator"
[(match_operand 1 "register_operand" "")
 @@ -4018,7 +4018,7 @@ (define_split
  ;; Call this op-extend-split-swapped
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
4 "cris_plus_or_bound_operator"
[(match_operator
 @@ -4044,7 +4044,7 @@ (define_split
  ;; bound.  Call this op-extend-split-swapped-rx=rz.
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
4 "cris_plus_or_bound_operator"
[(match_operator
 @@ -4075,7 +4075,7 @@ (define_split
  ;; Call this op-extend.
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
3 "cris_orthogonal_operator"
[(match_operand 1 "register_operand" "")
 @@ -4099,7 +4099,7 @@ (define_split
  ;; Call this op-split-rx=rz
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
3 "cris_commutative_orth_op"
[(match_operand 2 "memory_operand" "")
 @@ -4123,7 +4123,7 @@ (define_split
  ;; Call this op-split-swapped.
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
3 "cris_commutative_orth_op"
[(match_operand 1 "register_operand" "")
 @@ -4146,7 +4146,7 @@ (define_split
  ;; Call this op-split-swapped-rx=rz.
  
  (define_split
 -  [(set (match_operand 0 "register_operand" "")
 +  [(set (match_operand 0 "cris_nonsp_register_operand" "")
   (match_operator
3 "cris_orthogonal_operator"
[(match_operand 2 "memory_operand" "")
 @@ -4555,10 +4555,11 @@ (define_split
  ;; We're not allowed to generate copies of registers with different mode
  ;; until after reload; copying pseudos upsets reload.  CVS as of
  ;; 2001-08-24, unwind-dw2-fde.c, _Unwind_Find_FDE ICE in
 -;; cselib_invalidate_regno.
 +;; cselib_invalidate_regno.  Also, don't do this for the stack-pointer,
 +;; as we don't want it set temporarily to an invalid value.
  
  

Re: [PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115, take 2)

2014-01-08 Thread Richard Biener
On Wed, Jan 8, 2014 at 1:45 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Jan 08, 2014 at 12:32:59PM +0100, Richard Biener wrote:
  Either before writing PCH c-common.c could call some tree.c routine that
  would traverse the cl_option_hash_table hash table and for every
  TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS.
  Or perhaps some gengtype extension to run some routine before PCH saving
  on the tree_target_option structs and clear the globals field in there.
  Or use GTY((user)) on tree_target_option, but then dunno how we'd handle 
  the
  marking of the embedded opts field (and common).
  Any ideas?

 Well, a GTY((skip_pch)) would probably work.  Or move the thing
 out-of GC land (thus make cl_option_hash_table persistent) and
 simply GTY((skip)) the pointer completely.  Not sure if we ever
 collect from it.

 Even if the pointer was out of GCC land and GTY((skip)), we'd need to clear
 it somewhere during PCH saving, as the containing structure is GC allocated.

 I've already implemented in the mean time the variant with the
 htab_traverse, all still reachable TARGET_OPTION_NODE trees should be in that
 hash table.
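
The htab_traverse variant described above can be sketched in plain C. This is only an illustration under stated assumptions: `target_option_node`, `traverse_nodes`, `clear_globals_cb` and `prepare_nodes_for_pch` are hypothetical stand-ins (an array replaces the real hash table), not the GCC types or functions from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in tree-core.h and
   target-globals.h.  */
struct target_globals { int dummy; };

struct target_option_node
{
  struct target_globals *globals;  /* cached per-option state */
  int opts;                        /* placeholder for the option payload */
};

/* Traversal-style callback: return nonzero to keep going.  */
static int
clear_globals_cb (struct target_option_node *node, void *data)
{
  (void) data;
  node->globals = NULL;  /* the cached pointer must not reach the PCH */
  return 1;
}

/* Simplified traversal over an array standing in for the hash table;
   empty slots are skipped.  */
static void
traverse_nodes (struct target_option_node **table, size_t n,
                int (*cb) (struct target_option_node *, void *), void *data)
{
  for (size_t i = 0; i < n; i++)
    if (table[i] != NULL && !cb (table[i], data))
      break;
}

/* Called once before the PCH is written.  */
static void
prepare_nodes_for_pch (struct target_option_node **table, size_t n)
{
  traverse_nodes (table, n, clear_globals_cb, NULL);
}
```

Only the cached pointer is reset; the option payload itself is left untouched, so the state can be recomputed lazily after the PCH is read back.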

 Bootstrapped/regtested on x86_64-linux and i686-linux (in both cases with
 --enable-checking=yes,rtl and --enable-checking=release, for the
 i686-linux/release checking I had to fix an unrelated compare debug issue
 I'll post when I manage to reduce testcase).

 I'd like to get rid of all the XCNEW calls in target-globals.c as a
 follow-up.

 As for performance, for --enable-checking=release from very rough check
 on make -j48 bootstrap and make -j48 check times the patch is compile time
 neutral, on e.g. declare-simd-1.C testcase g++ is twice as fast with the
 patch though (~ 0.8 sec without the patch, ~ 0.3 sec with the patch, both
 for x86_64 and i686).

 Ok for trunk?

Works for me.  Wait a bit for others to comment though.

Thanks,
Richard.

 2014-01-07  Jakub Jelinek  ja...@redhat.com

 PR target/58115
 * tree-core.h (struct target_globals): New forward declaration.
 (struct tree_target_option): Add globals field.
 * tree.h (TREE_TARGET_GLOBALS): Define.
 (prepare_target_option_nodes_for_pch): New prototype.
 * target-globals.h (struct target_globals): Define even if
 !SWITCHABLE_TARGET.
 * tree.c (prepare_target_option_node_for_pch,
 prepare_target_option_nodes_for_pch): New functions.
 * config/i386/i386.h (SWITCHABLE_TARGET): Define.
 * config/i386/i386.c: Include target-globals.h.
 (ix86_set_current_function): Instead of doing target_reinit
 unconditionally, use save_target_globals_default_opts and
 restore_target_globals.
 c-family/
 * c-pch.c (c_common_write_pch): Call
 prepare_target_option_nodes_for_pch.

 --- gcc/tree-core.h.jj  2014-01-07 08:47:24.0 +0100
 +++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100
 @@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option
struct target_optabs *GTY ((skip)) base_optabs;
  };

 +/* Forward declaration, defined in target-globals.h.  */
 +
 +struct GTY(()) target_globals;
 +
  /* Target options used by a function.  */

  struct GTY(()) tree_target_option {
struct tree_common common;

 +  /* Target globals for the corresponding target option.  */
 +  struct target_globals *globals;
 +
/* The optimization options used by the user.  */
struct cl_target_option opts;
  };
 --- gcc/tree.h.jj   2014-01-03 11:40:33.0 +0100
 +++ gcc/tree.h  2014-01-07 21:28:15.038061120 +0100
 @@ -2695,9 +2695,14 @@ extern tree build_optimization_node (str
  #define TREE_TARGET_OPTION(NODE) \
   (TARGET_OPTION_NODE_CHECK (NODE)->target_option.opts)

 +#define TREE_TARGET_GLOBALS(NODE) \
 +  (TARGET_OPTION_NODE_CHECK (NODE)->target_option.globals)
 +
  /* Return a tree node that encapsulates the target options in OPTS.  */
  extern tree build_target_option_node (struct gcc_options *opts);

 +extern void prepare_target_option_nodes_for_pch (void);
 +
  #if defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 2007)

  inline tree
 --- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100
 +++ gcc/target-globals.h2014-01-07 17:08:51.113880947 +0100
 @@ -37,6 +37,7 @@ extern struct target_builtins *this_targ
  extern struct target_gcse *this_target_gcse;
  extern struct target_bb_reorder *this_target_bb_reorder;
  extern struct target_lower_subreg *this_target_lower_subreg;
 +#endif

  struct GTY(()) target_globals {
struct target_flag_state *GTY((skip)) flag_state;
 @@ -57,6 +58,7 @@ struct GTY(()) target_globals {
struct target_lower_subreg *GTY((skip)) lower_subreg;
  };

 +#if SWITCHABLE_TARGET
  extern struct target_globals default_target_globals;

  extern struct target_globals *save_target_globals (void);
 --- gcc/tree.c.jj   2014-01-03 11:40:33.0 +0100
 +++ gcc/tree.c  2014-01-07 21:27:35.590268195 +0100
 @@ -11527,6 +11527,28 

Re: [Patch] libgcov.c re-factoring

2014-01-08 Thread Jan Hubicka
 
 Actually, I tried changing these two, but gcc_checking_assert is
 undefined in libgcov.a. Ok to commit without this change?

OK.
Incrementally, can you please define gcov_nonruntime_assert, which would expand
to gcc_assert for code within gcc/coverage tools and to nothing for the libgcov
runtime, so that we can change those offenders to use it?
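
A minimal sketch of such a macro, for illustration only. `IN_LIBGCOV_SKETCH` is a hypothetical switch invented for this example, plain `assert` stands in for `gcc_assert`, and `checked_index` is a made-up caller; the definition that eventually gets committed may look different.

```c
#include <assert.h>

/* Hypothetical switch for this sketch: 1 when building the runtime
   library, 0 when building the compiler or the coverage tools.  */
#ifndef IN_LIBGCOV_SKETCH
#define IN_LIBGCOV_SKETCH 0
#endif

#if IN_LIBGCOV_SKETCH
/* Runtime: the check is compiled away, but EXPR stays visible to the
   compiler so it is still type-checked and its operands count as used.  */
#define gcov_nonruntime_assert(EXPR) ((void) (0 && (EXPR)))
#else
/* Compiler and tools: a real assertion (assert stands in for gcc_assert).  */
#define gcov_nonruntime_assert(EXPR) assert (EXPR)
#endif

/* Made-up caller showing the intended use.  */
static unsigned
checked_index (unsigned ix, unsigned limit)
{
  gcov_nonruntime_assert (ix < limit);
  return ix;
}
```

Keeping EXPR inside `0 && (...)` rather than dropping it entirely avoids unused-variable warnings in the runtime build.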

Honza
 


Re: Extend -fstack-protector-strong to cover calls with return slot

2014-01-08 Thread Florian Weimer

On 01/07/2014 02:37 PM, Jakub Jelinek wrote:

On Tue, Jan 07, 2014 at 02:27:04PM +0100, Florian Weimer wrote:

gimplify_modify_expr_rhs, in the CALL_EXPR case:

  if (use_target)
{
  CALL_EXPR_RETURN_SLOT_OPT (*from_p) = 1;
  mark_addressable (*to_p);
}


Yeah, that sets it in some cases too, not in other testcases.

Just look at how the flag is used when actually expanding it:

 if (target && MEM_P (target) && CALL_EXPR_RETURN_SLOT_OPT (exp))
   structure_value_addr = XEXP (target, 0);
 else
   {
 /* For variable-sized objects, we must be called with a target
specified.  If we were to allocate space on the stack here,
we would have no way of knowing when to free it.  */
 rtx d = assign_temp (rettype, 1, 1);
 structure_value_addr = XEXP (d, 0);
 target = 0;
   }


Okay, I'm beginning to understand.  I tried to actually reach the second 
branch, and ended up with PR59711. :)


foo12 in the new C testcase covers it in part without a variable-sized 
object.



so, if it is set, the address of the var on the LHS is passed to the
function as hidden argument, if it is not set, we pass address of
a stack temporary instead.  Both the automatic var and the stack temporary
can overflow, if the callee does something wrong.
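
The two cases can be illustrated at the source level with a small sketch. The names `struct big`, `make_big`, `use_result` and `ignore_result` are invented for this example, and whether a given call actually uses the variable itself or a temporary as the return slot depends on the ABI and on the optimizer.

```c
#include <string.h>

struct big { char buf[64]; };

/* On typical ABIs a struct of this size is returned in memory: the
   caller passes a hidden pointer and the callee writes through it.  */
static struct big
make_big (const char *s)
{
  struct big b;
  memset (&b, 0, sizeof b);
  strncpy (b.buf, s, sizeof b.buf - 1);  /* stays NUL-terminated */
  return b;
}

/* Case 1: the result is assigned to a local.  With the return slot
   optimization the hidden pointer may be the address of x itself.  */
static size_t
use_result (const char *s)
{
  struct big x = make_big (s);
  return strlen (x.buf);
}

/* Case 2: the result is ignored.  There is no LHS, but the caller
   still has to provide a stack temporary for the callee to fill.  */
static void
ignore_result (const char *s)
{
  make_big (s);
}
```

In both callers the callee writes into memory in the caller's frame, which is why a call whose result is ignored is also relevant for instrumentation.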


What about the attached version?  It still does not exactly match your 
original suggestion because gimple_call_lhs (stmt) can be NULL_TREE if 
the result is ignored and this case needs instrumentation, as you 
explained, so I use the function return type in the aggregate_value_p check.


Testing is still under way, but looks good so far.  I'm bootstrapping 
with BOOT_CFLAGS="-O2 -g -fstack-protector-strong" with Ada enabled, for 
additional coverage.


--
Florian Weimer / Red Hat Product Security Team
gcc/

2014-01-08  Florian Weimer  fwei...@redhat.com

	* cfgexpand.c (stack_protect_decl_p): New function, extracted from
	expand_used_vars.
	(stack_protect_return_slot_p): New function.
	(expand_used_vars): Call stack_protect_decl_p and
	stack_protect_return_slot_p for -fstack-protector-strong.

gcc/testsuite/

2014-01-08  Florian Weimer  fwei...@redhat.com

	* gcc.dg/fstack-protector-strong.c: Add coverage for return slots.
	* g++.dg/fstack-protector-strong.C: Likewise.
	* gcc.target/i386/ssp-strong-reg.c: New file.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c	(revision 206311)
+++ gcc/cfgexpand.c	(working copy)
@@ -1599,6 +1599,52 @@
   return 0;
 }
 
+/* Check if the current function has local referenced variables that
+   have their addresses taken, contain an array, or are arrays.  */
+
+static bool
+stack_protect_decl_p ()
+{
+  unsigned i;
+  tree var;
+
+  FOR_EACH_LOCAL_DECL (cfun, i, var)
+if (!is_global_var (var))
+  {
+	tree var_type = TREE_TYPE (var);
+	if (TREE_CODE (var) == VAR_DECL
+	    && (TREE_CODE (var_type) == ARRAY_TYPE
+		|| TREE_ADDRESSABLE (var)
+		|| (RECORD_OR_UNION_TYPE_P (var_type)
+		    && record_or_union_type_has_array_p (var_type))))
+	  return true;
+  }
+  return false;
+}
+
+/* Check if the current function has calls that use a return slot.  */
+
+static bool
+stack_protect_return_slot_p ()
+{
+  basic_block bb;
+  
+  FOR_ALL_BB_FN (bb, cfun)
+for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	 !gsi_end_p (gsi); gsi_next (gsi))
+  {
+	gimple stmt = gsi_stmt (gsi);
+	/* This assumes that calls to internal-only functions never
+	   use a return slot.  */
+	if (is_gimple_call (stmt)
+	    && !gimple_call_internal_p (stmt)
+	    && aggregate_value_p (TREE_TYPE (gimple_call_fntype (stmt)),
+  gimple_call_fndecl (stmt)))
+	  return true;
+  }
+  return false;
+}
+
 /* Expand all variables used in the function.  */
 
 static rtx
@@ -1669,22 +1715,8 @@
   pointer_map_destroy (ssa_name_decls);
 
   if (flag_stack_protect == SPCT_FLAG_STRONG)
-FOR_EACH_LOCAL_DECL (cfun, i, var)
-  if (!is_global_var (var))
-	{
-	  tree var_type = TREE_TYPE (var);
-	  /* Examine local referenced variables that have their addresses taken,
-	 contain an array, or are arrays.  */
-	  if (TREE_CODE (var) == VAR_DECL
-	      && (TREE_CODE (var_type) == ARRAY_TYPE
-		  || TREE_ADDRESSABLE (var)
-		  || (RECORD_OR_UNION_TYPE_P (var_type)
-		      && record_or_union_type_has_array_p (var_type))))
-	{
-	  gen_stack_protect_signal = true;
-	  break;
-	}
-	}
+  gen_stack_protect_signal
+	= stack_protect_decl_p () || stack_protect_return_slot_p ();
 
   /* At this point all variables on the local_decls with TREE_USED
  set are not associated with any block scope.  Lay them out.  */
Index: gcc/testsuite/g++.dg/fstack-protector-strong.C
===
--- gcc/testsuite/g++.dg/fstack-protector-strong.C	(revision 206311)
+++ 

[Patch,ARM] crypto intrinsics in AArch32 testsuite fix

2014-01-08 Thread Christophe Lyon
Hi,

Commit 206131 introduced check_effective_target_arm_crypto_ok in
lib/target-supports.exp, to check that the target supports
-mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp.

However, when GCC is configured for target arm-none-linux-gnueabihf, I
can see all the new tests fail:
sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:29: fatal
error: gnu/stubs-soft.h: No such file or directory

(stubs.h is included via arm_neon.h)

This is because check_effective_target_arm_crypto_ok sample test is
too simple. Making it include arm_neon.h does the trick (and makes the
tests UNSUPPORTED rather than FAIL).

OK?

Christophe.

2014-01-08  Christophe Lyon  christophe.l...@linaro.org

* lib/target-supports.exp (check_effective_target_arm_crypto_ok):
Include arm_neon.h in sample test.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index a8910bb..cc10936 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-01-08  Christophe Lyon  christophe.l...@linaro.org
+
+   * lib/target-supports.exp (check_effective_target_arm_crypto_ok):
+   Include arm_neon.h in sample test.
+
 2014-01-07  Paolo Carlini  paolo.carl...@oracle.com
 
* g++.dg/ext/is_base_of_incomplete-2.C: New.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 5166679..7b40ccd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2305,6 +2305,7 @@ proc check_effective_target_arm_unaligned { } {
 proc check_effective_target_arm_crypto_ok {} {
 if { [check_effective_target_arm32] } {
return [check_no_compiler_messages arm_crypto_ok object {
+ #include "arm_neon.h"
  int foo (void)
  {
 __asm__ volatile ("aese.8 q0, q0");


[PATCH] Don't segv in omp-low.c (PR middle-end/59669)

2014-01-08 Thread Marek Polacek
We can also get NULL for the default definition, so we need to handle that
before calling has_zero_uses on it.
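
The shape of the fix can be sketched outside GCC. `ssa_name_sketch`, `has_zero_uses_sketch` and `needs_adjustment` are hypothetical stand-ins for this illustration, not GCC's tree type or its `has_zero_uses`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA default definition.  */
struct ssa_name_sketch { unsigned num_uses; };

/* Like the real predicate, this dereferences its argument, so passing
   NULL would crash.  */
static bool
has_zero_uses_sketch (const struct ssa_name_sketch *name)
{
  return name->num_uses == 0;
}

/* The guarded form: && short-circuits, so the predicate is only
   evaluated when a default definition actually exists.  */
static bool
needs_adjustment (const struct ssa_name_sketch *def)
{
  return def && !has_zero_uses_sketch (def);
}
```

A missing default definition is treated the same way as one with no uses: there is nothing to adjust.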

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-01-08  Marek Polacek  pola...@redhat.com

PR middle-end/59669
* omp-low.c (simd_clone_adjust): Don't crash if def is NULL.
testsuite/
* gcc.dg/gomp/pr59669.c: New test.

--- gcc/omp-low.c.mp2014-01-08 13:48:40.353624984 +0100
+++ gcc/omp-low.c   2014-01-08 13:48:47.780656551 +0100
@@ -11587,7 +11587,7 @@ simd_clone_adjust (struct cgraph_node *n
tree def = ssa_default_def (cfun, orig_arg);
gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
|| POINTER_TYPE_P (TREE_TYPE (orig_arg)));
-   if (!has_zero_uses (def))
+   if (def && !has_zero_uses (def))
  {
iter1 = make_ssa_name (orig_arg, NULL);
iter2 = make_ssa_name (orig_arg, NULL);
--- gcc/testsuite/gcc.dg/gomp/pr59669.c.mp  2014-01-08 13:50:23.710492087 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669.c 2014-01-08 13:50:54.339622411 +0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd linear(a)
+void
+foo (int a)
+{
+}

Marek


Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 04:09:08PM +0100, Marek Polacek wrote:
 We can also get NULL for the default definition, so we need to handle that
 before calling has_zero_uses on it.
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?

Looks ok, but there is similar code a few lines above, can you please fix it up
and add it to the testcase?

I'd think
#pragma omp declare simd uniform(a) aligned(a:32)
void
bar (int *a)
{
}

could hit the other spot.

Jakub


Re: [Patch,ARM] crypto intrinsics in AArch32 testsuite fix

2014-01-08 Thread Kyrill Tkachov

On 08/01/14 15:00, Christophe Lyon wrote:

Hi,

Commit 206131 introduced check_effective_target_arm_crypto_ok in
lib/target-supports.exp, to check that the target supports
-mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp.

However, when GCC is configured for target arm-none-linux-gnueabihf, I
can see all the new tests fail:
sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:29: fatal
error: gnu/stubs-soft.h: No such file or directory

(stubs.h is included via arm_neon.h)

This is because check_effective_target_arm_crypto_ok sample test is
too simple. Making it include arm_neon.h does the trick (and makes the
tests UNSUPPORTED rather than FAIL).

OK?

Christophe.


Hi Christophe,

I believe the best solution here is to figure out the best combination of 
-mfloat-abi and -mfpu options, as we do for the NEON options (look for example at 
check_effective_target_arm_neon_ok_nocache in target-supports.exp).


That way these tests will not add -mfloat-abi=softfp to an 
arm-none-linux-gnueabihf target (which is the root of the problem) and they will 
PASS instead of being just UNSUPPORTED.


I have a patch for that in testing.

Thanks,
Kyrill




Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)

2014-01-08 Thread Marek Polacek
On Wed, Jan 08, 2014 at 04:14:06PM +0100, Jakub Jelinek wrote:
 On Wed, Jan 08, 2014 at 04:09:08PM +0100, Marek Polacek wrote:
  We can also get NULL for the default definition, so we need to handle that
  before calling has_zero_uses on it.
  
  Bootstrapped/regtested on x86_64-linux, ok for trunk?
 
 Looks ok, but there is similar code a few lines above, can you please fix it 
 up
 and add it to the testcase?
 
 I'd think
 #pragma omp declare simd uniform(a) aligned(a:32)
 void
 bar (int *a)
 {
 }
 
 could hit the other spot.

Indeed it does.  So like this?

2014-01-08  Marek Polacek  pola...@redhat.com

PR middle-end/59669
* omp-low.c (simd_clone_adjust): Don't crash if def is NULL.
testsuite/
* gcc.dg/gomp/pr59669-1.c: New test.
* gcc.dg/gomp/pr59669-2.c: New test.

--- gcc/omp-low.c.mp2014-01-08 13:48:40.353624984 +0100
+++ gcc/omp-low.c   2014-01-08 16:21:06.247268557 +0100
@@ -11537,7 +11537,7 @@ simd_clone_adjust (struct cgraph_node *n
unsigned int alignment = node->simdclone->args[i].alignment;
tree orig_arg = node->simdclone->args[i].orig_arg;
tree def = ssa_default_def (cfun, orig_arg);
-   if (!has_zero_uses (def))
+   if (def && !has_zero_uses (def))
  {
tree fn = builtin_decl_explicit (BUILT_IN_ASSUME_ALIGNED);
gimple_seq seq = NULL;
@@ -11587,7 +11587,7 @@ simd_clone_adjust (struct cgraph_node *n
tree def = ssa_default_def (cfun, orig_arg);
gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
|| POINTER_TYPE_P (TREE_TYPE (orig_arg)));
-   if (!has_zero_uses (def))
+   if (def && !has_zero_uses (def))
  {
iter1 = make_ssa_name (orig_arg, NULL);
iter2 = make_ssa_name (orig_arg, NULL);
--- gcc/testsuite/gcc.dg/gomp/pr59669-1.c.mp2014-01-08 13:50:23.710492087 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669-1.c   2014-01-08 13:50:54.339622411 
+0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd linear(a)
+void
+foo (int a)
+{
+}
--- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.mp2014-01-08 16:20:35.553121408 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c   2014-01-08 16:20:54.099210269 
+0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd uniform(a) aligned(a:32)
+void
+bar (int *a)
+{
+}

Marek


Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 04:25:47PM +0100, Marek Polacek wrote:
 Indeed it does.  So like this?
 
 2014-01-08  Marek Polacek  pola...@redhat.com
 
   PR middle-end/59669
   * omp-low.c (simd_clone_adjust): Don't crash if def is NULL.
 testsuite/
   * gcc.dg/gomp/pr59669-1.c: New test.
   * gcc.dg/gomp/pr59669-2.c: New test.

Yep, thanks.

Jakub


[Patch, Fortran] PR 58182: [4.9 Regression] ICE with global binding name used as a FUNCTION

2014-01-08 Thread Janus Weil
Hi all,

I just committed an 'obvious' patch for an ICE-on-invalid regression on trunk:

http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=206429

Cheers,
Janus


[PATCH] Add zero-overhead looping for xtensa backend

2014-01-08 Thread Felix Yang
Hi Sterling,

  This patch implements zero-overhead looping for the xtensa backend using
the hw-doloop facility.
  If OK for trunk, please apply it for me. Thanks.
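
For readers unfamiliar with doloop patterns, the control flow expressed by the doloop_end expander in this patch (branch back while the counter is not 1, decrementing it every time) can be modelled in C. This is only a model of the RTL semantics as read from the patch; `sum_first_n` is an invented example and the sketch says nothing about how the Xtensa loop hardware is actually programmed.

```c
/* Sum the first n elements of v using a doloop_end-shaped back edge.  */
static unsigned
sum_first_n (const unsigned *v, unsigned n)
{
  unsigned sum = 0;
  unsigned count = n;   /* the loop-count register */
  unsigned i = 0;

  if (count == 0)       /* guard: the body below runs at least once */
    return 0;

top:
  sum += v[i++];        /* loop body */

  /* doloop_end: compare the old count against 1, decrement it in
     either case, and jump back to the top if it was not 1.  */
  if (count-- != 1)
    goto top;

  return sum;
}
```

With a count of n the body executes exactly n times, which is what lets the optimizer replace the compare-and-branch with a hardware loop when the trip count is known on entry.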


Index: gcc/ChangeLog
===
--- gcc/ChangeLog(revision 206431)
+++ gcc/ChangeLog(working copy)
@@ -1,3 +1,18 @@
+2014-01-08  Felix Yang  fei.yang0...@gmail.com
+
+* config/xtensa/xtensa.c (xtensa_reorg): New.
+(xtensa_reorg_loops): New.
+(xtensa_can_use_doloop_p): New.
+(xtensa_invalid_within_doloop): New.
+(hwloop_optimize): New.
+(hwloop_fail): New.
+(hwloop_pattern_reg): New.
+(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label.
+(xtensa_doloop_hooks): Define.
+* config/xtensa/xtensa.md (doloop_end): New.
+(zero_cost_loop_start): Rewritten.
+(zero_cost_loop_end): Rewritten.
+
 2014-01-08  Marek Polacek  pola...@redhat.com

 PR middle-end/59669
Index: gcc/config/xtensa/xtensa.md
===
--- gcc/config/xtensa/xtensa.md(revision 206431)
+++ gcc/config/xtensa/xtensa.md(working copy)
@@ -35,6 +35,8 @@
   (UNSPEC_TLS_CALL      9)
   (UNSPEC_TP            10)
   (UNSPEC_MEMW          11)
+  (UNSPEC_LSETUP_START  12)
+  (UNSPEC_LSETUP_END    13)

   (UNSPECV_SET_FP       1)
   (UNSPECV_ENTRY        2)
@@ -1289,6 +1291,8 @@
    (set_attr "length" "3")])


+;; Hardware loop support.
+
 ;; Define the loop insns used by bct optimization to represent the
 ;; start and end of a zero-overhead loop (in loop.c).  This start
 ;; template generates the loop insn; the end template doesn't generate
@@ -1296,34 +1300,58 @@

 (define_insn "zero_cost_loop_start"
   [(set (pc)
-        (if_then_else (eq (match_operand:SI 0 "register_operand" "a")
-                          (const_int 0))
-                      (label_ref (match_operand 1 "" ""))
-                      (pc)))
-   (set (reg:SI 19)
-        (plus:SI (match_dup 0) (const_int -1)))]
+        (if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0")
+                          (const_int 1))
+                      (label_ref (match_operand 1 "" ""))
+                      (pc)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "=a")
+        (plus (match_dup 2)
+              (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_START)]
   ""
-  "loopnez\t%0, %l1"
+  "loop\t%0, %l1_LEND"
   [(set_attr "type" "jump")
    (set_attr "mode" "none")
    (set_attr "length" "3")])

 (define_insn "zero_cost_loop_end"
   [(set (pc)
-        (if_then_else (ne (reg:SI 19) (const_int 0))
-                      (label_ref (match_operand 0 "" ""))
-                      (pc)))
-   (set (reg:SI 19)
-        (plus:SI (reg:SI 19) (const_int -1)))]
+        (if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0")
+                          (const_int 1))
+                      (label_ref (match_operand 1 "" ""))
+                      (pc)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "=a")
+        (plus (match_dup 2)
+              (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)]
   ""
 {
-    xtensa_emit_loop_end (insn, operands);
-    return "";
+  xtensa_emit_loop_end (insn, operands);
+  return "";
 }
   [(set_attr "type" "jump")
    (set_attr "mode" "none")
    (set_attr "length" "0")])

+; operand 0 is the loop count pseudo register
+; operand 1 is the label to jump to at the top of the loop
+(define_expand "doloop_end"
+  [(parallel [(set (pc) (if_then_else
+                          (ne (match_operand:SI 0 "" "")
+                              (const_int 1))
+                          (label_ref (match_operand 1 "" ""))
+                          (pc)))
+              (set (match_dup 0)
+                   (plus:SI (match_dup 0)
+                            (const_int -1)))
+              (unspec [(const_int 0)] UNSPEC_LSETUP_END)])]
+  ""
+{
+  /* The loop optimizer doesn't check the predicates... */
+  if (GET_MODE (operands[0]) != SImode)
+    FAIL;
+})
+

 ;; Setting a register from a comparison.

Index: gcc/config/xtensa/xtensa.c
===
--- gcc/config/xtensa/xtensa.c(revision 206431)
+++ gcc/config/xtensa/xtensa.c(working copy)
@@ -1,6 +1,7 @@
 /* Subroutines for insn-output.c for Tensilica's Xtensa architecture.
Copyright (C) 2001-2014 Free Software Foundation, Inc.
Contributed by Bob Wilson (bwil...@tensilica.com) at Tensilica.
+   Zero-overhead looping support by Felix Yang (felix.yang0...@gmail.com).

 This file is part of GCC.

@@ -61,8 +62,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "gimplify.h"
 #include "df.h"
+#include "hw-doloop.h"
+#include "dumpfile.h"

-
 /* Enumeration for all of the relational tests, so that we can build
arrays indexed by the test type, and not worry about the order
of EQ, NE, etc.  */
@@ -186,6 +188,10 @@ static reg_class_t xtensa_secondary_reload (bool,

 static bool constantpool_address_p (const_rtx addr);
 static bool xtensa_legitimate_constant_p (enum 

Re: [Patch] libgcov.c re-factoring

2014-01-08 Thread Teresa Johnson
On Wed, Jan 8, 2014 at 6:34 AM, Jan Hubicka hubi...@ucw.cz wrote:

 Actually, I tried changing these two, but gcc_checking_assert is
 undefined in libgcov.a. Ok to commit without this change?

 OK.
 incrementally can you please define gcov_nonruntime_assert that will wind into
 gcc_assert for code within gcc/coverage tools and into nothing for libgcov
 runtime and we can change those offenders to that.

Ok, committed as r206435. Will send the assert patch in a follow-up
later this week.

Teresa





-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.

2014-01-08 Thread Christophe Lyon
Hi Renlin,

The new test you added introduces 2 new FAILs when the target is
arm-none-linux-gnueabi (as opposed to arm-none-linux-gnueabihf).

Christophe.


On 24 December 2013 15:46, Renlin Li renlin...@arm.com wrote:
 Hi,

 I just updated my patch according to your suggestion.
 Thank you for committing it for me!

 All you guys have a nice Xmas break!

 Kind regards,
 Renlin Li


 On 04/12/13 11:23, Ramana Radhakrishnan wrote:

 Sorry about the slow response. Been on holiday.

 On 20/11/13 16:27, Renlin Li wrote:

 Hi all,

 This patch will make the arm back-end use vcvt for float to fixed point
 conversions when applicable.
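
 As a worked illustration of the conversions in question, here is a small Q16.16 sketch. The function names mirror the ones in the test case from this patch; the specific values are just examples, and whether the compiler actually emits vcvt for them depends on the target options.

```c
/* Float to Q16.16 fixed point: scale by 2^16, then truncate.  */
static int
float_to_fixed (float f)
{
  return (int) (f * (1 << 16));
}

/* Q16.16 fixed point to float: convert, then divide by 2^16.  */
static float
fixed_to_float (int i)
{
  return (float) i / (1 << 16);
}
```

 For example, 1.5f maps to 98304 (1.5 * 65536) and 65536 maps back to 1.0f; both scale factors are powers of two, which is the case the new pattern matches.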

 Test on arm-none-linux-gnueabi has been done on the model.
 Okay for trunk?

 + (define_insn "*combine_vcvtf2i"
 +   [(set (match_operand:SI 0 "s_register_operand" "=r")
 +       (fix:SI (fix:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t")
 +                                (match_operand 2
 +                                "const_double_vcvt_power_of_two" "Dp")))))]
 +   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
 +   "vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1"
 +   [(set_attr "predicable" "yes")
 +    (set_attr "predicable_short_it" "no")
 +    (set_attr "ce_count" "2")
 +    (set_attr "type" "f_cvtf2i")]
 + )
 +
 +

 You need to set length to 8.

 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
 @@ -0,0 +1,15 @@
 +/* Check that vcvt is used for fixed and float data conversions.  */
 +/* { dg-do compile } */
 +/* { dg-options "-O1 -mfpu=vfp3" } */
 +/* { dg-require-effective-target arm_vfp_ok } */
 +float fixed_to_float(int i)
 +{
 +    return ((float)i / (1 << 16));
 +}
 +
 +int float_to_fixed(float f)
 +{
 +    return ((int)(f*(1 << 16)));
 +}
 +/* { dg-final { scan-assembler "vcvt.f32.s32" } } */
 +/* { dg-final { scan-assembler "vcvt.s32.f32" } } */


 GNU coding style for functions.

 Ok with those changes.




 regards
 Ramana


 Kind regards,
 Renlin Li


 gcc/ChangeLog:

 2013-11-20  Renlin Li  renlin...@arm.com

* config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
* config/arm/constraints.md (Dp): Define new constraint.
	* config/arm/predicates.md (const_double_vcvt_power_of_two): Define
	new predicate.
	* config/arm/arm.c (arm_print_operand): Add print for new function.
(vfp3_const_double_for_bits): New function.
* config/arm/vfp.md (combine_vcvtf2i): Define new instruction.

 gcc/testsuite/ChangeLog:

 2013-11-20  Renlin Li  renlin...@arm.com

* gcc.target/arm/fixed_float_conversion.c: New test case.




Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-01-08 Thread Sterling Augustine
On Wed, Jan 8, 2014 at 8:27 AM, Felix Yang fei.yang0...@gmail.com wrote:
 Hi Sterling,

   This patch implements zero-overhead looping for xtensa backend using
 hw-doloop facility.
   If OK for trunk, please apply it for me. Thanks.

Hi Felix,

I last worked on zero-overhead loops for Xtensa in the gcc 4.3
timeframe, but when I did, I ran into several problems related to
later optimizations rearranging the code which I didn't have time to
address.

I'm sure much of that experience is completely stale now, but I would
appreciate details of the testing you have done with this patch (in
particular, a description of the different xtensa configurations you
tested it against, especially the ones with and without loop
instructions) before I approve it. Please be sure the assembler can
relax the loops it generates as well. I don't see any particular
problem, but there are many, many gotchas when dealing with xtensa
loop instructions.

It also appears that Tensilica has stopped posting test results for
Xtensa, which makes it difficult to evaluate the quality of this
patch.

Thanks,

Sterling


Re: [PATCH] Fix ifcvt (PR rtl-optimization/58668)

2014-01-08 Thread Uros Bizjak
Hello!

 So like this instead?  Bootstrapped/regtested on x86_64-linux and
 i686-linux.  For 4.8 I'd still prefer the earlier patch though.

 2013-12-18  Jakub Jelinek  ja...@redhat.com

 PR rtl-optimization/58668
 * cfgcleanup.c (flow_find_cross_jump): Don't count
 any jumps if dir_p is NULL.  Remove p1 variable, use active_insn_p
 to determine what is counted.
 (flow_find_head_matching_sequence): Use active_insn_p to determine
 what is counted.
 (try_head_merge_bb): Adjust for the flow_find_head_matching_sequence
 counting change.
 * ifcvt.c (count_bb_insns): Use active_insn_p  !JUMP_P to
 determine what is counted.

 * gcc.dg/pr58668.c: New test.

 This is fine for the trunk. Release manager's call for what they'd prefer on 
 the 4.8 branch.

This caused PR59724 on alpha:

20021116-1.c: In function ‘foo’:
20021116-1.c:31:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 9
 }
 ^
20021116-1.c:31:1: error: insn outside basic block
(jump_insn 94 52 93 9 (return) 20021116-1.c:31 -1
 (nil)
 -> return)

Uros.


FW: [PATCH] Fix PR 59631

2014-01-08 Thread Iyer, Balaji V
A small but major typo.

The second sentence should read "...usage of _Cilk_spawn [and _Cilk_sync] 
*without* -fcilkplus..." instead of "...with -fcilkplus..."
 
I am sorry about this.

Sincerely,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Tuesday, January 7, 2014 10:15 AM
 To: gcc-patches@gcc.gnu.org
 Subject: [PATCH] Fix PR 59631
 
 Hello Everyone,
   The attached patch will fix the issue reported in PR 59631. The main
 issue was the usage of Cilk spawn [and _Cilk_sync] with -fcilkplus caused an
 ICE. This patch should fix that. The issue was only reported for C++ but the
 issue exists in C compiler also.  This patch fixes both C and C++. A test 
 case is
 also included.
 
 Is this Ok for trunk?
 
 Here are the ChangeLog entries:
 +++ gcc/c/ChangeLog
 +2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
 +
 +   PR c++/59631
 +   * c-parser.c (c_parser_postfix_expression): Replaced consecutive if
 +   statements with if-elseif statements.
 
 +++ gcc/testsuite/ChangeLog
 +2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
 +
 +   PR c++/59631
 +   * gcc.dg/cilk-plus/cilk-plus.exp: Removed -fcilkplus from flags 
 list.
 +   * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
 +   * c-c++-common/cilk-plus/CK/spawnee_inline.c: Replaced second dg-
 option
 +   with dg-additional-options.
 +   * c-c++-common/cilk-plus/CK/varargs_test.c: Likewise.
 +   * c-c++-common/cilk-plus/CK/steal_check.c: Likewise.
 +   * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise.
 +   * c-c++-common/cilk-plus/CK/spawning_arg.c: Likewise.
 +   * c-c++-common/cilk-plus/CK/invalid_spawns.c: Added a dg-options
 tag.
 +   * c-c++-common/cilk-plus/CK/pr59631.c: New testcase.
 
 +++ gcc/cp/ChangeLog
 +2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
 +
 +   PR c++/59631
 +   * parser.c (cp_parser_postfix_expression): Added a new if-statement
 +   and replaced an existing if-statement with else-if statement.
 +   Changed an existing error message wording to match the one from the
 C
 +   parser.
 
 Thanks,
 
 Balaji V. Iyer.
Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 206392)
+++ gcc/c/c-parser.c(working copy)
@@ -7500,7 +7500,7 @@
  expr = c_parser_postfix_expression (parser);
  expr.value = error_mark_node;   
}
- if (c_parser_peek_token (parser)->keyword == RID_CILK_SPAWN)
+ else if (c_parser_peek_token (parser)->keyword == RID_CILK_SPAWN)
{
  error_at (loc, "consecutive %<_Cilk_spawn%> keywords "
"are not permitted");
Index: gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp
===
--- gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp(revision 206392)
+++ gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp(working copy)
@@ -51,13 +51,13 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  
-fcilkplus -O3 -std=c99  
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  
-fcilkplus -g -O0 -std=c99  
 
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-g -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O1 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O2 -std=c99 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O2 -ftree-vectorize -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O3 -g -fcilkplus  
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-g   
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O1   
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O2 -std=c99   
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O2 -ftree-vectorize   
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O3 -g   
 if { [check_effective_target_lto] } {
-dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O3 -flto -g -fcilkplus  
+dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O3 -flto -g   
 }
 
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/SE/*.c]]  
-g  
Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
===
--- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(revision 206392)
+++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(working copy)
@@ -74,12 +74,12 @@
 dg-finish
 
 dg-init
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain 

Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.

2014-01-08 Thread Renlin Li

Hi Christophe,

There is a minor issue about this test case. It requires the `float-abi` 
of your target to be either `softfp` or `hard` (to utilize the floating 
point hardware).

Could you please check whether this solves the problem or not?

I should add it to the `dg-options` section of the test case and a patch 
is on the way.


Thank you for your notification!

Kind regards,
Renlin Li


On 08/01/14 16:43, Christophe Lyon wrote:

Hi Renlin,

The new test you added introduces 2 new FAILs when the target is
arm-none-linux-gnueabi (as opposed to arm-none-linux-gnueabihf).

Christophe.


On 24 December 2013 15:46, Renlin Li renlin...@arm.com wrote:

Hi,

I just updated my patch according to your suggestion.
Thank you for committing it for me!

All you guys have a nice Xmas break!

Kind regards,
Renlin Li


On 04/12/13 11:23, Ramana Radhakrishnan wrote:

Sorry about the slow response. Been on holiday.

On 20/11/13 16:27, Renlin Li wrote:

Hi all,

This patch will make the arm back-end use vcvt for float to fixed point
conversions when applicable.

Test on arm-none-linux-gnueabi has been done on the model.
Okay for trunk?

+ (define_insn *combine_vcvtf2i
+   [(set (match_operand:SI 0 s_register_operand =r)
+   (fix:SI (fix:SF (mult:SF (match_operand:SF 1 s_register_operand
t)
+(match_operand 2
+const_double_vcvt_power_of_two
Dp)]
+   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP3 
!flag_rounding_math
+   vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1
+   [(set_attr predicable yes)
+(set_attr predicable_short_it no)
+(set_attr ce_count 2)
+(set_attr type f_cvtf2i)]
+ )
+

You need to set length to 8.


--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
@@ -0,0 +1,15 @@
+/* Check that vcvt is used for fixed and float data conversions.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -mfpu=vfp3" } */
+/* { dg-require-effective-target arm_vfp_ok } */
+float fixed_to_float(int i)
+{
+return ((float)i / (1 << 16));
+}
+
+int float_to_fixed(float f)
+{
+return ((int)(f*(1 << 16)));
+}
+/* { dg-final { scan-assembler vcvt.f32.s32 } } */
+/* { dg-final { scan-assembler vcvt.s32.f32 } } */


GNU coding style for functions.

Ok with those changes.




regards
Ramana



Kind regards,
Renlin Li


gcc/ChangeLog:

2013-11-20  Renlin Li  renlin...@arm.com

* config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
* config/arm/constraints.md (Dp): Define new constraint.
* config/arm/predicates.md (const_double_vcvt_power_of_two): Define
new predicate.
* config/arm/arm.c (arm_print_operand): Add print for new function.
(vfp3_const_double_for_bits): New function.
* config/arm/vfp.md (combine_vcvtf2i): Define new instruction.

gcc/testsuite/ChangeLog:

2013-11-20  Renlin Li  renlin...@arm.com

* gcc.target/arm/fixed_float_conversion.c: New test case.






Re: [PATCH] _Cilk_for for C and C++

2014-01-08 Thread Jakub Jelinek
On Tue, Jan 07, 2014 at 10:11:59PM +0000, Iyer, Balaji V wrote:
   I used a similar existing one (safelen). Attached, please find 2
 fixed patches for C and C++ along with their changelogs.

But safelen is something completely different, while if I skim
the _Cilk_for docs, the grain is really a chunk size, where the runtime
library performs the scheduling of grain sized chunks, so using
OMP_CLAUSE_SCHEDULE clause with
OMP_CLAUSE_SCHEDULE_KIND (c) = OMP_CLAUSE_SCHEDULE_RUNTIME;
OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = grain_expr;
sounds like what should be used.  OMP_CLAUSE_SAFELEN says what is the
minimal vectorization factor the compiler can assume is safe for
a simd loop.
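
To make the distinction concrete, here is a minimal C sketch of what a grain
(chunk) size means for scheduling: the iteration space is cut into
grain-sized pieces and each piece is handed out as a unit.  The names
run_in_chunks, chunk_fn and sum_indices are hypothetical; this is neither
Cilk runtime code nor GCC code, and it runs the chunks sequentially where a
real runtime would distribute them to workers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: process iterations [begin, end).  */
typedef void (*chunk_fn) (size_t begin, size_t end, void *data);

/* Split [0, n) into grain-sized chunks and hand each one to FN.
   Returns the number of chunks issued.  The last chunk may be
   shorter than GRAIN.  */
static size_t
run_in_chunks (size_t n, size_t grain, chunk_fn fn, void *data)
{
  size_t chunks = 0;

  assert (grain > 0);
  for (size_t begin = 0; begin < n; begin += grain)
    {
      size_t end = (n - begin < grain) ? n : begin + grain;
      fn (begin, end, data);
      chunks++;
    }
  return chunks;
}

/* Example callback: add the iteration indices into the size_t that
   DATA points to.  */
static void
sum_indices (size_t begin, size_t end, void *data)
{
  size_t *sum = data;

  for (size_t i = begin; i < end; i++)
    *sum += i;
}
```

A safelen-style value, by contrast, would not change how iterations are
grouped for scheduling; it only bounds how many consecutive iterations may
be executed together as a vector.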

Jakub


Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.

2014-01-08 Thread Christophe Lyon
On 8 January 2014 18:15, Renlin Li renlin...@arm.com wrote:
 Hi Christophe,

 There is a minor issue about this test case. It requires the `float-abi` of
 your target to be either `softfp` or `hard` (to utilize the floating point
 hardware).
 Could you please check whether this solves the problem or not?

Indeed I had tried with 'hard' and it's OK. (That's why I said
arm-none-linux-gnueabi as opposed to arm-none-linux-gnueabihf, but I
wasn't clear enough).

Thanks for your upcoming patch :-)

Christophe.


[PATCH, AArch64 2/6] aarch64: Add mulditi3 and umulditi3 patterns

2014-01-08 Thread Richard Henderson
* config/aarch64/aarch64.md (<su_optab>mulditi3): New expander.
---
 gcc/config/aarch64/aarch64.md | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c4acdfc..0b3943d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2078,6 +2078,23 @@
   [(set_attr type sumull)]
 )
 
+(define_expand su_optabmulditi3
+  [(set (match_operand:TI 0 register_operand)
+   (mult:TI (ANY_EXTEND:TI (match_operand:DI 1 register_operand))
+(ANY_EXTEND:TI (match_operand:DI 2 register_operand]
+  
+{
+  rtx low = gen_reg_rtx (DImode);
+  emit_insn (gen_muldi3 (low, operands[1], operands[2]));
+
+  rtx high = gen_reg_rtx (DImode);
+  emit_insn (gen_sumuldi3_highpart (high, operands[1], operands[2]));
+
+  emit_move_insn (gen_lowpart (DImode, operands[0]), low);
+  emit_move_insn (gen_highpart (DImode, operands[0]), high);
+  DONE;
+})
+
 (define_insn sumuldi3_highpart
   [(set (match_operand:DI 0 register_operand =r)
(truncate:DI
-- 
1.8.4.2



[PATCH, AArch64 1/6] aarch64: Add addti3 and subti3 patterns

2014-01-08 Thread Richard Henderson
* config/aarch64/aarch64.md (addti3, subti3): New expanders.
(add<GPI>3_compare0): Remove leading * from name.
(add<GPI>3_carryin): Likewise.
(sub<GPI>3_compare0): Likewise.
(sub<GPI>3_carryin): Likewise.
---
 gcc/config/aarch64/aarch64.md | 45 +++
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4e838ee..c4acdfc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1102,7 +1102,26 @@
(set_attr simd *,*,*,yes)]
 )
 
-(define_insn *addmode3_compare0
+(define_expand addti3
+  [(set (match_operand:TI 0 register_operand )
+   (plus:TI (match_operand:TI 1 register_operand )
+(match_operand:TI 2 register_operand )))]
+  
+{
+  rtx low = gen_reg_rtx (DImode);
+  emit_insn (gen_adddi3_compare0 (low, gen_lowpart (DImode, operands[1]),
+ gen_lowpart (DImode, operands[2])));
+
+  rtx high = gen_reg_rtx (DImode);
+  emit_insn (gen_adddi3_carryin (high, gen_highpart (DImode, operands[1]),
+gen_highpart (DImode, operands[2])));
+
+  emit_move_insn (gen_lowpart (DImode, operands[0]), low);
+  emit_move_insn (gen_highpart (DImode, operands[0]), high);
+  DONE;
+})
+
+(define_insn addmode3_compare0
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
 (plus:GPI (match_operand:GPI 1 register_operand %r,r,r)
@@ -1386,7 +1405,7 @@
   [(set_attr type alu_ext)]
 )
 
-(define_insn *addmode3_carryin
+(define_insn addmode3_carryin
   [(set
 (match_operand:GPI 0 register_operand =r)
 (plus:GPI (geu:GPI (reg:CC CC_REGNUM) (const_int 0))
@@ -1554,8 +1573,26 @@
(set_attr simd *,yes)]
 )
 
+(define_expand subti3
+  [(set (match_operand:TI 0 register_operand )
+   (minus:TI (match_operand:TI 1 register_operand )
+ (match_operand:TI 2 register_operand )))]
+  
+{
+  rtx low = gen_reg_rtx (DImode);
+  emit_insn (gen_subdi3_compare0 (low, gen_lowpart (DImode, operands[1]),
+ gen_lowpart (DImode, operands[2])));
+
+  rtx high = gen_reg_rtx (DImode);
+  emit_insn (gen_subdi3_carryin (high, gen_highpart (DImode, operands[1]),
+gen_highpart (DImode, operands[2])));
+
+  emit_move_insn (gen_lowpart (DImode, operands[0]), low);
+  emit_move_insn (gen_highpart (DImode, operands[0]), high);
+  DONE;
+})
 
-(define_insn *submode3_compare0
+(define_insn submode3_compare0
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ (minus:GPI (match_operand:GPI 1 register_operand r)
  (match_operand:GPI 2 register_operand r))
@@ -1702,7 +1739,7 @@
   [(set_attr type alu_ext)]
 )
 
-(define_insn *submode3_carryin
+(define_insn submode3_carryin
   [(set
 (match_operand:GPI 0 register_operand =r)
 (minus:GPI (minus:GPI
-- 
1.8.4.2



[PATCH, AArch64 3/6] aarch64: Add multi3 pattern

2014-01-08 Thread Richard Henderson
* config/aarch64/aarch64.md (multi3): New expander.
(madd<GPI>): Remove leading * from name.
---
 gcc/config/aarch64/aarch64.md | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0b3943d..0f76cd1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1968,7 +1968,7 @@
   [(set_attr type mul)]
 )
 
-(define_insn *maddmode
+(define_insn maddmode
   [(set (match_operand:GPI 0 register_operand =r)
(plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r)
(match_operand:GPI 2 register_operand r))
@@ -2095,6 +2095,31 @@
   DONE;
 })
 
+;; The default expansion of multi3 using umuldi3_highpart will perform
+;; the additions in an order that fails to combine into two madd insns.
+(define_expand multi3
+  [(set (match_operand:TI 0 register_operand)
+   (mult:TI (match_operand:TI 1 register_operand)
+(match_operand:TI 2 register_operand)))]
+  
+{
+  rtx l0 = gen_reg_rtx (DImode);
+  rtx l1 = gen_lowpart (DImode, operands[1]);
+  rtx l2 = gen_lowpart (DImode, operands[2]);
+  rtx h0 = gen_reg_rtx (DImode);
+  rtx h1 = gen_highpart (DImode, operands[1]);
+  rtx h2 = gen_highpart (DImode, operands[2]);
+
+  emit_insn (gen_muldi3 (l0, l1, l2));
+  emit_insn (gen_umuldi3_highpart (h0, l1, l2));
+  emit_insn (gen_madddi (h0, h1, l2, h0));
+  emit_insn (gen_madddi (h0, l1, h2, h0));
+
+  emit_move_insn (gen_lowpart (DImode, operands[0]), l0);
+  emit_move_insn (gen_highpart (DImode, operands[0]), h0);
+  DONE;
+})
+
 (define_insn sumuldi3_highpart
   [(set (match_operand:DI 0 register_operand =r)
(truncate:DI
-- 
1.8.4.2
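
For readers following the multi3 expander in the patch above, the same
arithmetic can be sketched in C: the low 128 bits of a 128x128 multiply come
from one low multiply, one unsigned high-part multiply, and two
multiply-adds into the high word.  This is only an illustration; mul128_lo
and umulh64 are hypothetical names, and the sketch assumes a compiler that
provides unsigned __int128 (a GCC extension) for the high-part multiply.

```c
#include <stdint.h>

/* High 64 bits of a 64x64 unsigned multiply.
   Relies on the GCC extension unsigned __int128.  */
static uint64_t
umulh64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* Low 128 bits of (h1:l1) * (h2:l2), following the expander's order:
   mul, high-part mul, then two multiply-adds into the high word.
   The h1 * h2 term only affects bits above 128 and is dropped.  */
static void
mul128_lo (uint64_t h1, uint64_t l1, uint64_t h2, uint64_t l2,
           uint64_t *hi, uint64_t *lo)
{
  uint64_t l0 = l1 * l2;            /* like gen_muldi3 */
  uint64_t h0 = umulh64 (l1, l2);   /* like gen_umuldi3_highpart */

  h0 = h1 * l2 + h0;                /* like gen_madddi */
  h0 = l1 * h2 + h0;                /* like gen_madddi */

  *lo = l0;
  *hi = h0;
}
```

Keeping both cross products as "multiply, then add into h0" is what lets
each one map onto a single multiply-add.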



[PATCH, AArch64 0/7] TImode and longlong.h improvements

2014-01-08 Thread Richard Henderson
The recent longlong.h patch 

  http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00286.html

reminded me that the other common patterns really ought to be supported
somehow.  We had patterns defining ADDS, ADC, and UMULH, but we didn't
have the proper expanders in place to make use of them.
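
As an illustration of what an add-with-carry pair computes, here is a small
C sketch of 128-bit addition and subtraction built from 64-bit halves with
an explicit carry or borrow.  The names u128_parts, add128 and sub128 are
hypothetical and not part of the patches.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
struct u128_parts
{
  uint64_t lo;
  uint64_t hi;
};

/* Add the low halves, detect the carry out, then add the high halves
   plus that carry.  */
static struct u128_parts
add128 (struct u128_parts a, struct u128_parts b)
{
  struct u128_parts r;
  uint64_t carry;

  r.lo = a.lo + b.lo;
  carry = r.lo < a.lo;          /* unsigned wraparound means carry out */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* Subtract likewise: a borrow occurs when a.lo < b.lo.  */
static struct u128_parts
sub128 (struct u128_parts a, struct u128_parts b)
{
  struct u128_parts r;
  uint64_t borrow = a.lo < b.lo;

  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}
```

With TImode expanders in place, the compiler can produce this two-instruction
shape from ordinary double-word arithmetic, without the explicit carry
computation.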

The final longlong.h patch has nothing that's really aarch64 specific,
but I chickened out in making the generic patterns use builtin double
word arithmetic.  Perhaps some define set in the cpu-specific portion
of the file ought to select this from the final common portion, but
that sort of thing begs the question of large-scale cleanup.


r~


Richard Henderson (6):
  aarch64: Add addti3 and subti3 patterns
  aarch64: Add mulditi3 and umulditi3 patterns
  aarch64: Add multi3 pattern
  soft-fp: Commonize creation of TImode types
  soft-fp: Define UDWtype for longlong.h
  aarch64: Define add_ssaaaa, sub_ddmmss, umul_ppmm

 gcc/config/aarch64/aarch64.md| 89 ++--
 include/longlong.h   | 28 +---
 libgcc/config/aarch64/sfp-machine.h  |  4 --
 libgcc/config/i386/64/sfp-machine.h  |  5 --
 libgcc/config/ia64/sfp-machine.h |  5 --
 libgcc/config/tilegx/sfp-machine32.h |  5 --
 libgcc/config/tilegx/sfp-machine64.h |  5 --
 libgcc/soft-fp/soft-fp.h | 14 ++
 8 files changed, 120 insertions(+), 35 deletions(-)

-- 
1.8.4.2


[PATCH, AArch64 6/6] aarch64: Define add_ssaaaa, sub_ddmmss, umul_ppmm

2014-01-08 Thread Richard Henderson
We have good support for TImode arithmetic, so no need to do anything
with inline assembly.

include/
* longlong.h [__aarch64__] (add_ssaaaa, sub_ddmmss, umul_ppmm): New.
[__aarch64__] (COUNT_LEADING_ZEROS_0): Define in terms of W_TYPE_SIZE.
---
 include/longlong.h | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/longlong.h b/include/longlong.h
index b4c1f400..1b11fc7 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -123,19 +123,35 @@ extern const UQItype __clz_tab[256] attribute_hidden;
 #endif /* __GNUC__ < 2 */
 
 #if defined (__aarch64__)
+#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x += (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl); \
+    (sh) = __x >> W_TYPE_SIZE; \
+    (sl) = __x; \
+  } while (0)
+#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x -= (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl); \
+    (sh) = __x >> W_TYPE_SIZE; \
+    (sl) = __x; \
+  } while (0)
+#define umul_ppmm(ph, pl, m0, m1) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(m0) * (UWtype)(m1); \
+    (ph) = __x >> W_TYPE_SIZE; \
+    (pl) = __x; \
+  } while (0)
 
+#define COUNT_LEADING_ZEROS_0   W_TYPE_SIZE
 #if W_TYPE_SIZE == 32
 #define count_leading_zeros(COUNT, X)  ((COUNT) = __builtin_clz (X))
 #define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctz (X))
-#define COUNT_LEADING_ZEROS_0 32
-#endif /* W_TYPE_SIZE == 32 */
-
-#if W_TYPE_SIZE == 64
+#elif W_TYPE_SIZE == 64
 #define count_leading_zeros(COUNT, X)  ((COUNT) = __builtin_clzll (X))
 #define count_trailing_zeros(COUNT, X)   ((COUNT) = __builtin_ctzll (X))
-#define COUNT_LEADING_ZEROS_0 64
 #endif /* W_TYPE_SIZE == 64 */
-
 #endif /* __aarch64__ */
 
 #if defined (__alpha) && W_TYPE_SIZE == 64
-- 
1.8.4.2



[PATCH, AArch64 4/6] soft-fp: Commonize creation of TImode types

2014-01-08 Thread Richard Henderson
No need to do this over and over for different 64-bit hosts.

libgcc/
* soft-fp/soft-fp.h (TItype, UTItype, TI_BITS): New.
* config/aarch64/sfp-machine.h (TItype, UTItype, TI_BITS): Remove.
* config/i386/64/sfp-machine.h: Likewise.
* config/ia64/sfp-machine.h: Likewise.
* config/tilegx/sfp-machine32.h: Likewise.
* config/tilegx/sfp-machine64.h: Likewise.
---
 libgcc/config/aarch64/sfp-machine.h  | 4 
 libgcc/config/i386/64/sfp-machine.h  | 5 -
 libgcc/config/ia64/sfp-machine.h | 5 -
 libgcc/config/tilegx/sfp-machine32.h | 5 -
 libgcc/config/tilegx/sfp-machine64.h | 5 -
 libgcc/soft-fp/soft-fp.h | 8 
 6 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/libgcc/config/aarch64/sfp-machine.h 
b/libgcc/config/aarch64/sfp-machine.h
index 61b5f72..5e676be 100644
--- a/libgcc/config/aarch64/sfp-machine.h
+++ b/libgcc/config/aarch64/sfp-machine.h
@@ -28,10 +28,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define _FP_WS_TYPEsigned long long
 #define _FP_I_TYPE int
 
-typedef int TItype __attribute__ ((mode (TI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype))
-
 /* The type of the result of a floating point comparison.  This must
match __libgcc_cmp_return__ in GCC for the target.  */
 typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
diff --git a/libgcc/config/i386/64/sfp-machine.h 
b/libgcc/config/i386/64/sfp-machine.h
index 1ff94c2..8197536 100644
--- a/libgcc/config/i386/64/sfp-machine.h
+++ b/libgcc/config/i386/64/sfp-machine.h
@@ -3,11 +3,6 @@
 #define _FP_WS_TYPEsigned long long
 #define _FP_I_TYPE long long
 
-typedef int TItype __attribute__ ((mode (TI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-
-#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype))
-
 #define _FP_MUL_MEAT_Q(R,X,Y)  \
   _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
 
diff --git a/libgcc/config/ia64/sfp-machine.h b/libgcc/config/ia64/sfp-machine.h
index e06bc9a..f7dd928 100644
--- a/libgcc/config/ia64/sfp-machine.h
+++ b/libgcc/config/ia64/sfp-machine.h
@@ -3,11 +3,6 @@
 #define _FP_WS_TYPEsigned long
 #define _FP_I_TYPE long
 
-typedef int TItype __attribute__ ((mode (TI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-
-#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype))
-
 /* The type of the result of a floating point comparison.  This must
match `__libgcc_cmp_return__' in GCC for the target.  */
 typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
diff --git a/libgcc/config/tilegx/sfp-machine32.h 
b/libgcc/config/tilegx/sfp-machine32.h
index 31a2032..a921533 100644
--- a/libgcc/config/tilegx/sfp-machine32.h
+++ b/libgcc/config/tilegx/sfp-machine32.h
@@ -3,11 +3,6 @@
 #define _FP_WS_TYPEsigned long
 #define _FP_I_TYPE long
 
-typedef int TItype __attribute__ ((mode (TI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-
-#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype))
-
 /* The type of the result of a floating point comparison.  This must
match `__libgcc_cmp_return__' in GCC for the target.  */
 typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
diff --git a/libgcc/config/tilegx/sfp-machine64.h 
b/libgcc/config/tilegx/sfp-machine64.h
index 7cf352e..2586dd5 100644
--- a/libgcc/config/tilegx/sfp-machine64.h
+++ b/libgcc/config/tilegx/sfp-machine64.h
@@ -3,11 +3,6 @@
 #define _FP_WS_TYPEsigned long
 #define _FP_I_TYPE long
 
-typedef int TItype __attribute__ ((mode (TI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-
-#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype))
-
 /* The type of the result of a floating point comparison.  This must
match `__libgcc_cmp_return__' in GCC for the target.  */
 typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h
index 696fc86..b54b1ed 100644
--- a/libgcc/soft-fp/soft-fp.h
+++ b/libgcc/soft-fp/soft-fp.h
@@ -237,6 +237,11 @@ typedef int DItype __attribute__ ((mode (DI)));
 typedef unsigned int UQItype __attribute__ ((mode (QI)));
 typedef unsigned int USItype __attribute__ ((mode (SI)));
 typedef unsigned int UDItype __attribute__ ((mode (DI)));
+#if _FP_W_TYPE_SIZE == 64
+typedef int TItype __attribute__ ((mode (TI)));
+typedef unsigned int UTItype __attribute__ ((mode (TI)));
+#endif
+
 #if _FP_W_TYPE_SIZE == 32
 typedef unsigned int UHWtype __attribute__ ((mode (HI)));
 #elif _FP_W_TYPE_SIZE == 64
@@ -249,6 +254,9 @@ typedef USItype UHWtype;
 
 #define SI_BITS		(__CHAR_BIT__ * (int) sizeof (SItype))
 #define DI_BITS		(__CHAR_BIT__ * (int) sizeof (DItype))
+#if _FP_W_TYPE_SIZE == 64
+# 

[PATCH, AArch64 5/6] soft-fp: Define UDWtype for longlong.h

2014-01-08 Thread Richard Henderson
The documentation for longlong.h says this type must be defined.
We've gotten away with this because so far longlong.h hasn't
actually used the type.

libgcc/
* soft-fp/soft-fp.h: (UDWtype): New define.
---
 libgcc/soft-fp/soft-fp.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h
index b54b1ed..8f80ea6 100644
--- a/libgcc/soft-fp/soft-fp.h
+++ b/libgcc/soft-fp/soft-fp.h
@@ -248,6 +248,12 @@ typedef unsigned int UHWtype __attribute__ ((mode (HI)));
 typedef USItype UHWtype;
 #endif
 
+#if _FP_W_TYPE_SIZE == 32
+# define UDWtype   UDItype
+#elif _FP_W_TYPE_SIZE == 64
+# define UDWtype   UTItype
+#endif
+
 #ifndef CMPtype
 # define CMPtype   int
 #endif
-- 
1.8.4.2



[PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 01:45:40PM +0100, Jakub Jelinek wrote:
 I'd like to get rid of all the XCNEW calls in target-globals.c as a
 follow-up.

Here it is.  The rationale is both to avoid many separate heap allocations
and if TARGET_OPTION_NODE is no longer needed (all FUNCTION_DECLs
referencing it are e.g. optimized away, say static unused functions)
to avoid leaking memory.
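
The idea can be sketched in plain C: put the header and its payload structs
into one wrapper struct, allocate the wrapper once, and point the header's
members into the same block.  This sketch uses calloc rather than GCC's GC
allocator, and the names globals, flags_state, regs_state and alloc_globals
are hypothetical stand-ins for the real target_* structs.

```c
#include <stdlib.h>

/* Hypothetical payload structs standing in for the target_* structs.  */
struct flags_state { int flags[8]; };
struct regs_state { long regs[64]; };

/* Header holding a pointer to each payload.  */
struct globals
{
  struct flags_state *flag_state;
  struct regs_state *regs;
};

/* Allocate the header and its payloads as one zeroed block.  The header
   is the first member of the wrapper, so a pointer to the wrapper is
   also a valid pointer to the header, and a single free () releases
   everything.  Returns NULL on allocation failure.  */
static struct globals *
alloc_globals (void)
{
  struct globals_extra
  {
    struct globals g;
    struct flags_state flag_state;
    struct regs_state regs;
  } *p = calloc (1, sizeof *p);

  if (p == NULL)
    return NULL;
  p->g.flag_state = &p->flag_state;
  p->g.regs = &p->regs;
  return &p->g;
}
```

One allocation replaces many, and because every payload lives inside the
block that the header pointer refers to, dropping the last reference to the
header is enough for a collector to reclaim all of it.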

Bootstrapped/regtested on x86_64-linux and i686-linux (together
with the i386 SWITCHABLE_TARGET patch).

Though, looking at the sizes, i686-linux allocates 0x67928 bytes, for
which I think ggc-page.c allocates 0.5MB (acceptable).  On x86_64-linux
the allocation size is 0x83aa8, only ~ 15KB too large to fit into 0.5MB,
so I think we allocate 1MB.
So, if we wanted to tune for x86_64, we could leave say
target_regs (size 0x5008) out of the big chunk, and instead make
it GTY((atomic)) and allocate it separately.

Or perhaps do that for other very large structs?  In any case, that doesn't
look like something that probably would need to be retuned for every
release.

The current sizes of the structs are:
                            x86_64  i686
struct target_globals       0x80    0x40
struct target_flag_state    0x20    0x20
struct target_regs          0x5008  0x5008
struct target_hard_regs     0x35c8  0x33f8
struct target_reload        0xef70  0xef70
struct target_expmed        0x180b0 0xf4b0
struct target_optabs        0x4f0   0x4b9
struct target_cfgloop       0x1c    0x1c
struct target_ira           0x9628  0x9620
struct target_ira_int       0x3fca8 0x322e4
struct target_lra_int       0xa718  0x4e70
struct target_builtins      0x268   0x268
struct target_gcse          0x62    0x62
struct target_bb_reorder    0x4     0x4
struct target_lower_subreg  0x24c   0x18c

Perhaps use cut-off of 4KB with current sizes, anything below that
would be allocated in the single block, anything above it separately.
So 7 structs allocated together, 7 separately.

2014-01-08  Jakub Jelinek  ja...@redhat.com

* target-globals.c (save_target_globals): Allocate most of the
structs using GC in payload of target_globals struct instead
of allocating them on the heap.

--- gcc/target-globals.c.jj 2014-01-08 10:23:22.0 +0100
+++ gcc/target-globals.c2014-01-08 14:00:13.183231122 +0100
@@ -68,24 +68,43 @@ struct target_globals *
 save_target_globals (void)
 {
   struct target_globals *g;
-
-  g = ggc_alloc_target_globals ();
-  g->flag_state = XCNEW (struct target_flag_state);
-  g->regs = XCNEW (struct target_regs);
+  struct target_globals_extra {
+struct target_globals g;
+struct target_flag_state flag_state;
+struct target_regs regs;
+struct target_hard_regs hard_regs;
+struct target_reload reload;
+struct target_expmed expmed;
+struct target_optabs optabs;
+struct target_cfgloop cfgloop;
+struct target_ira ira;
+struct target_ira_int ira_int;
+struct target_lra_int lra_int;
+struct target_builtins builtins;
+struct target_gcse gcse;
+struct target_bb_reorder bb_reorder;
+struct target_lower_subreg lower_subreg;
+  } *p;
+  p = (struct target_globals_extra *)
+  ggc_internal_cleared_alloc_stat (sizeof (struct target_globals_extra)
+  PASS_MEM_STAT);
+  g = (struct target_globals *) p;
+  g->flag_state = &p->flag_state;
+  g->regs = &p->regs;
   g->rtl = ggc_alloc_cleared_target_rtl ();
-  g->hard_regs = XCNEW (struct target_hard_regs);
-  g->reload = XCNEW (struct target_reload);
-  g->expmed = XCNEW (struct target_expmed);
-  g->optabs = XCNEW (struct target_optabs);
+  g->hard_regs = &p->hard_regs;
+  g->reload = &p->reload;
+  g->expmed = &p->expmed;
+  g->optabs = &p->optabs;
   g->libfuncs = ggc_alloc_cleared_target_libfuncs ();
-  g->cfgloop = XCNEW (struct target_cfgloop);
-  g->ira = XCNEW (struct target_ira);
-  g->ira_int = XCNEW (struct target_ira_int);
-  g->lra_int = XCNEW (struct target_lra_int);
-  g->builtins = XCNEW (struct target_builtins);
-  g->gcse = XCNEW (struct target_gcse);
-  g->bb_reorder = XCNEW (struct target_bb_reorder);
-  g->lower_subreg = XCNEW (struct target_lower_subreg);
+  g->cfgloop = &p->cfgloop;
+  g->ira = &p->ira;
+  g->ira_int = &p->ira_int;
+  g->lra_int = &p->lra_int;
+  g->builtins = &p->builtins;
+  g->gcse = &p->gcse;
+  g->bb_reorder = &p->bb_reorder;
+  g->lower_subreg = &p->lower_subreg;
   restore_target_globals (g);
   init_reg_sets ();
   target_reinit ();


Jakub


[PATCH] Fix up ipa-prop caused -fcompare-debug failures (PR ipa/59722)

2014-01-08 Thread Jakub Jelinek
Hi!

The recent ipa_analyze_params_uses changes broke i686-linux bootstrap
with --enable-checking=release, the reduced testcase below shows it.
Obviously we need to ignore debug stmt uses during analysis.
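
A simplified model of the fix, as a sketch only: uses are classified as
calls, debug statements, or anything else, and the count must come out the
same whether or not debug statements are present.  The names use_kind,
UNDESCRIBED_USE and count_controlled_uses are hypothetical and do not match
the real ipa-prop.c data structures.

```c
#include <stddef.h>

/* Hypothetical classification of a statement that uses a parameter.  */
enum use_kind { USE_CALL, USE_DEBUG, USE_OTHER };

/* Hypothetical sentinel meaning "some use cannot be described".  */
#define UNDESCRIBED_USE (-1)

/* Count the call uses among the N entries of USES.  Any non-call,
   non-debug use makes the result UNDESCRIBED_USE.  Debug uses are
   skipped so that the answer does not depend on debug info.  */
static int
count_controlled_uses (const enum use_kind *uses, size_t n)
{
  int controlled = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (uses[i] == USE_CALL)
        controlled++;
      else if (uses[i] != USE_DEBUG)
        return UNDESCRIBED_USE;
    }
  return controlled;
}
```

Without the debug check, a function compiled with debug info would report
an undescribed use where the same function without debug info reports a
count, which is the kind of divergence -fcompare-debug detects.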

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2014-01-08  Jakub Jelinek  ja...@redhat.com

PR ipa/59722
* ipa-prop.c (ipa_analyze_params_uses): Ignore uses in debug stmts.

* gcc.dg/pr59722.c: New test.

--- gcc/ipa-prop.c.jj   2014-01-06 22:32:17.101586391 +0100
+++ gcc/ipa-prop.c  2014-01-08 16:07:29.203641224 +0100
@@ -2127,8 +2127,11 @@ ipa_analyze_params_uses (struct cgraph_n
  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, ddef)
if (!is_gimple_call (USE_STMT (use_p)))
  {
-   controlled_uses = IPA_UNDESCRIBED_USE;
-   break;
+   if (!is_gimple_debug (USE_STMT (use_p)))
+ {
+   controlled_uses = IPA_UNDESCRIBED_USE;
+   break;
+ }
  }
else
  controlled_uses++;
--- gcc/testsuite/gcc.dg/pr59722.c.jj   2014-01-08 16:06:34.325960016 +0100
+++ gcc/testsuite/gcc.dg/pr59722.c  2014-01-08 16:06:03.0 +0100
@@ -0,0 +1,36 @@
+/* PR ipa/59722 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+extern void abrt (const char *, int) __attribute__((noreturn));
+void baz (int *, int *);
+
+static inline int
+bar (void)
+{
+  return 1;
+}
+
+static inline void
+foo (int *x, int y (void))
+{
+  while (1)
+{
+  int a = 0;
+  if (*x)
+   {
+ baz (x, &a);
+ while (a && !y ())
+   ;
+ break;
+   }
+  abrt ("", 1);
+}
+}
+
+void
+test (int x)
+{
+  foo (&x, bar);
+  foo (&x, bar);
+}

Jakub


C++ PATCH for c++/59614 (compile hog with lots of templates)

2014-01-08 Thread Jason Merrill
I was forgetting that recursing into template arguments would in turn 
recurse into their template arguments, leading to quadratic behavior. 
So, look at template arguments only once and add any inherited tags to 
the instantiated type.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f97c952a82d54a4cf0fc4583560de78589fa5664
Author: Jason Merrill ja...@redhat.com
Date:   Tue Jan 7 17:19:20 2014 -0500

	PR c++/59614
	* class.c (abi_tag_data): Add tags field.
	(check_abi_tags): Initialize it.
	(find_abi_tags_r): Support collecting missing tags.
	(mark_type_abi_tags): Don't look at template args.
	(inherit_targ_abi_tags): New.
	(check_bases_and_members): Use it.
	* cp-tree.h (ABI_TAG_IMPLICIT): New.
	* mangle.c (write_abi_tags): Check it.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index c961b22..0c3ce47 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1340,14 +1340,20 @@ struct abi_tag_data
 {
   tree t;
   tree subob;
+  // error_mark_node to get diagnostics; otherwise collect missing tags here
+  tree tags;
 };
 
 static tree
-find_abi_tags_r (tree *tp, int */*walk_subtrees*/, void *data)
+find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 {
   if (!OVERLOAD_TYPE_P (*tp))
 return NULL_TREE;
 
+  /* walk_tree shouldn't be walking into any subtrees of a RECORD_TYPE
+ anyway, but let's make sure of it.  */
+  *walk_subtrees = false;
+
   if (tree attributes = lookup_attribute ("abi_tag", TYPE_ATTRIBUTES (*tp)))
 {
   struct abi_tag_data *p = static_cast<struct abi_tag_data*>(data);
@@ -1358,7 +1364,20 @@ find_abi_tags_r (tree *tp, int */*walk_subtrees*/, void *data)
 	  tree id = get_identifier (TREE_STRING_POINTER (tag));
 	  if (!IDENTIFIER_MARKED (id))
 	{
-	  if (TYPE_P (p->subob))
+	  if (p->tags != error_mark_node)
+		{
+		  /* We're collecting tags from template arguments.  */
+		  tree str = build_string (IDENTIFIER_LENGTH (id),
+	   IDENTIFIER_POINTER (id));
+		  p->tags = tree_cons (NULL_TREE, str, p->tags);
+		  ABI_TAG_IMPLICIT (p->tags) = true;
+
+		  /* Don't inherit this tag multiple times.  */
+		  IDENTIFIER_MARKED (id) = true;
+		}
+
+	  /* Otherwise we're diagnosing missing tags.  */
+	  else if (TYPE_P (p->subob))
 		{
 		  warning (OPT_Wabi_tag, "%qT does not have the %E abi tag "
 			   "that base %qT has", p->t, tag, p->subob);
@@ -1397,22 +1416,6 @@ mark_type_abi_tags (tree t, bool val)
 	  IDENTIFIER_MARKED (id) = val;
 	}
 }
-
-  /* Also mark ABI tags from template arguments.  */
-  if (CLASSTYPE_TEMPLATE_INFO (t))
-{
-  tree args = CLASSTYPE_TI_ARGS (t);
-  for (int i = 0; i < TMPL_ARGS_DEPTH (args); ++i)
-	{
-	  tree level = TMPL_ARGS_LEVEL (args, i+1);
-	  for (int j = 0; j < TREE_VEC_LENGTH (level); ++j)
-	{
-	  tree arg = TREE_VEC_ELT (level, j);
-	  if (CLASS_TYPE_P (arg))
-		mark_type_abi_tags (arg, val);
-	}
-	}
-}
 }
 
 /* Check that class T has all the abi tags that subobject SUBOB has, or
@@ -1424,13 +1427,50 @@ check_abi_tags (tree t, tree subob)
   mark_type_abi_tags (t, true);
 
   tree subtype = TYPE_P (subob) ? subob : TREE_TYPE (subob);
-  struct abi_tag_data data = { t, subob };
+  struct abi_tag_data data = { t, subob, error_mark_node };
 
   cp_walk_tree_without_duplicates (&subtype, find_abi_tags_r, &data);
 
   mark_type_abi_tags (t, false);
 }
 
+void
+inherit_targ_abi_tags (tree t)
+{
+  if (CLASSTYPE_TEMPLATE_INFO (t) == NULL_TREE)
+return;
+
+  mark_type_abi_tags (t, true);
+
+  tree args = CLASSTYPE_TI_ARGS (t);
+  struct abi_tag_data data = { t, NULL_TREE, NULL_TREE };
+  for (int i = 0; i < TMPL_ARGS_DEPTH (args); ++i)
+{
+  tree level = TMPL_ARGS_LEVEL (args, i+1);
+  for (int j = 0; j < TREE_VEC_LENGTH (level); ++j)
+	{
+	  tree arg = TREE_VEC_ELT (level, j);
+	  data.subob = arg;
+	  cp_walk_tree_without_duplicates (&arg, find_abi_tags_r, &data);
+	}
+}
+
+  // If we found some tags on our template arguments, add them to our
+  // abi_tag attribute.
+  if (data.tags)
+{
+  tree attr = lookup_attribute ("abi_tag", TYPE_ATTRIBUTES (t));
+  if (attr)
+	TREE_VALUE (attr) = chainon (data.tags, TREE_VALUE (attr));
+  else
+	TYPE_ATTRIBUTES (t)
+	  = tree_cons (get_identifier ("abi_tag"), data.tags,
+		   TYPE_ATTRIBUTES (t));
+}
+
+  mark_type_abi_tags (t, false);
+}
+
 /* Run through the base classes of T, updating CANT_HAVE_CONST_CTOR_P,
and NO_CONST_ASN_REF_P.  Also set flag bits in T based on
properties of the bases.  */
@@ -5431,6 +5471,9 @@ check_bases_and_members (tree t)
   bool saved_nontrivial_dtor;
   tree fn;
 
+  /* Pick up any abi_tags from our template arguments before checking.  */
+  inherit_targ_abi_tags (t);
+
   /* By default, we use const reference arguments and generate default
  constructors.  */
   cant_have_const_ctor = 0;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index bdae500..96af562f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -65,6 +65,7 @@ 

[GOOGLE] Remove mod_id_to_name map

2014-01-08 Thread Dehao Chen
This patch removes the mod_id_to_name map because the info is already
there in module_infos.  Also, AutoFDO doesn't have access to update
this map because it's a file-static structure.
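
As a rough sketch of the replacement lookup, assuming a table of per-module
records that each carry an id and a source path: the name is derived on
demand from the table instead of being cached in a second structure.  The
struct layout and the names module_info, base_name and find_module_name
are hypothetical simplifications, and base_name is a simplified stand-in
for libiberty's lbasename that only handles '/' separators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a module info record.  */
struct module_info
{
  unsigned int ident;
  const char *source_filename;
};

/* Return the part of PATH after the last '/', or PATH itself.  */
static const char *
base_name (const char *path)
{
  const char *slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Look up the name for MOD_ID directly in the INFOS table of N
   entries.  Returns NULL when no module matches.  */
static const char *
find_module_name (struct module_info *const *infos, size_t n,
                  unsigned int mod_id)
{
  for (size_t i = 0; i < n; i++)
    if (infos[i]->ident == mod_id)
      return base_name (infos[i]->source_filename);
  return NULL;
}
```

The lookup stays linear, as it was with the separate map, but there is no
longer a second copy of the data to keep in sync.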

Bootstrapped and passed regression test.

OK for google branch?

Thanks,
Dehao

Index: gcc/coverage.c
===
--- gcc/coverage.c (revision 206366)
+++ gcc/coverage.c (working copy)
@@ -615,37 +615,17 @@ reorder_module_groups (const char *imports_file, u
   module_name_tab.dispose ();
 }

-typedef struct {
-  unsigned int mod_id;
-  const char *mod_name;
-} mod_id_to_name_t;
-
-static vec<mod_id_to_name_t> *mod_names;
-
-static void
-record_module_name (unsigned int mod_id, const char *name)
-{
-  mod_id_to_name_t t;
-
-  t.mod_id = mod_id;
-  t.mod_name = xstrdup (name);
-  if (!mod_names)
-vec_alloc (mod_names, 10);
-  mod_names->safe_push (t);
-}
-
 /* Return the module name for module with MOD_ID.  */

 const char *
 get_module_name (unsigned int mod_id)
 {
   size_t i;
-  mod_id_to_name_t *elt;

-  for (i = 0; mod_names->iterate (i, &elt); i++)
+  for (i = 0; i < num_in_fnames; i++)
 {
-  if (elt->mod_id == mod_id)
-return elt->mod_name;
+  if (module_infos[i]->ident == mod_id)
+return lbasename (module_infos[i]->source_filename);
 }

   gcc_assert (0);
@@ -927,9 +907,6 @@ read_counts_file (const char *da_file_name, unsign
  }
 }

-  record_module_name (mod_info->ident,
-  lbasename (mod_info->source_filename));
-
   if (dump_enabled_p ())
 {
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,


Re: PATCH: PR target/59587: cpu_names in i386.c is accessed with wrong index

2014-01-08 Thread H.J. Lu
On Wed, Dec 25, 2013 at 2:32 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Dec 25, 2013 at 10:31 PM, H.J. Lu hjl.to...@gmail.com wrote:

  cpu_names in i386.c is only used by ix86_function_specific_print 
  which
  accesses it with enum processor_type index. But cpu_names is 
  defined as
  array with enum target_cpu_default index.  This patch adds 
  processor
  names to processor_target_table and uses processor_target_table 
  instead
  of cpu_names.  It removes cpu_names and target_cpu_default.  
  Tested on
  Linux/x86-64.  OK to install?
 
  Wait a moment,
 
  it looks to me that TARGET_CPU_DEFAULT has to be synchronized with
  const processor_alias_table, so we are able to define various ISA
  extensions by selecting TARGET_CPU_*. The TARGET_CPU_DEFAULT can 
  then
 
  TARGET_CPU_DEFAULT sets the default -mtune=, not -march=.
 
  be used to select extensions in the same way as PROCESSOR_* selects
  tuning for certain processor.
 
  It has been like this for a long time.  For x86, TARGET_CPU_DEFAULT
  isn't defined no matter which configure options are used.  We can
  change config.gcc to set TARGET_CPU_DEFAULT to proper PROCESSOR_XXX 
  or
  set it to a string xxx for processor xxx.
  But GCC driver always passes -march=/-mtune= to toplev.c so that
  TARGET_CPU_DEFAULT is normally used.
 
  I meant to say TARGET_CPU_DEFAULT isn't normally used.
 
 
  Let me rethink this a bit, please do not commit the patch.
 
 
  TARGET_CPU_DEFAULT is left over for 32-bit target before --with-arch=
  and --with-cpu= were added.  Today, -mtune=xxx -march=xxx are
  always passed to cc1 by GCC driver.  If cc1 is run by hand and
  -mtune=xxx -march=xxx aren't passed to cc1, we should do
 
  1. For 64-bit, it should be the same as -mtune=generic -march=x86_64
  are passed.
  2. For 32-bit, it should be the same as -mtune=cpu -march=cpu are
  passed, where cpu is the target cpu used to configure GCC,
  like i386 in i386-linux, i486 in i486-linux,  But there is no i786
  cpu.  i786 is treated as i686.  If SUBTARGET32_DEFAULT_CPU
  is defined, it should be the same -mtune=SUBTARGET32_DEFAULT_CPU
  -march=SUBTARGET32_DEFAULT_CPU.
 
  Here is the patch to implement this.
 
  Let's do one step at a time. So, let's split the patch back to 
  target/59587 fix:

 2013-12-25   H.J. Lu  hongjiu...@intel.com

 PR target/59587
 * config/i386/i386.c (struct ptt): Add a field for processor
 name.
 (processor_target_table): Sync with processor_type.  Add
 processor names.
 (cpu_names): Removed.
 (ix86_option_override_internal): Default x_ix86_tune_string
 to processor_target_table[TARGET_CPU_DEFAULT].name.
 (ix86_function_specific_print): Assert arch and tune <
 PROCESSOR_max.  Use processor_target_table to print arch and
 tune names.
 * config/i386/i386.h (TARGET_CPU_DEFAULT): Default to
 PROCESSOR_GENERIC.
 (target_cpu_default): Removed.
 (processor_type): Reordered.

 OK for mainline and for 4.8 after a few days in mainline.

 Thanks,
 Uros.

I am testing this patch.  I will check it into 4.8 branch after
finishing regression test.
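
As an aside for readers of the archive, the indexing problem discussed above can be illustrated with a small C sketch. It is not the i386.c code: the enum `proc_type`, the table `proc_table` and the helper `proc_name` are hypothetical names, and the "generic" alignment value is a placeholder. The point is only that a table which carries the name in each row, and is indexed by the same enum used to look it up, cannot get out of step the way a second array indexed by a different enum can.

```c
#include <assert.h>

/* Hypothetical stand-in for enum processor_type; not the real list.  */
enum proc_type
{
  PROC_GENERIC,
  PROC_I386,
  PROC_I486,
  PROC_max
};

/* One row per processor, carrying its own name so that no second
   array with a different index space is needed.  */
struct proc_entry
{
  const char *name;
  int align_loop;
};

/* Designated initializers tie each row to its enumerator, so
   reordering the enum cannot silently shift the names.  */
static const struct proc_entry proc_table[] =
{
  [PROC_GENERIC] = { "generic", 16 },	/* placeholder alignment */
  [PROC_I386]    = { "i386", 4 },
  [PROC_I486]    = { "i486", 16 },
};

/* Compile-time check that the table reaches the last enumerator.  */
_Static_assert (sizeof proc_table / sizeof proc_table[0] == PROC_max,
		"proc_table must cover every proc_type");

/* Return the printable name for P, checking that the index is in
   range before it is used.  */
static const char *
proc_name (enum proc_type p)
{
  assert ((unsigned int) p < PROC_max);
  return proc_table[p].name;
}
```

Designated array initializers are a C feature; GCC itself is built as C++, which is presumably why the real table relies on a "must be in sync" comment instead.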

Thanks.


-- 
H.J.
---
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6493bb2..f17bf56 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,24 @@
+2014-01-08   H.J. Lu  hongjiu...@intel.com
+
+ Backport from mainline
+ 2013-12-25   H.J. Lu  hongjiu...@intel.com
+
+ PR target/59587
+ * config/i386/i386.c (struct ptt): Add a field for processor
+ name.
+ (processor_target_table): Sync with processor_type.  Add
+ processor names.
+ (cpu_names): Removed.
+ (ix86_option_override_internal): Default x_ix86_tune_string
+ to processor_target_table[TARGET_CPU_DEFAULT].name.
+ (ix86_function_specific_print): Assert arch and tune <
+ PROCESSOR_max.  Use processor_target_table to print arch and
+ tune names.
+ * config/i386/i386.h (TARGET_CPU_DEFAULT): Default to
+ PROCESSOR_GENERIC32.
+ (target_cpu_default): Removed.
+ (processor_type): Reordered.
+
 2014-01-08  Uros Bizjak  ubiz...@gmail.com

  Backport from mainline
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e03aa72..c06c220 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2409,6 +2409,7 @@ static tree ix86_veclibabi_acml (enum
built_in_function, tree, tree);
 /* Processor target table, indexed by processor number */
 struct ptt
 {
+  const char *const name; /* processor name  */
   const struct processor_costs *cost; /* Processor costs */
   const int align_loop; /* Default alignments.  */
   const int align_loop_max_skip;
@@ -2417,66 +2418,31 @@ struct ptt
   const int align_func;
 };

+/* This table must be in sync with enum processor_type in i386.h.  */
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {&i386_cost, 4, 3, 4, 3, 4},
-  {&i486_cost, 16, 15, 16, 15, 16},
-  {&pentium_cost, 16, 7, 16, 7, 16},
-  {&pentiumpro_cost, 16, 15, 16, 10, 16},
-  {&geode_cost, 0, 0, 

PR 59137: Incorrect liveness info during dbr_schedule

2014-01-08 Thread Richard Sandiford
PR 59137 is another case where dbr_schedule gets confused about liveness.
We start out with:

A: $2 = x
B: if $4 == $2 goto L1  [REG_DEAD: $2]
C: if $4  0 goto L2
   ...
L1:
D: $2 = y
E: goto L3
L2:
F: $2 = x
G: goto L3
   ...
L3:
   ...
   return $2

We fill G's delay slot in the obvious way:

L2:
G: goto L3
F:   $2 = x

Then we try to steal G's delay slot for C.  F is obviously redundant
with A in this context, so we drop it and end with a simple threaded
branch to L3:

A: $2 = x
B: if $4 == $2 goto L1  [REG_DEAD: $2]
C: if $4  0 goto L3

The problem is that the REG_DEAD note is no longer accurate, so when
we go on to fill B's delay slot we mistakenly think that we can use D:

A: $2 = x
B: if $4 == $2 goto L3
D:   $2 = y
C: if $4  0 goto L3

and so the return value for $4  0 changes from x to y.

reorg's mechanism for handling deleted redundant instructions seems
to be update_block, which adds a USE containing the redundant instruction
just before the place that it was supposed to occur.  The patch therefore
uses update_block in steal_delay_list_from_target.

I went through the other calls to redundant_insn and a few of them
also seem to be missing an update_block.  I don't have testcases
for these though, so it's going to be a matter of opinion whether
adding them or leaving them out is the defensive thing to do.  I'm happy
either way.

(redundant_insn is pretty conservative, so the branch whose delay slot
we're trying to fill can never be the one that makes a delay slot redundant.
It must always be an instruction from before the branch.  So inserting the
(use ...) immediately before the branch should be correct.)

Tested on mips64-linux-gnu.  OK for trunk?  OK for 4.8?

Thanks,
Richard


gcc/
PR rtl-optimization/59137
* reorg.c (steal_delay_list_from_target): Call update_block for
elided insns.
(steal_delay_list_from_fallthrough, relax_delay_slots): Likewise.

gcc/testsuite/
PR rtl-optimization/59137
* gcc.target/mips/pr59137.c: New test.

Index: gcc/reorg.c
===
--- gcc/reorg.c 2014-01-08 18:04:23.420954812 +
+++ gcc/reorg.c 2014-01-08 19:17:12.005446964 +
@@ -1093,6 +1093,7 @@ steal_delay_list_from_target (rtx insn,
   int used_annul = 0;
   int i;
   struct resources cc_set;
+  bool *redundant;
 
   /* We can't do anything if there are more delay slots in SEQ than we
  can handle, or if we don't know that it will be a taken branch.
@@ -1133,6 +1134,7 @@ steal_delay_list_from_target (rtx insn,
 return delay_list;
 #endif
 
+  redundant = XALLOCAVEC (bool, XVECLEN (seq, 0));
   for (i = 1; i < XVECLEN (seq, 0); i++)
 {
   rtx trial = XVECEXP (seq, 0, i);
@@ -1154,7 +1156,8 @@ steal_delay_list_from_target (rtx insn,
 
   /* If this insn was already done (usually in a previous delay slot),
 pretend we put it in our delay slot.  */
-  if (redundant_insn (trial, insn, new_delay_list))
+  redundant[i] = redundant_insn (trial, insn, new_delay_list);
+  if (redundant[i])
continue;
 
   /* We will end up re-vectoring this branch, so compute flags
@@ -1187,6 +1190,12 @@ steal_delay_list_from_target (rtx insn,
return delay_list;
 }
 
+  /* Record the effect of the instructions that were redundant and which
+ we therefore decided not to copy.  */
+  for (i = 1; i < XVECLEN (seq, 0); i++)
+if (redundant[i])
+  update_block (XVECEXP (seq, 0, i), insn);
+
   /* Show the place to which we will be branching.  */
   *pnew_thread = first_active_target_insn (JUMP_LABEL (XVECEXP (seq, 0, 0)));
 
@@ -1250,6 +1259,7 @@ steal_delay_list_from_fallthrough (rtx i
   /* If this insn was already done, we don't need it.  */
   if (redundant_insn (trial, insn, delay_list))
{
+ update_block (trial, insn);
  delete_from_delay_slot (trial);
  continue;
}
@@ -3236,6 +3246,7 @@ relax_delay_slots (rtx first)
 to reprocess this insn.  */
   if (redundant_insn (XVECEXP (pat, 0, 1), delay_insn, 0))
{
+ update_block (XVECEXP (pat, 0, 1), insn);
  delete_from_delay_slot (XVECEXP (pat, 0, 1));
  next = prev_active_insn (next);
  continue;
@@ -3355,6 +3366,7 @@ relax_delay_slots (rtx first)
   redirect_with_delay_slots_safe_p (delay_insn, target_label,
   insn))
{
+ update_block (XVECEXP (PATTERN (trial), 0, 1), insn);
  reorg_redirect_jump (delay_insn, target_label);
  next = insn;
  continue;
Index: gcc/testsuite/gcc.target/mips/pr59137.c
===
--- /dev/null   2013-12-26 20:29:50.272541227 +
+++ gcc/testsuite/gcc.target/mips/pr59137.c 2014-01-08 19:17:12.006448250 

[MIPS, committed] Revert some Octeon BADDU patches

2014-01-08 Thread Richard Sandiford
This patch just reverts some changes I'd made to the BADDU patterns
for the infamous (truncate:QI (plus:SI ...)) -> (plus:QI ...) simplification.
That simplification was limited to CISCy targets for PR 58295.

Tested on mips64-linux-gnu and applied.  It fixes the octeon-baddu-1.c
failures.

Thanks,
Richard


gcc/
Revert:
2012-10-07  Richard Sandiford  rdsandif...@googlemail.com

* config/mips/mips.c (mips_truncated_op_cost): New function.
(mips_rtx_costs): Adjust test for BADDU.
* config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.

2012-10-02  Richard Sandiford  rdsandif...@googlemail.com

* config/mips/mips.md (*baddu_si_eb, *baddu_si_el): Merge into...
(*baddu_si): ...this new pattern.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  2014-01-02 22:16:09.486330453 +
+++ gcc/config/mips/mips.c  2014-01-08 10:42:17.727013965 +
@@ -3634,17 +3634,6 @@ mips_set_reg_reg_cost (enum machine_mode
 }
 }
 
-/* Return the cost of an operand X that can be trucated for free.
-   SPEED says whether we're optimizing for size or speed.  */
-
-static int
-mips_truncated_op_cost (rtx x, bool speed)
-{
-  if (GET_CODE (x) == TRUNCATE)
-x = XEXP (x, 0);
-  return set_src_cost (x, speed);
-}
-
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -4037,13 +4026,12 @@ mips_rtx_costs (rtx x, int code, int out
 case ZERO_EXTEND:
   if (outer_code == SET
   && ISA_HAS_BADDU
+  && (GET_CODE (XEXP (x, 0)) == TRUNCATE
+      || GET_CODE (XEXP (x, 0)) == SUBREG)
   && GET_MODE (XEXP (x, 0)) == QImode
-  && GET_CODE (XEXP (x, 0)) == PLUS)
+  && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS)
{
- rtx plus = XEXP (x, 0);
- *total = (COSTS_N_INSNS (1)
-   + mips_truncated_op_cost (XEXP (plus, 0), speed)
-   + mips_truncated_op_cost (XEXP (plus, 1), speed));
+ *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed);
  return true;
}
   *total = mips_zero_extend_cost (mode, XEXP (x, 0));
Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md 2014-01-08 10:29:42.171963087 +
+++ gcc/config/mips/mips.md 2014-01-08 10:38:05.799078793 +
@@ -1312,20 +1312,32 @@ (define_insn_and_split *addsi3_extended
 
 ;; Combiner patterns for unsigned byte-add.
 
-(define_insn "*baddu_si"
+(define_insn "*baddu_si_eb"
   [(set (match_operand:SI 0 "register_operand" "=d")
 (zero_extend:SI
-(plus:QI (match_operand:QI 1 "register_operand" "d")
- (match_operand:QI 2 "register_operand" "d"))))]
-  "ISA_HAS_BADDU"
+(subreg:QI
+ (plus:SI (match_operand:SI 1 "register_operand" "d")
+  (match_operand:SI 2 "register_operand" "d")) 3)))]
+  "ISA_HAS_BADDU && BYTES_BIG_ENDIAN"
+  "baddu\\t%0,%1,%2"
+  [(set_attr "alu_type" "add")])
+
+(define_insn "*baddu_si_el"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+(zero_extend:SI
+(subreg:QI
+ (plus:SI (match_operand:SI 1 "register_operand" "d")
+  (match_operand:SI 2 "register_operand" "d")) 0)))]
+  "ISA_HAS_BADDU && !BYTES_BIG_ENDIAN"
   "baddu\\t%0,%1,%2"
   [(set_attr "alu_type" "add")])
 
 (define_insn "*baddu_di<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d")
 (zero_extend:GPR
-(plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d"))
- (truncate:QI (match_operand:DI 2 "register_operand" "d")))))]
+(truncate:QI
+ (plus:DI (match_operand:DI 1 "register_operand" "d")
+  (match_operand:DI 2 "register_operand" "d")))))]
   "ISA_HAS_BADDU && TARGET_64BIT"
   "baddu\\t%0,%1,%2"
   [(set_attr "alu_type" "add")])


Re: Drop -m32 from pr59099.c

2014-01-08 Thread Uros Bizjak
Hello!

 gcc.target/i386/pr59099.c fails on x86_64-redhat-linux-gnu with
 --disable-multilib because linking -m32 code is not supported.  The
 test case passes in 64-bit mode as well.  The other -m32 tests do
 not use dg-do run, so they do not exhibit this problem.

 Okay for trunk?

 No, this IMHO really should be:
 /* { dg-do run { target { ia32 && fpic } } } */
 /* { dg-options "-O2 -fPIC" } */

 All tests in gcc.target/i386 having -m32 (or -m64) in dg-options
 are buggy and should be fixed, either by adding { target ia32 }
 to their dg-do compile or whatever other dg-do they have, or
 adding
 /* { dg-require-effective-target ia32 } */
 line and dropping the -m32 from dg-options.

I have committed the following testsuite patch that removes -m32 from
dg-options. The patch also adds a check for the fpic effective target
where -fpic is used.

2014-01-08  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/asm-1.c: Remove dg-options.
* gcc.target/i386/incoming-5.c (dg-options): Remove -m32.
* gcc.target/i386/pr55433.c (dg-options): Ditto.
* gcc.target/i386/pr57848.c (dg-options): Ditto.
* gcc.target/i386/pr59099.c (dg-options): Ditto.
Require fpic effective target.
* gcc.target/i386/pr56246.c (dg-do): Compile for fpic target only.

Tested on x86_64-pc-linux-gnu {,-m32}, will be committed to mainline
in a moment.

Uros.
Index: gcc.target/i386/asm-1.c
===
--- gcc.target/i386/asm-1.c (revision 206436)
+++ gcc.target/i386/asm-1.c (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-m32" } */
 
 register unsigned int EAX asm ("r14"); /* { dg-error "register name" } */
 
Index: gcc.target/i386/incoming-5.c
===
--- gcc.target/i386/incoming-5.c(revision 206436)
+++ gcc.target/i386/incoming-5.c(working copy)
@@ -1,6 +1,6 @@
 /* PR middle-end/37009 */
 /* { dg-do compile { target { { ! *-*-darwin* } && ia32 } } } */
-/* { dg-options "-m32 -mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */
+/* { dg-options "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */
 
 extern void bar (double *);
 
Index: gcc.target/i386/pr55433.c
===
--- gcc.target/i386/pr55433.c   (revision 206436)
+++ gcc.target/i386/pr55433.c   (working copy)
@@ -1,5 +1,5 @@
-/* { dg-do compile {target { *-*-darwin* } } } */
-/* { dg-options "-O1 -m32" } */
+/* { dg-do compile { target { *-*-darwin* } } } */
+/* { dg-options "-O1" } */
 
 typedef unsigned long long tick_t;
 extern int foo(void);
Index: gcc.target/i386/pr56246.c
===
--- gcc.target/i386/pr56246.c   (revision 206436)
+++ gcc.target/i386/pr56246.c   (working copy)
@@ -1,5 +1,5 @@
 /* PR target/56225 */
-/* { dg-do compile { target { ia32 } } } */
+/* { dg-do compile { target { ia32 && fpic } } } */
 /* { dg-options "-O2 -fno-omit-frame-pointer -march=i686 -fpic" } */
 
 void NoBarrier_AtomicExchange (long long *ptr) {
Index: gcc.target/i386/pr57848.c
===
--- gcc.target/i386/pr57848.c   (revision 206436)
+++ gcc.target/i386/pr57848.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -m32" } */
+/* { dg-options "-O1" } */
 
 extern unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int);
 #pragma GCC target("sse4.2")
Index: gcc.target/i386/pr59099.c
===
--- gcc.target/i386/pr59099.c   (revision 206436)
+++ gcc.target/i386/pr59099.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fPIC -m32" } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-O2 -fPIC" } */
 
 void (*pfn)(void);
 


Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-08 Thread Richard Sandiford
Jakub Jelinek ja...@redhat.com writes:
 2014-01-08  Jakub Jelinek  ja...@redhat.com

   * target-globals.c (save_target_globals): Allocate most of the
   structs using GC in payload of target_globals struct instead
   of allocating them on the heap.

Looks good to me FWIW.  I don't know either way about the one-big-blob thing.

Note that we'll still leak memory when deleting TARGET_OPTION_NODEs
because target_ira_int and target_lra_int have pointers to heap-allocated
storage.

Thanks,
Richard


Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.

2014-01-08 Thread Teresa Johnson
On Tue, Jan 7, 2014 at 8:07 AM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 01/05/2014 05:10 PM, Teresa Johnson wrote:

 On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt ber...@codesourcery.com
 wrote:

 I have a different patch which I'll submit next week after some more
 testing. The assert in cfgrtl is unnecessarily broad and really only
 needs
 to trigger if -freorder-blocks-and-partition; there's nothing wrong with
 entering cfglayout after normal bb-reorder.


 Currently -freorder-blocks-and-partition is the default for x86. I
 assume that hw-doloop is not enabled for any i386 targets, which is
 why we haven't seen this?


 Precisely.


 And will this mean that -freorder-blocks-and-partition cannot be used
 for the targets that use hw-doloop? If so, should
 -freorder-blocks-and-partition be prevented with a warning for those
 targets?


 If someone explicitly chooses that option we can turn off the reordering in
 hw-doloop. That should happen sufficiently rarely that it isn't a problem.
 That's what the patch below does - bootstrapped on x86_64-linux, tested there
 and with bfin-elf. Ok?

Ok, looks good to me.



 I've also tested that Blackfin still benefits from the hw-doloop
 reordering
 code and generates more hardware loops if it's enabled. So we want to be
 able to run it at -O2.


 I looked at hw-doloop briefly and since it seems to be doing some
 manual bb reordering I guess it can't simply be moved before bbro. It
 seems like a better long-term solution would be to make bbro
 hw-doloop-aware as Felix suggested earlier.


 Maybe. It could be argued that the code in hw-doloop is relevant only for a
 small class of targets so it should only be enabled for them. In any case,
 that's not stage 3 material and two ports are broken...

Ok, that makes sense. Thanks, Teresa



 Bernd




-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: PR 59137: Incorrect liveness info during dbr_schedule

2014-01-08 Thread Steven Bosscher
On Wed, Jan 8, 2014 at 8:27 PM, Richard Sandiford wrote:
 gcc/
 PR rtl-optimization/59137
 * reorg.c (steal_delay_list_from_target): Call update_block for
 elided insns.
 (steal_delay_list_from_fallthrough, relax_delay_slots): Likewise.

 gcc/testsuite/
 PR rtl-optimization/59137
 * gcc.target/mips/pr59137.c: New test.

This is OK for trunk. For release branches I'll defer to the RMs.

Ciao!
Steven


Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 07:41:26PM +, Richard Sandiford wrote:
 Jakub Jelinek ja...@redhat.com writes:
  2014-01-08  Jakub Jelinek  ja...@redhat.com
 
  * target-globals.c (save_target_globals): Allocate most of the
  structs using GC in payload of target_globals struct instead
  of allocating them on the heap.
 
 Looks good to me FWIW.  I don't know either way about the one-big-blob thing.
 
 Note that we'll still leak memory when deleting TARGET_OPTION_NODEs
 because target_ira_int and target_lra_int have pointers to heap-allocated
 storage.

Yeah, perhaps that is something to fix incrementally.

But, at least we will not leak ~ 0.5MB per (unique) target attribute
used on some unused function.

Jakub


PING: PATCH: PRs bootstrap/59580/59583: Improve x86 --with-arch/--with-cpu= configure handling

2014-01-08 Thread H.J. Lu
On Mon, Dec 23, 2013 at 6:14 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Dec 22, 2013 at 11:11:12PM +0100, Uros Bizjak wrote:

 Please get someone to review config.gcc changes. They are OK as far as
 x86 rename is concerned, but I can't review functional changes.

 Hi Paolo,

 Can you review this config.gcc change?


  @@ -588,6 +588,22 @@ esac
   # Common C libraries.
   tm_defines=$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3
 
  +# 32-bit x86 processors supported by --with-arch=.  Each processor
  +# MUST be separated by exactly one space.
  +x86_archs="athlon athlon-4 athlon-fx athlon-mp athlon-tbird \
  +athlon-xp k6 k6-2 k6-3 geode c3 c3-2 winchip-c6 winchip2 i386 i486 \
  +i586 i686 pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m \
  +pentium4 pentium4m pentiumpro prescott"

 Missing native.

 x86_archs contains 32-bit x86 processors.  native is allowed for
 64-bit targets and is included in x86_64_archs.  64-bit processors
 can be used in --with-arch/--with-cpu= for 32-bit targets.

 Here is a patch to improve x86 --with-arch/--with-cpu= configure
 handling.  This patch defines 3 variables:

 1. x86_archs: It contains 32-bit x86 processors supported by
 --with-arch=, which aren't allowed for 64-bit targets.
 2. x86_64_archs: It contains 64-bit x86 processors supported by
 --with-arch=, which are allowed for both 32-bit and 64-bit targets.
 3. x86_cpus.  It contains x86 processors supported by --with-cpu=,
 which are allowed for both 32-bit and 64-bit targets.

 The processors in each of those 3 variables are separated by exactly one space.

 Instead of checking if a value of --with-arch/--with-cpu= is valid in many
 different places with

 case ${val} in
 valid pattern list)
   OK
   ;;
 *)
   error
   exit 1
   ;;
 esac

 and updating all pattern lists when adding a new processor, this patch
 uses

 case " valid processor list separated by exactly one space " in
 *" ${val} "*)
   OK
   ;;
 *)
   error
   exit 1
   ;;
 esac

 "valid processor list separated by exactly one space" is the combination
 of the 3 processor variables.  It only needs a separate check for an empty
 value with

 if test x${val} != x; then
   $val isn't empty
 else
   $val is empty
 fi

 With this approach, we only need to add new 32-bit processors to x86_archs
 and new 64-bit processors to x86_64_archs.  They will be supported by
 --with-arch/--with-cpu= automatically.  OK to install?

 Thanks.


 H.J.
 ---
 2013-12-23   H.J. Lu  hongjiu...@intel.com

 PR bootstrap/59580
 PR bootstrap/59583
 * config.gcc (x86_archs): New variable.
 (x86_64_archs): Likewise.
 (x86_cpus): Likewise.
 Use $x86_archs, $x86_64_archs and $x86_cpus to check valid
 --with-arch/--with-cpu= options.
 Support --with-arch=/--with-cpu={nehalem,westmere,
 sandybridge,ivybridge,haswell,broadwell,bonnell,silvermont}.

 diff --git a/gcc/config.gcc b/gcc/config.gcc
 index 24dbaf9..51eb2b1 100644
 --- a/gcc/config.gcc
 +++ b/gcc/config.gcc
 @@ -588,6 +588,22 @@ esac
  # Common C libraries.
  tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3"

 +# 32-bit x86 processors supported by --with-arch=.  Each processor
 +# MUST be separated by exactly one space.
 +x86_archs="athlon athlon-4 athlon-fx athlon-mp athlon-tbird \
 +athlon-xp k6 k6-2 k6-3 geode c3 c3-2 winchip-c6 winchip2 i386 i486 \
 +i586 i686 pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m \
 +pentium4 pentium4m pentiumpro prescott"
 +# 64-bit x86 processors supported by --with-arch=.  Each processor
 +# MUST be separated by exactly one space.
 +x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
 +bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
 +core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
 +sandybridge ivybridge haswell broadwell bonnell silvermont x86-64 native"
 +# Additional x86 processors supported by --with-cpu=.  Each processor
 +# MUST be separated by exactly one space.
 +x86_cpus="generic intel"
 +
  # Common parts for widely ported systems.
  case ${target} in
  *-*-darwin*)
 @@ -1392,20 +1408,21 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | 
 i[34567]86-*-knetbsd*-gnu | i
 done
 TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 
 's/^,//'`
 need_64bit_isa=yes
 -   case X${with_cpu} in
 -   
 Xgeneric|Xintel|Xatom|Xslm|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver4|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 -   ;;
 -   X)
 +   if test x$with_cpu = x; then
 if test x$with_cpu_64 = x; then
 with_cpu_64=generic
 fi
 -   ;;
 - 

[PATCH,rs6000,committed] Remove duplicates from altivec_overloaded_builtins

2014-01-08 Thread Bill Schmidt
This patch removes a couple of redundant entries I noticed in
altivec_overloaded_builtins.  Identical entries occur nearby.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
regressions, applied as obvious.

Thanks,
Bill


2014-01-08  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
two duplicate entries.


Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 206375)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -608,10 +608,6 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
 RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V8HI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
-RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
-RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,
 RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,




Re: [PATCH, AArch64 5/6] soft-fp: Define UDWtype for longlong.h

2014-01-08 Thread Joseph S. Myers
soft-fp patches should go first to glibc.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, AArch64 4/6] soft-fp: Commonize creation of TImode types

2014-01-08 Thread Joseph S. Myers
On Wed, 8 Jan 2014, Richard Henderson wrote:

 diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h
 index 696fc86..b54b1ed 100644
 --- a/libgcc/soft-fp/soft-fp.h
 +++ b/libgcc/soft-fp/soft-fp.h
 @@ -237,6 +237,11 @@ typedef int DItype __attribute__ ((mode (DI)));
  typedef unsigned int UQItype __attribute__ ((mode (QI)));
  typedef unsigned int USItype __attribute__ ((mode (SI)));
  typedef unsigned int UDItype __attribute__ ((mode (DI)));
 +#if _FP_W_TYPE_SIZE == 64
 +typedef int TItype __attribute__ ((mode (TI)));
 +typedef unsigned int UTItype __attribute__ ((mode (TI)));
 +#endif

This isn't the right conditional.  _FP_W_TYPE_SIZE is ultimately an 
optimization choice and need not be related to whether any TImode 
functions are being defined using soft-fp, or whether TImode is supported 
at all.  I think the most you can do is have sfp-machine.h define a macro 
to say that TImode should be supported in soft-fp, rather than actually 
defining the types itself.

(If someone were to use soft-fp on hppa64, then they might well use 
_FP_W_TYPE_SIZE == 64, but hppa64 doesn't support TImode.)
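
For readers of the archive, a minimal sketch of that suggestion follows. It is not soft-fp code: `SFP_WANT_TIMODE` is a hypothetical macro name standing for whatever a port's sfp-machine.h would define, and `W_TYPE_SIZE` is a hypothetical stand-in for _FP_W_TYPE_SIZE. So that the example builds on any GCC-compatible compiler, the sketch derives the macro from the predefined __SIZEOF_INT128__; a real port would simply define it or not.

```c
/* Hypothetical: a port's sfp-machine.h would define this when TImode
   functions are wanted.  Keyed off __SIZEOF_INT128__ here only so
   that the sketch compiles on targets without TImode.  */
#ifdef __SIZEOF_INT128__
# define SFP_WANT_TIMODE 1
#endif

/* Stand-in for _FP_W_TYPE_SIZE: an optimization choice that is
   deliberately NOT consulted when deciding whether TImode exists.  */
#define W_TYPE_SIZE 64

typedef int DItype __attribute__ ((mode (DI)));
typedef unsigned int UDItype __attribute__ ((mode (DI)));

/* The TImode types depend on the port's explicit request only.  */
#ifdef SFP_WANT_TIMODE
typedef int TItype __attribute__ ((mode (TI)));
typedef unsigned int UTItype __attribute__ ((mode (TI)));
#endif

/* Return the size in bytes of the widest integer type defined above.  */
static unsigned int
widest_int_bytes (void)
{
#ifdef SFP_WANT_TIMODE
  return (unsigned int) sizeof (UTItype);
#else
  return (unsigned int) sizeof (UDItype);
#endif
}
```

With this arrangement a target in the position described for hppa64 could keep a 64-bit word type and leave the macro undefined, and no TImode typedef would be attempted.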

-- 
Joseph S. Myers
jos...@codesourcery.com


microMIPS jump instructions

2014-01-08 Thread Moore, Catherine
Hi Richard,

It looks like the microMIPS implementation is missing support for the JRC 
instruction and also misses an opportunity to generate JALS.
I've attached a patch, plus some new test cases to correct this.  Does this 
look okay to commit?  I'd like to get it in 4.9.

Thanks,
Catherine



jrc-jals.cl
Description: jrc-jals.cl


jrc-jals.patch
Description: jrc-jals.patch


[PATCH] Fix for PR 59524

2014-01-08 Thread Iyer, Balaji V
Hello Everyone,
Attached, please find a patch that fixes the bug mentioned in PR 59524. 
The main issue was that the Cilk keywords tests were run even when the user 
configured the compiler with --disable-libcilkrts. This patch should fix this 
issue for C and C++. It was tested on x86 and x86_64.

Here are the ChangeLog entries

gcc/testsuite/ChangeLog
+2014-01-08  Balaji V. Iyer  balaji.v.i...@intel.com
+
+   PR testsuite/59524
+   * gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests
+   are run only if the Cilk library is available/enabled.
+   * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
+   * lib/target-supports.exp (check_libcilkrts_available): New function.
+

Is this Ok for trunk?

Thanks,

Balaji V. Iyer.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 519d472..e0a0e43 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2014-01-08  Balaji V. Iyer  balaji.v.i...@intel.com
+
+   PR testsuite/59524
+   * gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests
+   are run only if the Cilk library is available/enabled.
+   * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
+   * lib/target-supports.exp (check_libcilkrts_available): New function.
+
 2014-01-07  Yufeng Zhang  yufeng.zh...@arm.com
 
* gcc.target/arm/neon/vst1Q_laneu64-1.c: New test.
diff --git a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp 
b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
index e201fd2..b08be25 100644
--- a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
+++ b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
@@ -47,9 +47,7 @@ dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/AN/*.c]]  -g
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  
-g -O2 -ftree-vectorize -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  
-g -O3 -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  
-O3 -ftree-vectorize -fcilkplus -g  
-dg-finish
 
-dg-init
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  
-fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -O0 
-fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -O1 
-fcilkplus  
@@ -61,25 +59,17 @@ dg-runtest [lsort [glob -nocomplain 
$srcdir/g++.dg/cilk-plus/AN/*.cc]]  -g -O1
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -g 
-O2 -ftree-vectorize -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -g 
-O3 -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -O3 
-ftree-vectorize -fcilkplus -g  
-dg-finish
 
-dg-init
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -O1 
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -O2 
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -O3 
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -g 
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -g 
-O2 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  -g 
-O3 -fcilkplus  
-dg-finish
+if { [check_libcilkrts_available] } {
+dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-O1 -fcilkplus  
+dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-O3 -fcilkplus  
+dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-g -fcilkplus  
+dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-g -O2 -fcilkplus  
 
-dg-init
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O1 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O2 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-O3 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-g -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-g -O2 -fcilkplus  
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]]  
-g -O3 -fcilkplus  
+dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O1 -fcilkplus  
+dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O3 -fcilkplus  
+dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -g -fcilkplus  
+dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -g -O2 -fcilkplus  
+  }
 dg-finish
 unset TEST_EXTRA_LIBS
diff --git 

Re: microMIPS jump instructions

2014-01-08 Thread Richard Sandiford
Moore, Catherine catherine_mo...@mentor.com writes:
 2014-01-08  Catherine Moore  c...@codesourcery.com
 
   gcc/testsuite/
   * gcc.target/mips/umips-branch-3.c: New test.
   * gcc.target/mips/umips-branch-4.c: New test.
 
   gcc/
   * config/mips/mips.md (simple_return): Attempt to use JRC for microMIPS.
   * config/mips/mips.h (MIPS_CALL): Attempt to use JALS for microMIPS.

OK, thanks, but:

 Index: gcc/config/mips/mips.md
 ===
 --- gcc/config/mips/mips.md   (revision 206407)
 +++ gcc/config/mips/mips.md   (working copy)
 @@ -1,5 +1,5 @@
  ;;  Mips.md   Machine Description for MIPS based processors
 -;;  Copyright (C) 1989-2014 Free Software Foundation, Inc.
 +;;  Copyright (C) 1989-2013 Free Software Foundation, Inc.
  ;;  Contributed by   A. Lichnewsky, l...@inria.inria.fr
  ;;  Changes by   Michael Meissner, meiss...@osf.org
  ;;  64-bit r4000 support by Ian Lance Taylor, i...@cygnus.com, and

please drop this bit.

Richard


Re: [PATCH,rs6000,committed] Remove duplicates from altivec_overloaded_builtins

2014-01-08 Thread David Edelsohn
On Wed, Jan 8, 2014 at 3:15 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 This patch removes a couple of redundant entries I noticed in
 altivec_overloaded_builtins.  Identical entries occur nearby.

 Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
 regressions, applied as obvious.

 Thanks,
 Bill


 2014-01-08  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
 two duplicate entries.

Okay, good catch.

Thanks, David


Re: [PATCH,rs6000] Add -maltivec={le,be} options

2014-01-08 Thread David Edelsohn
On Tue, Jan 7, 2014 at 6:59 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 On Tue, 2014-01-07 at 22:18 +, Joseph S. Myers wrote:
 On Tue, 7 Jan 2014, Bill Schmidt wrote:

  Yes, sorry for not being more clear.  This is indeed for interpretation
  of element numbers in Altivec intrinsics such as vec_splat, vec_extract,
  vec_insert, and so forth.  By default these will match array element
  order for the target endianness.  But with -maltivec=be for a little
  endian target, we will force use of big-endian element order (matching
  the behavior of the underlying hardware instructions).
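
As a rough illustration of the numbering rule described above, the sketch
below converts an element number between big-endian and little-endian element
order.  The helper name be_to_le_element is hypothetical and is not part of
GCC or of the AltiVec intrinsics; it only assumes that the two orders number
the same elements from opposite ends of the vector.

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC or AltiVec interface: convert an
   element number given in big-endian element order into the number
   the same element has in little-endian element order.  NELTS is the
   number of elements in the vector (4 for a vector of four ints).  */
static size_t
be_to_le_element (size_t be_elt, size_t nelts)
{
  return nelts - 1 - be_elt;
}

/* The mapping is its own inverse, so the same arithmetic converts
   little-endian numbering back to big-endian numbering.  */
static size_t
le_to_be_element (size_t le_elt, size_t nelts)
{
  return be_to_le_element (le_elt, nelts);
}
```

Under that assumption, element 0 in one order is element 3 in the other for a
four-element vector, which is the kind of renumbering the option is described
as controlling.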

 Thanks for the explanation.  I think you should make the .texi
 documentation say something more like this.


 Sure, I can wordsmith something along those lines.  Thanks for the
 feedback!

This patch is okay with the documentation clarification requested by Joseph.

I also would suggest removing "but may be enabled in the future" from
the le option and limiting the comment to "ignored on big-endian
targets".

Also, please add a comment to -maltivec that it defaults to the native
endian order.  And for -maltivec=be, please state that this is the
default for big-endian; for -maltivec=le, please state that this is
the default for little-endian. It's important to be clear and
redundant in this type of documentation.

Thanks, David


[patch][i386] Remove code executed only if reload_in_progress (i.e. never)

2014-01-08 Thread Steven Bosscher
Hello Uros, and everyone else,

Now that LRA is always used for the i386 targets, reload_in_progress
is never set so all code conditional on it is now dead. The attached
patch removes this code.

Sadly I'm having difficulty testing the patch because I have no access
to a suitable x86_64 or ix86 box :-) I'll try to test the patch on a
compile farm machine, but I'm already posting the patch to hear if
this is still OK for this late stage of the development cycle. It's
not as if we're going to go back to reload so the code really is dead
AFAICT, but it's obviously not a bug fix.

Ciao!
Steven

* i386/i386.c (legitimize_pic_address): Remove never-executed code,
reload_in_progress is never set if LRA is used.
(legitimize_tls_address): Likewise.
(ix86_expand_move): Likewise.
(ix86_expand_binary_operator): Likewise.
(ix86_expand_unary_operator): Likewise.
* i386/predicates.md (index_register_operand): Likewise.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 206444)
+++ config/i386/i386.c  (working copy)
@@ -13013,11 +13013,7 @@ legitimize_pic_address (rtx orig, rtx reg)
    && ix86_cmodel != CM_SMALL_PIC && gotoff_operand (addr, Pmode))
 {
   rtx tmpreg;
-  /* This symbol may be referenced via a displacement from the PIC
-base address (@GOTOFF).  */
 
-  if (reload_in_progress)
-   df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
   if (GET_CODE (addr) == CONST)
addr = XEXP (addr, 0);
   if (GET_CODE (addr) == PLUS)
@@ -13046,11 +13042,6 @@ legitimize_pic_address (rtx orig, rtx reg)
 }
   else if (!TARGET_64BIT && !TARGET_PECOFF && gotoff_operand (addr, Pmode))
 {
-  /* This symbol may be referenced via a displacement from the PIC
-base address (@GOTOFF).  */
-
-  if (reload_in_progress)
-   df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
   if (GET_CODE (addr) == CONST)
addr = XEXP (addr, 0);
   if (GET_CODE (addr) == PLUS)
@@ -13108,11 +13099,6 @@ legitimize_pic_address (rtx orig, rtx reg)
}
   else
{
- /* This symbol must be referenced via a load from the
-Global Offset Table (@GOT).  */
-
- if (reload_in_progress)
-   df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT);
  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
  if (TARGET_64BIT)
@@ -13164,8 +13150,6 @@ legitimize_pic_address (rtx orig, rtx reg)
{
  if (!TARGET_64BIT)
{
- if (reload_in_progress)
-   df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0),
UNSPEC_GOTOFF);
  new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1);
@@ -13453,8 +13437,6 @@ legitimize_tls_address (rtx x, enum tls_model mode
}
   else if (flag_pic)
{
- if (reload_in_progress)
-   df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
  pic = pic_offset_table_rtx;
  type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF;
}
@@ -16644,10 +16626,8 @@ ix86_expand_move (enum machine_mode mode, rtx oper
  /* dynamic-no-pic */
  if (MACHOPIC_INDIRECT)
{
- rtx temp = ((reload_in_progress
-  || ((op0 && REG_P (op0))
-      && mode == Pmode))
- ? op0 : gen_reg_rtx (Pmode));
+ rtx temp = (op0 && REG_P (op0) && mode == Pmode)
+ ? op0 : gen_reg_rtx (Pmode);
  op1 = machopic_indirect_data_reference (op1, temp);
  if (MACHOPIC_PURE)
op1 = machopic_legitimize_pic_address (op1, mode,
@@ -17318,16 +17298,9 @@ ix86_expand_binary_operator (enum rtx_code code, e
  /* Emit the instruction.  */
 
   op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_ee (code, mode, src1, src2));
-  if (reload_in_progress)
-{
-  /* Reload doesn't know about the flags register, and doesn't know that
- it doesn't want to clobber it.  We can only do this with PLUS.  */
-  gcc_assert (code == PLUS);
-  emit_insn (op);
-}
-  else if (reload_completed
-   && code == PLUS
-   && !rtx_equal_p (dst, src1))
+  if (reload_completed
+      && code == PLUS
+      && !rtx_equal_p (dst, src1))
 {
   /* This is going to be an LEA; avoid splitting it later.  */
   emit_insn (op);
@@ -17494,13 +17467,8 @@ ix86_expand_unary_operator (enum rtx_code code, en
   /* Emit the instruction.  */
 
   op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_e (code, mode, src));
-  if (reload_in_progress || code == NOT)
-{
-  /* Reload doesn't know about the flags register, and doesn't know that
-

Re: [PATCH,rs6000] Add -maltivec={le,be} options

2014-01-08 Thread Bill Schmidt
On Wed, 2014-01-08 at 16:46 -0500, David Edelsohn wrote:
 On Tue, Jan 7, 2014 at 6:59 PM, Bill Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  On Tue, 2014-01-07 at 22:18 +, Joseph S. Myers wrote:
  On Tue, 7 Jan 2014, Bill Schmidt wrote:
 
   Yes, sorry for not being more clear.  This is indeed for interpretation
   of element numbers in Altivec intrinsics such as vec_splat, vec_extract,
   vec_insert, and so forth.  By default these will match array element
   order for the target endianness.  But with -maltivec=be for a little
   endian target, we will force use of big-endian element order (matching
   the behavior of the underlying hardware instructions).
 
  Thanks for the explanation.  I think you should make the .texi
  documentation say something more like this.
 
 
  Sure, I can wordsmith something along those lines.  Thanks for the
  feedback!
 
 This patch is okay with the documentation clarification requested by Joseph.
 
 I also would suggest removing "but may be enabled in the future" from
 the le option and limiting the comment to "ignored on big-endian
 targets".
 
 Also, please add a comment to -maltivec that it defaults to the native
 endian order.  And for -maltivec=be, please state that this is the
 default for big-endian; for -maltivec=le, please state that this is
 the default for little-endian. It's important to be clear and
 redundant in this type of documentation.
 
 Thanks, David
 

OK, thanks very much for the review.  I'll clean up the documentation as
requested this evening.

Thanks,
Bill



Re: [patch][i386] Remove code executed only if reload_in_progress (i.e. never)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 10:51:53PM +0100, Steven Bosscher wrote:
 Hello Uros, and everyone else,
 
 Now that LRA is always used for the i386 targets, reload_in_progress
 is never set so all code conditional on it is now dead. The attached
 patch removes this code.
 
 Sadly I'm having difficulty testing the patch because I have no access
 to a suitable x86_64 or ix86 box :-) I'll try to test the patch on a
 compile farm machine, but I'm already posting the patch to hear if
 this is still OK for this late stage of the development cycle. It's
 not as if we're going to go back to reload so the code really is dead
 AFAICT, but it's obviously not a bug fix.

While LRA is always on, making it harder to test with reload doesn't seem to
be a good idea to me for 4.9.  When some RA issue is reported for these
architectures, often one just patches config/i386/i386.c by hand to enable
reload instead of LRA and tests it with that instead.  This patch would mean
we'd need to keep around a patchset to apply for those purposes.

   * i386/i386.c (legitimize_pic_address): Remove never-executed code,
   reload_in_progress is never set if LRA is used.
   (legitimize_tls_address): Likewise.
   (ix86_expand_move): Likewise.
   (ix86_expand_binary_operator): Likewise.
   (ix86_expand_unary_operator): Likewise.
   * i386/predicates.md (index_register_operand): Likewise.

config/ prefix would be needed in the ChangeLog entries.

Jakub


Re: [RFC] libgcov.c re-factoring and offline profile-tool

2014-01-08 Thread Rong Xu
Here is the patch that addresses Honza's concern about bss increment.
It just makes this_prg a local variable.

Some comments are inlined.

On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka hubi...@ucw.cz wrote:

 
 Do you know how the size of libgcov changed with your patch?
 Quick check of current mainline on compiling empty main gives:

 jh@gcc10:~/trunk/build/gcc$ cat t.c
 main()
 {
 }
 jh@gcc10:~/trunk/build/gcc$ ./xgcc -B ./ -O2 -fprofile-generate -o a.out-new 
 --static t.c
 jh@gcc10:~/trunk/build/gcc$ gcc -O2 -fprofile-generate -o a.out-old --static 
 t.c
 jh@gcc10:~/trunk/build/gcc$ size a.out-old
   text   data    bss    dec    hex filename
 608141   3560  16728 628429  996cd a.out-old
 jh@gcc10:~/trunk/build/gcc$ size a.out-new
   text   data    bss    dec    hex filename
 612621   3688  22880 639189  9c0d5 a.out-new

 Without profiling I get:
 jh@gcc10:~/trunk/build/gcc$ size a.out-new-no
 jh@gcc10:~/trunk/build/gcc$ size a.out-old-no
   text   data    bss    dec    hex filename
 599719   3448  12568 615735  96537 a.out-old-no
   text   data    bss    dec    hex filename
 600247   3448  12568 616263  96747 a.out-new-no

 Quite big for empty program, but mostly glibc fault, I suppose
 (that won't be an issue for embedded platforms). But anyway
 we increased text size overhead from 8k to 12k, BSS size
 overhead from 4k to 10k and data by another 1k.


I think it would be more fair to compare r204729 and r204730. Your
comparison had some other changes in libgcov such as time_profiler and
indirect_call_profiler_v2.

Using the same empty t.c, for r204729, we have
xur2%208:gcc  ./xgcc -B ./ -O2 -fprofile-generate --static -o
a.out-r204729 t.c
xur2%209:gcc  size a.out-r204729
   text   data    bss    dec    hex filename
 803207   6352  15448 825007  c96af a.out-r204729
xur2%210:gcc  ./xgcc -B ./ -O2 --static -o a.out-r204729-no t.c
xur2%211:gcc  size a.out-r204729-no
   text   data    bss    dec    hex filename
 790337   6112  11336 807785  c5369 a.out-r204729-no

For r204730, we have
xur2%216:gcc  ./xgcc -B ./ -O2 -fprofile-generate --static -o
a.out-r204730 t.c
xur2%217:gcc  size a.out-r204730
   text   data    bss    dec    hex filename
 802919   6384  21592 830895  cadaf a.out-r204730
xur2%218:gcc  ./xgcc -B ./ -O2  --static -o a.out-r204730-no t.c
xur2%219:gcc  size a.out-r204730-no
   text   data    bss    dec    hex filename
 790337   6112  11336 807785  c5369 a.out-r204730-no

r204730 actually has smaller text, data size with -fprofile-generate.
You are right that there is 6kb more bss space due to the static
variables introduced. It is mostly caused by the this_prg object.

With the attached trunk patch that localizes this_prg, we have
xur2%42:fdo  size a.out-new
   text   data    bss    dec    hex filename
 803479   6456  15512 825447  c9867 a.out-new
xur2%43:fdo  size a.out-new-no
   text   data    bss    dec    hex filename
 790545   6112  11368 808025  c5459 a.out-new-no

We are now using 64 more bytes in m64.
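
The bss figures quoted above can be cross-checked with a few subtractions.
The sketch below only restates the numbers from the size output in this
message; the struct and function names are made up for the illustration.

```c
/* Figures copied from the `size' output above for the statically
   linked -fprofile-generate binaries.  The names are illustrative.  */
struct size_row
{
  long text;
  long data;
  long bss;
};

static const struct size_row r204729 = { 803207, 6352, 15448 };
static const struct size_row r204730 = { 802919, 6384, 21592 };
static const struct size_row localized = { 803479, 6456, 15512 };

/* Growth in bss of AFTER relative to BEFORE, in bytes.  */
static long
bss_delta (struct size_row before, struct size_row after)
{
  return after.bss - before.bss;
}
```

bss_delta (r204729, r204730) gives 6144 bytes, the same amount the r204730
object listing below attributes to _gcov.o, and bss_delta (r204729, localized)
gives the 64 bytes mentioned above.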

Objects size for r204730:
   text   data    bss    dec    hex filename
     57      0      0     57     39 _gcov_average_profiler.o
     66      0      0     66     42 _gcov_dump.o
    516      0      0    516    204 _gcov_execle.o
    476      0      0    476    1dc _gcov_execl.o
    476      0      0    476    1dc _gcov_execlp.o
    108      0      0    108     6c _gcov_execve.o
     98      0      0     98     62 _gcov_execv.o
     98      0      0     98     62 _gcov_execvp.o
    126      0     40    166     a6 _gcov_flush.o
    101      0      0    101     65 _gcov_fork.o
    122      0      0    122     7a _gcov_indirect_call_profiler.o
    178      0     16    194     c2 _gcov_indirect_call_profiler_v2.o
     89      0      0     89     59 _gcov_interval_profiler.o
     52      0      0     52     34 _gcov_ior_profiler.o
    126      0      0    126     7e _gcov_merge_add.o
    242      0      0    242     f2 _gcov_merge_delta.o
    126      0      0    126     7e _gcov_merge_ior.o
    251      0      0    251     fb _gcov_merge_single.o
    156      0      0    156     9c _gcov_merge_time_profile.o
   9252      0   6144  15396   3c24 _gcov.o
    115      0      0    115     73 _gcov_one_value_profiler.o
     69      0      0     69     45 _gcov_pow2_profiler.o
     66      0      0     66     42 _gcov_reset.o
     77      0      8     85     55 _gcov_time_profiler.o

Objects size for r204729:
   text   data    bss    dec    hex filename
     57      0      0     57     39 _gcov_average_profiler.o
     72      0      0     72     48 _gcov_dump.o
    516      0      0    516    204 _gcov_execle.o
    476      0      0    476    1dc _gcov_execl.o
    476      0      0    476    1dc _gcov_execlp.o
    108      0      0    108     6c _gcov_execve.o
     98      0      0     98     62 _gcov_execv.o
     98      0      0     98     62 _gcov_execvp.o
    101      0      0    101     65 _gcov_fork.o
    122      0      0    122     7a 

Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Tim Shen
On Wed, Jan 8, 2014 at 5:20 AM, Paolo Carlini paolo.carl...@oracle.com wrote:
 On 01/08/2014 10:24 AM, Jonathan Wakely wrote:
 Ouch! Yes, that's quite a bit slower, and this code is already very
 slow to compile.

With this patch (which is based on a-fixed.diff, committed earlier),
which uses templated member functions instead of templating the whole
_Compiler, time consumption is:
g++ -g -Wall -std=c++11 -g -Wall -std=c++11 -O3 regextest.cc  3.79s
user 0.14s system 98% cpu 3.981 total

Compared to 4.5s it's better and probably fine.

Bootstrapped and tested with -m32 and -m64 respectively.

 I only want to add that, besides keeping compile-time under control for
 4.9.0 - please investigate a bit more along the mentioned lines - we should
 also start experimenting with exporting the instantiations. I don't know
 what the other implementations are doing, but in general it definitely makes
 sense, for compile-time performance too. I think we already said that some
 time ago, but the issue seems more important now. Maybe it's really
 unavoidable if we need template complexity for first class run-time
 performance.

After this patch I plan to instantiate _Compiler and _Executor.


-- 
Regards,
Tim Shen


[MIPS, committed] Fix all but one gcc.dg/tree-ssa failure

2014-01-08 Thread Richard Sandiford
Some of the tests were failing due to the branch cost and some were
failing due to !LOGICAL_OP_NON_SHORT_CIRCUIT.  I just skipped the
latter, as for ARM Cortex-M.

I'll look at the gcc.dg/tree-ssa/ssa-dom-thread-4.c failure separately.

Tested on mips64-linux-gnu and applied.

Thanks,
Richard


gcc/testsuite/
* gcc.dg/tree-ssa/reassoc-32.c, gcc.dg/tree-ssa/reassoc-33.c,
gcc.dg/tree-ssa/reassoc-34.c, gcc.dg/tree-ssa/reassoc-35.c,
gcc.dg/tree-ssa/reassoc-36.c: Extend -mbranch-cost handling to MIPS.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c,
gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c,
gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c,
gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c,
gcc.dg/tree-ssa/vrp87.c, gcc.dg/tree-ssa/forwprop-28.c: Skip for MIPS.

Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c  2014-01-08 22:11:48.552943720 
+
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c  2014-01-08 22:11:50.069956983 
+
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-*} } } */
 
 /* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
-/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+/* { dg-additional-options -mbranch-cost=2 { target mips*-*-* avr-*-* } } */
 
 
 int test (int a, int b, int c)
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c  2014-01-08 22:11:48.553943729 
+
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c  2014-01-08 22:11:50.070956992 
+
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-* hppa*-*-*} } } */
 
 /* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
-/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+/* { dg-additional-options -mbranch-cost=2 { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c  2014-01-08 22:11:48.552943720 
+
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c  2014-01-08 22:11:50.070956992 
+
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-* hppa*-*-*} } } */
 
 /* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
-/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+/* { dg-additional-options -mbranch-cost=2 { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c  2014-01-08 22:11:48.553943729 
+
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c  2014-01-08 22:11:50.070956992 
+
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-* hppa*-*-*} } } */
 
 /* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
-/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+/* { dg-additional-options -mbranch-cost=2 { target mips*-*-* avr-*-* } } */
 
 int test (unsigned int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c  2014-01-08 22:11:48.553943729 
+
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c  2014-01-08 22:11:50.070956992 
+
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-* hppa*-*-*} } } */
 
 /* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
-/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+/* { dg-additional-options -mbranch-cost=2 { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c2014-01-08 
22:11:48.552943720 +
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c2014-01-08 
22:11:50.070956992 +
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* 

Re: [RFC] libgcov.c re-factoring and offline profile-tool

2014-01-08 Thread Rong Xu
On Wed, Dec 18, 2013 at 9:28 AM, Xinliang David Li davi...@google.com wrote:

  #ifdef L_gcov_merge_ior
  /* The profile merging function that just adds the counters.  It is given
 -   an array COUNTERS of N_COUNTERS old counters and it reads the same 
 number
 -   of counters from the gcov file.  */
 +   an array COUNTERS of N_COUNTERS old counters.
 +   When SRC==NULL, it reads the same number of counters from the gcov file.
 +   Otherwise, it reads from SRC array.  */
  void
 -__gcov_merge_ior (gcov_type *counters, unsigned n_counters)
 +__gcov_merge_ior (gcov_type *counters, unsigned n_counters,
 +  gcov_type *src, unsigned w __attribute__ ((unused)))

 So the new in-memory variants are introduced for merging tool, while libgcc 
 use gcov_read_counter
 interface?
 Perhaps we can actually just duplicate the functions to avoid runtime to do 
 all the scaling
 and in_mem tests it won't need?


 I thought about this one a little. How about making the interface
 change conditionally, but still share the implementation?  The merge
 function bodies mostly remain unchanged and there is no runtime
 penalty for libgcov.  The new macros can be shared across most of the
 mergers.

 #ifdef IN_PROFILE_TOOL
 #define GCOV_MERGE_EXTRA_ARGS  gcov_type *src, unsigned w
 #define GCOV_READ_COUNTER  *(src++) * w
 #else
 #define GCOV_MERGE_EXTRA_ARGS
 #define GCOV_READ_COUNTER gcov_read_counter ()
 #endif

 __gcov_merge_add (gcov_type *counters, unsigned n_counters,
   GCOV_MERGE_EXTRA_ARGS)
 {

  for (; n_counters; counters++, n_counters--)
   {
   *counters += GCOV_READ_COUNTER ;
}

 }

 thanks,

Personally I don't think the run time test of in_mem will cause any
issue. This is in profile dumping, so why would we care about a few more
cycles here? It won't pollute the profile.

If you really don't like that, we can use the above approach, or I can
hide the logic in gcov_read_counter(), i.e. overload
gcov_read_counter() in profile_tool. For that, I will need a new
global variable SRC and set it before calling the merge function.
I would prefer to keep weight in _gcov_merge_* argument list.
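
For reference, the conditional-interface idea quoted above could look roughly
like the sketch below.  It is a sketch only: IN_PROFILE_TOOL, merge_add,
fake_read_counter and fake_file are names made up for the illustration (the
stub stands in for reading a counter from the gcov file), and the comma is
folded into GCOV_MERGE_EXTRA_ARGS so that the parameter list is still valid
when the macro expands to nothing.

```c
typedef long long gcov_type;

#ifdef IN_PROFILE_TOOL
/* Tool build: counters come from the in-memory array SRC and are
   scaled by the weight W.  The leading comma is part of the macro.  */
# define GCOV_MERGE_EXTRA_ARGS , gcov_type *src, unsigned w
# define GCOV_READ_COUNTER (*(src++) * w)
#else
/* Runtime build: this stub stands in for reading the next counter
   from the gcov file; it is not the real libgcov reader.  */
static const gcov_type *fake_file;

static gcov_type
fake_read_counter (void)
{
  return *fake_file++;
}

# define GCOV_MERGE_EXTRA_ARGS
# define GCOV_READ_COUNTER fake_read_counter ()
#endif

/* Add the next N_COUNTERS values from the selected source into
   COUNTERS.  The body is identical in both builds.  */
void
merge_add (gcov_type *counters, unsigned n_counters GCOV_MERGE_EXTRA_ARGS)
{
  for (; n_counters; counters++, n_counters--)
    *counters += GCOV_READ_COUNTER;
}
```

With this shape the runtime build has no extra arguments and no extra
branches, at the cost of compiling the mergers twice.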

What do you think?

-Rong


 David


 I would suggest going with libgcov.h changes and cleanups first, with 
 interface changes next
 and the gcov-tool is probably quite obvious at the end?
 Do you think you can split the patch this way?

 Thanks and sorry for taking long to review. I should have more time again 
 now.
 Honza


Re: [MIPS, committed] Revert some Octeon BADDU patches

2014-01-08 Thread Eric Botcazou
 This patch just reverts some changes I'd made to the BADDU patterns
 for the infamous (truncate:QI (plus:SI ...)) -> (plus:QI ...)
 simplification. That simplification was limited to CISCy targets for PR
 58295.
 
 Tested on mips64-linux-gnu and applied.  It fixes the octeon-baddu-1.c
 failures.

You presumably need to apply it to the 4.8 branch as well.

-- 
Eric Botcazou


Re: [RFC] libgcov.c re-factoring and offline profile-tool

2014-01-08 Thread Rong Xu
On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka hubi...@ucw.cz wrote:
 @@ -325,6 +311,9 @@ static struct gcov_summary all_prg;
  #endif
  /* crc32 for this program.  */
  static gcov_unsigned_t crc32;
 +/* Use this summary checksum rather the computed one if the value is
 + *non-zero.  */
 +static gcov_unsigned_t saved_summary_checksum;

 Why do you need to save the checksum? Won't it reset summary back with 
 multiple streaming?

This was for the gcov_tool. The checksum is recomputed in gcov_exit
and its value depends on the order of the gcov_info list (the order
will be different after reading from gcda files to memory). The
purpose was to have the same summary_checksum so that I can get
identical gcov-dump output.
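
To illustrate why list order matters here, below is a toy order-sensitive
checksum.  It is not the checksum libgcov computes; toy_checksum is a
made-up function that only shows how walking the same entries in a
different order changes the result.

```c
/* Toy order-sensitive checksum, for illustration only.  The real
   summary checksum in libgcov is computed differently.  */
static unsigned
toy_checksum (const unsigned *vals, unsigned n)
{
  unsigned h = 0;
  unsigned i;

  /* Each step mixes the running value with the next entry, so the
     result depends on the order in which entries are visited.  */
  for (i = 0; i < n; i++)
    h = h * 31u + vals[i];
  return h;
}
```

Feeding the entries { 1, 2 } and then { 2, 1 } to toy_checksum gives two
different values, which is the effect a saved checksum avoids.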


 I would really like to avoid introducing those static vars that are used 
 exclusively
 by gcov_exit.  What about putting them into an gcov_context structure that
 is passed around the functions that was broken out?

With my recent patch that localizes this_prg, we only use 64 more
bytes in bss. Do you still think we have to remove
all these statics?




Re: [PATCH] Fix PR59471

2014-01-08 Thread Jakub Jelinek
On Tue, Jan 07, 2014 at 03:54:56PM +0100, Richard Biener wrote:
 2014-01-07  Richard Biener  rguent...@suse.de
 
   PR middle-end/59471
   * gimplify.c (gimplify_expr): Gimplify register-register type
   VIEW_CONVERT_EXPRs to separate stmts.
 
   * gcc.dg/pr59471.c: New testcase.

The testcase fails on i686-linux, because of the ABI warnings.

I've verified following change ICEd without your fix and works with your
fix, bootstrapped/regtested it on x86_64-linux and i686-linux and committed
to trunk as obvious.

2014-01-08  Jakub Jelinek  ja...@redhat.com

PR middle-end/59471
* gcc.dg/pr59471.c (foo): Avoid vector type arguments or return
type, use pointers to vector type instead.

--- gcc/testsuite/gcc.dg/pr59471.c.jj   2014-01-08 10:23:20.0 +0100
+++ gcc/testsuite/gcc.dg/pr59471.c  2014-01-08 17:52:42.0 +0100
@@ -9,8 +9,8 @@ __attribute__ ((__vector_size__ (16)));
 typedef unsigned int uint32x4_t
 __attribute__ ((__vector_size__ (16)));
 
-uint8x4_t
-foo (uint16x8_t x)
+void
+foo (uint16x8_t *x, uint8x4_t *y)
 {
-  return (uint8x4_t) ((uint32x4_t) x)[0];
+  *y = (uint8x4_t) ((uint32x4_t) (*x))[0];
 }


Jakub


Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Paolo Carlini

Hi,

On 01/08/2014 11:11 PM, Tim Shen wrote:

On Wed, Jan 8, 2014 at 5:20 AM, Paolo Carlini paolo.carl...@oracle.com wrote:

On 01/08/2014 10:24 AM, Jonathan Wakely wrote:

Ouch! Yes, that's quite a bit slower, and this code is already very
slow to compile.

With this patch (which is based on a-fixed.diff, committed earlier),
which uses templated member functions instead of templating the whole
_Compiler, time consumption is:
g++ -g -Wall -std=c++11 -g -Wall -std=c++11 -O3 regextest.cc  3.79s
user 0.14s system 98% cpu 3.981 total

Compared to 4.5s it's better and probably fine.

Bootstrapped and tested with -m32 and -m64 respectively.

I agree, it's probably fine for now, but please actually attach the patch ;)

Paolo.



Fix segfault with weak external symbols

2014-01-08 Thread Eric Botcazou
This is a regression present on the mainline for weak external symbols and 
languages with non-call exceptions:

0xb222df crash_signal
/home/eric/svn/gcc/gcc/toplev.c:337
0x75ed9c symtab_alias_ultimate_target(symtab_node*, availability*)
/home/eric/svn/gcc/gcc/symtab.c:989
0xb69a59 varpool_variable_node
/home/eric/svn/gcc/gcc/cgraph.h:1430
0xb69a59 tree_could_trap_p(tree_node*)
/home/eric/svn/gcc/gcc/tree-eh.c:2691
0xb6a85c stmt_could_throw_1_p
/home/eric/svn/gcc/gcc/tree-eh.c:2751
0xb6a85c stmt_could_throw_p(gimple_statement_base*)
/home/eric/svn/gcc/gcc/tree-eh.c:2780
0xb6d46f lower_eh_constructs_2
/home/eric/svn/gcc/gcc/tree-eh.c:2028
0xb6d46f lower_eh_constructs_1
/home/eric/svn/gcc/gcc/tree-eh.c:2123
0xb6f871 lower_eh_constructs
/home/eric/svn/gcc/gcc/tree-eh.c:2141
0xb6f871 execute
/home/eric/svn/gcc/gcc/tree-eh.c:2193
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

In tree_could_trap_p:

case VAR_DECL:
  /* Assume that accesses to weak vars may trap, unless we know
 they are certainly defined in current TU or in some other
 LTO partition.  */
  if (DECL_WEAK (expr))
{
  struct varpool_node *node;
  if (!DECL_EXTERNAL (expr))
return false;
  node = varpool_variable_node (varpool_get_node (expr), NULL);
  if (node && node->symbol.in_other_partition)
return false;
  return true;
}
  return false;

The problem is that varpool_get_node returns NULL and varpool_variable_node
(and its callee symtab_alias_ultimate_target) chokes on the NULL.  This is
a regression from the 4.8.x series, where the same NULL goes through the
function without a hitch.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2014-01-08  Eric Botcazou  ebotca...@adacore.com

* cgraph.h (varpool_variable_node): Do not choke on null node.


2014-01-08  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/weak2.ad[sb]: New test.


-- 
Eric Botcazou
Index: cgraph.h
===
--- cgraph.h	(revision 206418)
+++ cgraph.h	(working copy)
@@ -1426,8 +1426,12 @@ varpool_variable_node (varpool_node *nod
 {
   varpool_node *n;
 
-  n = dyn_cast <varpool_node> (symtab_alias_ultimate_target (node,
-			 availability));
+  if (node)
+    n = dyn_cast <varpool_node> (symtab_alias_ultimate_target (node,
+			   availability));
+  else
+    n = NULL;
+
   if (!n && availability)
     *availability = AVAIL_NOT_AVAILABLE;
   return n;
-- { dg-do compile }

package body Weak2 is

   function F return Integer is
   begin
  return Var;
   end;

end Weak2;
package Weak2 is

   Var : Integer;
   pragma Import (Ada, Var, var_name);
   pragma Weak_External (Var);

   function F return Integer;

end Weak2;


[PATCH] Fix cfgcleanup regression (PR rtl-optimization/59724)

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 05:54:55PM +0100, Uros Bizjak wrote:
> This caused PR59724 on alpha:
>
> 20021116-1.c: In function ‘foo’:
> 20021116-1.c:31:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 9
>  }
>  ^
> 20021116-1.c:31:1: error: insn outside basic block
> (jump_insn 94 52 93 9 (return) 20021116-1.c:31 -1
>      (nil)
>  -> return)

Ugh, indeed.  The problem is that try_head_merge_bb really wants
flow_find_head_matching_sequence to count all (non-note) insns, not
just active insns, because otherwise as in the above testcase we
can have e.g. 2 active insns followed by one non-active, all matching
(flow_find_head_matching_sequence returns 2) and on another edge
just 2 active insns and nothing else matching.  2 == 2, so the caller
thinks it doesn't matter which one is shorter, but we have the insn range
of 3 insns together.

So, this patch just reverts the try_head_merge_bb changes and makes
flow_find_head_matching_sequence behave the old way when called from
try_head_merge_bb, i.e. count all non-note insns, and only when called
from ifcvt.c count just active insns.  Plus the ifcvt.c change ensures
we don't mistakenly call it with stop_after == 0 (which wouldn't actually
stop).
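
To make the mismatch concrete, here is a toy model of the two counting modes. It is only a sketch: struct toy_insn and toy_head_match are hypothetical and stand in for real insns and for flow_find_head_matching_sequence, and "matching" is reduced to comparing an id.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn: ID stands in for "what the insn does", ACTIVE for
   active_insn_p.  Both fields are illustrative, not GCC's.  */
struct toy_insn
{
  int id;
  int active;
};

/* Count matching insns at the heads of A and B.  With STOP_AFTER == 0
   every matching insn is counted; with a nonzero STOP_AFTER only active
   insns are counted and the scan stops after that many.  */
static int
toy_head_match (const struct toy_insn *a, size_t na,
                const struct toy_insn *b, size_t nb, int stop_after)
{
  int ninsns = 0;

  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < na && i < nb; i++)
    {
      if (a[i].id != b[i].id)
        break;
      if (!stop_after || a[i].active)
        ninsns++;
      if (stop_after && ninsns == stop_after)
        break;
    }
  return ninsns;
}
```

Take three heads: e0 = {1, 2, 3, 4} and e1 = {1, 2, 3, 9}, where insn 3 is non-active, and e2 = {1, 2, 8}.  Counting only active insns (a large nonzero stop_after) gives 2 for both e0/e1 and e0/e2, although the first range spans three insns.  Counting everything (stop_after == 0) gives 3 and 2, so the caller can see which match is shorter.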

Bootstrapped/regtested on x86_64-linux and i686-linux, Uros is testing it
on Alpha.  Ok for trunk?

2014-01-08  Jakub Jelinek  ja...@redhat.com

PR rtl-optimization/59724
* ifcvt.c (cond_exec_process_if_block): Don't call
flow_find_head_matching_sequence with 0 longest_match.
* cfgcleanup.c (flow_find_head_matching_sequence): Count even
non-active insns if !stop_after.
(try_head_merge_bb): Revert 2014-01-07 changes.

--- gcc/ifcvt.c.jj  2014-01-08 10:23:20.0 +0100
+++ gcc/ifcvt.c 2014-01-08 18:46:17.017715169 +0100
@@ -522,7 +522,10 @@ cond_exec_process_if_block (ce_if_block
  n_insns -= 2 * n_matching;
}
 
-  if (then_start && else_start)
+  if (then_start
+      && else_start
+      && then_n_insns > n_matching
+      && else_n_insns > n_matching)
{
  int longest_match = MIN (then_n_insns - n_matching,
   else_n_insns - n_matching);
--- gcc/cfgcleanup.c.jj 2014-01-07 08:54:05.772736321 +0100
+++ gcc/cfgcleanup.c2014-01-08 18:41:14.433307914 +0100
@@ -1421,7 +1421,8 @@ flow_find_cross_jump (basic_block bb1, b
 /* Like flow_find_cross_jump, except start looking for a matching sequence from
the head of the two blocks.  Do not include jumps at the end.
If STOP_AFTER is nonzero, stop after finding that many matching
-   instructions.  */
+   instructions.  If STOP_AFTER is zero, count all INSN_P insns, if it is
+   non-zero, only count active insns.  */
 
 int
 flow_find_head_matching_sequence (basic_block bb1, basic_block bb2, rtx *f1,
@@ -1493,7 +1494,7 @@ flow_find_head_matching_sequence (basic_
 
  beforelast1 = last1, beforelast2 = last2;
  last1 = i1, last2 = i2;
- if (active_insn_p (i1))
+ if (!stop_after || active_insn_p (i1))
ninsns++;
}
 
@@ -2408,7 +2409,9 @@ try_head_merge_bb (basic_block bb)
   max_match--;
   if (max_match == 0)
return false;
-  e0_last_head = prev_active_insn (e0_last_head);
+  do
+   e0_last_head = prev_real_insn (e0_last_head);
+  while (DEBUG_INSN_P (e0_last_head));
 }
 
   if (max_match == 0)
@@ -2428,14 +2431,16 @@ try_head_merge_bb (basic_block bb)
   basic_block merge_bb = EDGE_SUCC (bb, ix)->dest;
   rtx head = BB_HEAD (merge_bb);
 
-  if (!active_insn_p (head))
-   head = next_active_insn (head);
+  while (!NONDEBUG_INSN_P (head))
+   head = NEXT_INSN (head);
   headptr[ix] = head;
   currptr[ix] = head;
 
   /* Compute the end point and live information  */
   for (j = 1; j < max_match; j++)
-   head = next_active_insn (head);
+   do
+ head = NEXT_INSN (head);
+   while (!NONDEBUG_INSN_P (head));
   simulate_backwards_to_point (merge_bb, live, head);
   IOR_REG_SET (live_union, live);
 }


Jakub


Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Tim Shen
On Wed, Jan 8, 2014 at 5:38 PM, Paolo Carlini paolo.carl...@oracle.com wrote:
> I agree, it's probably fine for now, but please actually attach the patch ;)

Oops, sorry.


So my plan is to instantiate _Compiler and _Executor instead of user
interfaces like basic_regex or regex_match, because the implementation
may change (say add a new executor) later. Is that Ok?


-- 
Regards,
Tim Shen
commit d9f47e783680a1cab86bd704e67236025cbdff18
Author: tim timshe...@gmail.com
Date:   Mon Jan 6 00:03:41 2014 -0500

2014-01-08  Tim Shen  timshe...@gmail.com

* bits/regex_automaton.tcc: Indentation fix.
* bits/regex_compiler.h (__compile_nfa(), _Compiler,
_RegexTranslator, _AnyMatcher, _CharMatcher,
_BracketMatcher): Add bool option template parameters and
specializations to make matching more efficient and space saving.
* bits/regex_compiler.tcc: Likewise.

diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc 
b/libstdc++-v3/include/bits/regex_automaton.tcc
index 7edc67f..e222803 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -134,9 +134,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _NFA<_TraitsT>::_M_dot(std::ostream& __ostr) const
 {
   __ostr << "digraph _Nfa {\n"
- "  rankdir=LR;\n";
+ "  rankdir=LR;\n";
   for (size_t __i = 0; __i < this->size(); ++__i)
-(*this)[__i]._M_dot(__ostr, __i);
+   (*this)[__i]._M_dot(__ostr, __i);
   __ostr << "}\n";
   return __ostr;
 }
diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 4ac67df..b73fe30 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -39,7 +39,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  template<typename _TraitsT>
+  template<typename, bool, bool>
 struct _BracketMatcher;
 
   /// Builds an NFA from an input iterator interval.
@@ -63,7 +63,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef typename _ScannerT::_TokenT _TokenT;
   typedef _StateSeq<_TraitsT> _StateSeqT;
   typedef std::stack<_StateSeqT, std::vector<_StateSeqT>> _StackT;
-  typedef _BracketMatcher<_TraitsT> _BMatcherT;
   typedef std::ctype<typename _TraitsT::char_type> _CtypeT;
 
   // accepts a specific token or returns false.
@@ -91,20 +90,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool
   _M_bracket_expression();
 
-  void
-  _M_expression_term(_BMatcherT& __matcher);
+  template<bool __icase, bool __collate>
+   void
+   _M_insert_any_matcher_ecma();
 
-  bool
-  _M_range_expression(_BMatcherT& __matcher);
+  template<bool __icase, bool __collate>
+   void
+   _M_insert_any_matcher_posix();
 
-  bool
-  _M_collating_symbol(_BMatcherT& __matcher);
+  template<bool __icase, bool __collate>
+   void
+   _M_insert_char_matcher();
 
-  bool
-  _M_equivalence_class(_BMatcherT& __matcher);
+  template<bool __icase, bool __collate>
+   void
+   _M_insert_character_class_matcher();
 
-  bool
-  _M_character_class(_BMatcherT& __matcher);
+  template<bool __icase, bool __collate>
+   void
+   _M_insert_bracket_matcher(bool __neg);
+
+  template<bool __icase, bool __collate>
+   void
+   _M_expression_term(_BracketMatcher<_TraitsT, __icase, __collate>&
+  __matcher);
 
   int
   _M_cur_int_value(int __radix);
@@ -148,16 +157,110 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __compile_nfa(__cfirst, __cfirst + __len, __traits, __flags);
 }
 
-  template<typename _TraitsT, bool __is_ecma>
-struct _AnyMatcher
+  // [28.13.14]
+  template<typename _TraitsT, bool __icase, bool __collate>
+class _RegexTranslator
 {
-  typedef typename _TraitsT::char_type   _CharT;
+public:
+  typedef typename _TraitsT::char_type   _CharT;
+  typedef typename _TraitsT::string_type _StringT;
+  typedef typename std::conditional<__collate,
+   _StringT,
+   _CharT>::type _StrTransT;
 
   explicit
-  _AnyMatcher(const _TraitsT& __traits)
+  _RegexTranslator(const _TraitsT& __traits)
   : _M_traits(__traits)
   { }
 
+  _CharT
+  _M_translate(_CharT __ch) const
+  {
+   if (__icase)
+ return _M_traits.translate_nocase(__ch);
+   else if (__collate)
+ return _M_traits.translate(__ch);
+   else
+ return __ch;
+  }
+
+  _StrTransT
+  _M_transform(_CharT __ch) const
+  {
+   return _M_transform_impl(__ch, typename integral_constant<bool,
+__collate>::type());
+  }
+
+private:
+  _StrTransT
+  _M_transform_impl(_CharT __ch, false_type) const
+  { return __ch; }
+
+  

Re: [RFA][PATCH][middle-end/53623] Improve extension elimination

2014-01-08 Thread Jeff Law

On 01/08/14 01:14, Eric Botcazou wrote:

>> Committed after private email approval from Jakub.  I made one
>> additional trivial change (missing whitespace in a comment).
>
> This breaks bootstrap with RTL checking enabled:
>
> /home/eric/svn/gcc/libgcc/config/libbid/bid64_noncomp.c:119:1: internal
> compiler error: RTL check: expected code 'set' or 'clobber', have 'parallel'
> in combine_reaching_defs, at ree.c:711

There were two issues in that code.  The first assumed the form of 
DEF_INSN was (set (dest) (src)), the second assumed that the destination 
must be a reg before checking its REGNO.


ree.c already had some code which effectively defined the form that the 
defining insn could take.  It's not quite single_set, though I'd 
really prefer that be the form in the future.  Anyway, I pulled that 
code out of merge_def_and_ext so that it could also be used by 
combine_reaching_defs.
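
For readers who don't have ree.c open, the shape of that helper is roughly the following.  This is a self-contained sketch with hypothetical toy_* types rather than real RTL, and unlike the real code it returns NULL instead of asserting when a PARALLEL contains no SET at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an insn pattern; not GCC's rtx.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_PARALLEL };

struct toy_rtx
{
  enum toy_code code;
  struct toy_rtx **vec;  /* Elements, used only for TOY_PARALLEL.  */
  size_t veclen;
};

/* Return the address of the one SET inside *PAT, or NULL if *PAT is
   neither a SET nor a PARALLEL containing exactly one SET.  */
static struct toy_rtx **
toy_single_set_loc (struct toy_rtx **pat)
{
  struct toy_rtx **found = NULL;

  assert (pat != NULL && *pat != NULL);
  if ((*pat)->code == TOY_SET)
    return pat;
  if ((*pat)->code != TOY_PARALLEL)
    return NULL;

  for (size_t i = 0; i < (*pat)->veclen; i++)
    {
      struct toy_rtx **elt = &(*pat)->vec[i];

      if ((*elt)->code != TOY_SET)
        continue;
      if (found)
        return NULL;  /* PARALLEL with multiple SETs.  */
      found = elt;
    }
  return found;
}
```

Callers then test the result for NULL before looking at the destination, which is what both users of the helper need.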


With that I was able to bootstrap and regression test with 
--enable-checking=rtl, as well as do a normal bootstrap and regression test, 
on x86_64-unknown-linux-gnu.


OK for the trunk?


* ree.c (get_sub_rtx): New function, extracted from...
(merge_def_and_ext): Here.
(combine_reaching_defs): Use get_sub_rtx.


diff --git a/gcc/ree.c b/gcc/ree.c
index ec09c7a..b41e891 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -580,27 +580,17 @@ make_defs_and_copies_lists (rtx extend_insn, const_rtx 
set_pat,
   return ret;
 }
 
-/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
-   on the SET pattern.  */
-
-static bool
-merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
+static rtx *
+get_sub_rtx (rtx def_insn)
 {
-  enum machine_mode ext_src_mode;
-  enum rtx_code code;
-  rtx *sub_rtx;
-  rtx s_expr;
-  int i;
-
-  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
-  code = GET_CODE (PATTERN (def_insn));
-  sub_rtx = NULL;
+  enum rtx_code code = GET_CODE (PATTERN (def_insn));
+  rtx *sub_rtx = NULL;
 
   if (code == PARALLEL)
 {
-  for (i = 0; i < XVECLEN (PATTERN (def_insn), 0); i++)
+  for (int i = 0; i < XVECLEN (PATTERN (def_insn), 0); i++)
 {
-  s_expr = XVECEXP (PATTERN (def_insn), 0, i);
+  rtx s_expr = XVECEXP (PATTERN (def_insn), 0, i);
   if (GET_CODE (s_expr) != SET)
 continue;
 
@@ -609,7 +599,7 @@ merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state 
*state)
   else
 {
   /* PARALLEL with multiple SETs.  */
-  return false;
+  return NULL;
 }
 }
 }
@@ -618,10 +608,27 @@ merge_def_and_ext (ext_cand *cand, rtx def_insn, 
ext_state *state)
   else
 {
   /* It is not a PARALLEL or a SET, what could it be ? */
-  return false;
+  return NULL;
 }
 
   gcc_assert (sub_rtx != NULL);
+  return sub_rtx;
+}
+
+/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
+   on the SET pattern.  */
+
+static bool
+merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
+{
+  enum machine_mode ext_src_mode;
+  rtx *sub_rtx;
+
+  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
+  sub_rtx = get_sub_rtx (def_insn);
+
+  if (sub_rtx == NULL)
+return false;
 
   if (REG_P (SET_DEST (*sub_rtx))
       && (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
@@ -707,8 +714,13 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, 
ext_state *state)
   /* If there is an overlap between the destination of DEF_INSN and
  CAND->insn, then this transformation is not safe.  Note we have
 to test in the widened mode.  */
+  rtx *dest_sub_rtx = get_sub_rtx (def_insn);
+  if (dest_sub_rtx == NULL
+ || !REG_P (SET_DEST (*dest_sub_rtx)))
+   return false;
+
   rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
-REGNO (SET_DEST (PATTERN (def_insn))));
+REGNO (SET_DEST (*dest_sub_rtx)));
   if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn))))
return false;
 


Re: [RFA][PATCH][middle-end/53623] Improve extension elimination

2014-01-08 Thread Jakub Jelinek
On Wed, Jan 08, 2014 at 04:02:17PM -0700, Jeff Law wrote:
> 	* ree.c (get_sub_rtx): New function, extracted from...
> 	(merge_def_and_ext): Here.
> 	(combine_reaching_defs): Use get_sub_rtx.
>
> --- a/gcc/ree.c
> +++ b/gcc/ree.c
> @@ -580,27 +580,17 @@ make_defs_and_copies_lists (rtx extend_insn, const_rtx
> set_pat,
>    return ret;
>  }
>
> -/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
> -   on the SET pattern.  */
> -
> -static bool
> -merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
> +static rtx *
> +get_sub_rtx (rtx def_insn)

Please add a function comment for it (perhaps saying that it is like
single_set but never allows more than one SET).

Ok with that change.

Jakub


RE: [PATCH] Fix PR58115

2014-01-08 Thread Bernd Edlinger
Hi,

On Tue, 7 Jan 2014 15:10:20, Richard Biener wrote:

 On Tue, Jan 7, 2014 at 1:12 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Bernd Edlinger bernd.edlin...@hotmail.de writes:
 How about this patch for the big comment?


 The comment should say that target_set_current_function()
 cannot call target_reinit() because:

 target_reinit()=>lang_dependent_init_target()
 =>init_optabs()=>init_all_optabs(this_fn_optabs);

 uses this_fn_optabs which is undefined here.

 However many targets (nios2, rx, i386, rs6000) do exactly that.

 Is there currently any target, that sets this_target_optab in the
 target_set_current_function?

 MIPS :-) (via save_target_globals_default_opts=>save_target_globals)

 I think other targets need to do the same thing in order for tests
 like that extended intrinsics_4.c to work. How does this patch look?
 Tested on x86_64-linux-gnu.

 I didn't remove save_target_globals_default_opts because there the
 temporary optimization_current_node also protects a call to init_reg_sets.

 Well, if it works the patch is ok. You're way more familiar with the
 details of this machinery.

 Richard.


I found another test case that still fails with today's trunk:

#include <immintrin.h>

__m256 a[10], b[10], c[10];

void __attribute__((target ("sse2"), optimize (3)))
foo (void)
{
}

void __attribute__((target ("avx"), optimize (3)))
bar (void)
{
  a[0] = _mm256_and_ps (b[0], c[0]);
}

Compile with: i686-pc-linux-gnu-gcc -O2 -msse2 -mno-avx -S

The attached patch seems to fix this test case for
targets that do not have SWITCHABLE_TARGET.

What do you think about it?

I think Jakub's patch will fix this case, but I did not try it.
However, even if i386 is now clean, there are
still many targets that call target_reinit() from
target_set_current_function.


Bernd.

 Thanks,
 Richard


 gcc/
 PR target/58115
 * target-globals.c (save_target_globals): Remove this_fn_optab
 handling.
 * toplev.c: Include optabs.h.
 (target_reinit): Temporarily restore the global options if another
 set of options are in force.

 gcc/testsuite/
 * gcc.target/i386/intrinsics_4.c (bar): New function.

 Index: gcc/target-globals.c
 ===
 --- gcc/target-globals.c 2014-01-02 22:16:03.042278971 +
 +++ gcc/target-globals.c 2014-01-07 12:08:33.569900970 +
 @@ -68,7 +68,6 @@ struct target_globals *
 save_target_globals (void)
 {
 struct target_globals *g;
 - struct target_optabs *saved_this_fn_optabs = this_fn_optabs;

 g = ggc_alloc_target_globals ();
 g->flag_state = XCNEW (struct target_flag_state);
 @@ -88,10 +87,8 @@ save_target_globals (void)
 g->bb_reorder = XCNEW (struct target_bb_reorder);
 g->lower_subreg = XCNEW (struct target_lower_subreg);
 restore_target_globals (g);
 - this_fn_optabs = this_target_optabs;
 init_reg_sets ();
 target_reinit ();
 - this_fn_optabs = saved_this_fn_optabs;
 return g;
 }

 Index: gcc/toplev.c
 ===
 --- gcc/toplev.c 2014-01-07 08:11:43.888058805 +
 +++ gcc/toplev.c 2014-01-07 12:10:19.448096479 +
 @@ -78,6 +78,7 @@ Software Foundation; either version 3, o
 #include "diagnostic-color.h"
 #include "context.h"
 #include "pass_manager.h"
 +#include "optabs.h"

 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
 @@ -1752,6 +1753,23 @@ target_reinit (void)
 {
 struct rtl_data saved_x_rtl;
 rtx *saved_regno_reg_rtx;
 + tree saved_optimization_current_node;
 + struct target_optabs *saved_this_fn_optabs;
 +
 + /* Temporarily switch to the default optimization node, so that
 + *this_target_optabs is set to the default, not reflecting
 + whatever a previous function used for the optimize
 + attribute. */
 + saved_optimization_current_node = optimization_current_node;
 + saved_this_fn_optabs = this_fn_optabs;
 + if (saved_optimization_current_node != optimization_default_node)
 + {
 + optimization_current_node = optimization_default_node;
 + cl_optimization_restore
 + (&global_options,
 + TREE_OPTIMIZATION (optimization_default_node));
 + }
 + this_fn_optabs = this_target_optabs;

 /* Save *crtl and regno_reg_rtx around the reinitialization
 to allow target_reinit being called even after prepare_function_start. */
 @@ -1769,7 +1787,16 @@ target_reinit (void)
 /* Reinitialize lang-dependent parts. */
 lang_dependent_init_target ();

 - /* And restore it at the end, as free_after_compilation from
 + /* Restore the original optimization node. */
 + if (saved_optimization_current_node != optimization_default_node)
 + {
 + optimization_current_node = saved_optimization_current_node;
 + cl_optimization_restore (&global_options,
 + TREE_OPTIMIZATION (optimization_current_node));
 + }
 + this_fn_optabs = saved_this_fn_optabs;
 +
 + /* Restore regno_reg_rtx at the end, as free_after_compilation from
 expand_dummy_function_end clears it. */
 if (saved_regno_reg_rtx)
 {
 Index: gcc/testsuite/gcc.target/i386/intrinsics_4.c
 

Re: [GOOGLE] Remove mod_id_to_name map

2014-01-08 Thread Xinliang David Li
Ok.

David

On Wed, Jan 8, 2014 at 10:58 AM, Dehao Chen de...@google.com wrote:
 This patch removes mod_id_to_name map because the info is already
 there in module_infos. Also, AutoFDO doesn't have access to update
 this map because it's a file-static structure.

 Bootstrapped and passed regression test.

 OK for google branch?

 Thanks,
 Dehao

 Index: gcc/coverage.c
 ===
 --- gcc/coverage.c (revision 206366)
 +++ gcc/coverage.c (working copy)
 @@ -615,37 +615,17 @@ reorder_module_groups (const char *imports_file, u
module_name_tab.dispose ();
  }

 -typedef struct {
 -  unsigned int mod_id;
 -  const char *mod_name;
 -} mod_id_to_name_t;
 -
 -static vec<mod_id_to_name_t> *mod_names;
 -
 -static void
 -record_module_name (unsigned int mod_id, const char *name)
 -{
 -  mod_id_to_name_t t;
 -
 -  t.mod_id = mod_id;
 -  t.mod_name = xstrdup (name);
 -  if (!mod_names)
 -vec_alloc (mod_names, 10);
 -  mod_names->safe_push (t);
 -}
 -
  /* Return the module name for module with MOD_ID.  */

  const char *
  get_module_name (unsigned int mod_id)
  {
size_t i;
 -  mod_id_to_name_t *elt;

 -  for (i = 0; mod_names->iterate (i, &elt); i++)
 +  for (i = 0; i < num_in_fnames; i++)
  {
 -  if (elt->mod_id == mod_id)
 -return elt->mod_name;
 +  if (module_infos[i]->ident == mod_id)
 +return lbasename (module_infos[i]->source_filename);
  }

gcc_assert (0);
 @@ -927,9 +907,6 @@ read_counts_file (const char *da_file_name, unsign
   }
  }

 -  record_module_name (mod_info->ident,
 -  lbasename (mod_info->source_filename));
 -
if (dump_enabled_p ())
  {
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,


Re: [PATCH] Fix for PR 59524

2014-01-08 Thread Jeff Law

On 01/08/14 14:16, Iyer, Balaji V wrote:

Hello Everyone,
Attached, please find a patch that will fix the bug mentioned in PR 59524. 
The main issue was that the Cilk keyword tests run even when the user 
configured the compiler with --disable-libcilkrts. This patch should fix this 
issue for C and C++. It has been tested on x86 and x86_64.

Here are the ChangeLog entries

gcc/testsuite/ChangeLog
+2014-01-08  Balaji V. Iyer  balaji.v.i...@intel.com
+
+   PR testsuite/59524
+   * gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests
+   are run only if the Cilk library is available/enabled.
+   * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
+   * lib/target-supports.exp (check_libcilkrts_available): New function.
+

Is this Ok for trunk?

Yes.

jeff


