[PATCH] [doc] Update plugin doc
Update plugin document after switching to C++, also make it more friendly to cross-build.

ChangeLog:

2014-01-08  Joey Ye  joey...@arm.com

	doc/plugin.texi (Building GCC plugins): Update to C++.

OK to trunk?

diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
index fc2d754..e668de6 100644
--- a/gcc/doc/plugins.texi
+++ b/gcc/doc/plugins.texi
@@ -465,18 +465,18 @@ integer numbers, so a plugin could ensure it is built for GCC 4.7 with
 The following GNU Makefile excerpt shows how to build a simple plugin:
 
 @smallexample
-GCC=gcc
-PLUGIN_SOURCE_FILES= plugin1.c plugin2.c
-PLUGIN_OBJECT_FILES= $(patsubst %.c,%.o,$(PLUGIN_SOURCE_FILES))
-GCCPLUGINS_DIR:= $(shell $(GCC) -print-file-name=plugin)
-CFLAGS+= -I$(GCCPLUGINS_DIR)/include -fPIC -O2
-
-plugin.so: $(PLUGIN_OBJECT_FILES)
-	$(GCC) -shared $^ -o $@@
+HOST_GCC=g++
+TARGET_GCC=gcc
+PLUGIN_SOURCE_FILES= plugin1.c plugin2.cc
+GCCPLUGINS_DIR:= $(shell $(TARGET_GCC) -print-file-name=plugin)
+CXXFLAGS+= -I$(GCCPLUGINS_DIR)/include -fPIC -fno-rtti -O2
+
+plugin.so: $(PLUGIN_SOURCE_FILES)
+	$(HOST_GCC) -shared $(CXXFLAGS) $^ -o $@@
 @end smallexample
 
-A single source file plugin may be built with @code{gcc -I`gcc
--print-file-name=plugin`/include -fPIC -shared -O2 plugin.c -o
+A single source file plugin may be built with @code{g++ -I`gcc
+-print-file-name=plugin`/include -fPIC -shared -fno-rtti -O2 plugin.c -o
 plugin.so}, using backquote shell syntax to query the @file{plugin}
 directory.
Re: [RFA][PATCH][middle-end/53623] Improve extension elimination
Committed after private email approval from Jakub. I made one additional trivial change (missing whitespace in a comment).

This breaks bootstrap with RTL checking enabled:

/home/eric/svn/gcc/libgcc/config/libbid/bid64_noncomp.c:119:1: internal compiler error: RTL check: expected code 'set' or 'clobber', have 'parallel' in combine_reaching_defs, at ree.c:711
 }
 ^
0x9c5fcf rtl_check_failed_code2(rtx_def const*, rtx_code, rtx_code, char const*, int, char const*)
	/home/eric/svn/gcc/gcc/rtl.c:783
0x14626da combine_reaching_defs
	/home/eric/svn/gcc/gcc/ree.c:711
0x1464ae9 find_and_remove_re
	/home/eric/svn/gcc/gcc/ree.c:957
0x1464ae9 rest_of_handle_ree
	/home/eric/svn/gcc/gcc/ree.c:1019
0x1464ae9 execute
	/home/eric/svn/gcc/gcc/ree.c:1058
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
make[3]: *** [bid64_noncomp.o] Error 1
make[3]: *** Waiting for unfinished jobs

-- 
Eric Botcazou
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
Hi Bernd,

The patch is OK to me. But do we need reorder_loops for the c6x backend? I mean we can set the do_reorder parameter to FALSE to save compile time, since the c6x backend only chooses hw-doloops whose body contains only one basic block.

Cheers,
Felix

On 01/05/2014 05:10 PM, Teresa Johnson wrote:
On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt ber...@codesourcery.com wrote:

I have a different patch which I'll submit next week after some more testing. The assert in cfgrtl is unnecessarily broad and really only needs to trigger if -freorder-blocks-and-partition; there's nothing wrong with entering cfglayout after normal bb-reorder.

Currently -freorder-blocks-and-partition is the default for x86. I assume that hw-doloop is not enabled for any i386 targets, which is why we haven't seen this?

Precisely.

And will this mean that -freorder-blocks-and-partition cannot be used for the targets that use hw-doloop? If so, should -freorder-blocks-and-partition be prevented with a warning for those targets?

If someone explicitly chooses that option we can turn off the reordering in hw-doloop. That should happen sufficiently rarely that it isn't a problem. That's what the patch below does - bootstrapped on x86_64-linux, tested there and with bfin-elf. Ok?

I've also tested that Blackfin still benefits from the hw-doloop reordering code and generates more hardware loops if it's enabled. So we want to be able to run it at -O2.

I looked at hw-doloop briefly and since it seems to be doing some manual bb reordering I guess it can't simply be moved before bbro. It seems like a better long-term solution would be to make bbro hw-doloop-aware as Felix suggested earlier.

Maybe. It could be argued that the code in hw-doloop is relevant only for a small class of targets so it should only be enabled for them. In any case, that's not stage 3 material and two ports are broken...

Bernd
Re: [PATCH] Fix PR59471
On Tue, 7 Jan 2014, Jakub Jelinek wrote:
On Tue, Jan 07, 2014 at 04:12:57PM +0100, Richard Biener wrote:

What about if something post gimplification creates VCE(BFR(VCE())) or similar and tries to force_gimple_operand_gsi or similar, then without making the above invalid in the predicates we'd still not try to gimplify it at all (because it would pass the predicate), and then hit the verification ICE.

I don't think it passes any predicate, certainly not is_gimple_val, so we enter gimplification anyway. Or am I missing something?

It isn't is_gimple_val, sure, I was thinking about whatever predicate we have for say the RHS of a load, is that is_gimple_addressable? There is is_gimple_lvalue too. Apparently we are calling force_gimple_operand_1* with just is_gimple_condexpr if it is not is_gimple_val or is_gimple_reg_rhs, so perhaps we are fine.

I think it's fine. We do

tree
force_gimple_operand_1 (tree expr, gimple_seq *stmts,
                        gimple_predicate gimple_test_f, tree var)
{
...
  /* gimple_test_f might be more strict than is_gimple_val, make
     sure we pass both.  Just checking gimple_test_f doesn't work
     because most gimple predicates do not work recursively.  */
  if (is_gimple_val (expr)
      && (*gimple_test_f) (expr))
    return expr;

which of course is kind of pointless, but the gimplifier predicates are designed to only work post-gimplification, not pre-gimplification (if you test for the predicate at the start of gimplify_expr you'll see lots of failures).

I have now committed the patch.

Richard.
[PATCH] Fix PR49718 : allow no_instrument_function attribute in class member definition/declaration
All,

I was looking at PR49718. I have enclosed a simple fix for this bug report.

2014-01-07  Laurent Alfonsi  laurent.alfo...@st.com

	* c-family/c-common.c (handle_no_instrument_function_attribute):
	Allow no_instrument_function attribute in class member
	definition/declaration.

Looking at the implementation of the function attributes, I no longer see a reason to keep this error message. Let me know if I missed something.

I have also added a testcase in the enclosed patch.

2014-01-07  Laurent Alfonsi  laurent.alfo...@st.com

	PR c++/49718
	* g++.dg/pr49718.C: New

The gcc/g++/libstdc++ testsuites are ok on x86-64. Ok for trunk?

Regards,
Laurent

From 141d2bcfeab5e0635c7f4e362387fd5b1b9494e6 Mon Sep 17 00:00:00 2001
From: Laurent ALFONSI laurent.alfo...@st.com
Date: Tue, 7 Jan 2014 16:26:04 +0100
Subject: [PATCH] Fix PR49718 : allow no_instrument_function attribute in class member definition/declaration

---
 gcc/c-family/c-common.c        |  6 --
 gcc/testsuite/g++.dg/pr49718.C | 41 +
 2 files changed, 41 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr49718.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 8ecb70c..17fcb0d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7929,12 +7929,6 @@ handle_no_instrument_function_attribute (tree *node, tree name,
                 "%qE attribute applies only to functions", name);
       *no_add_attrs = true;
     }
-  else if (DECL_INITIAL (decl))
-    {
-      error_at (DECL_SOURCE_LOCATION (decl),
-                "can%'t set %qE attribute after definition", name);
-      *no_add_attrs = true;
-    }
   else
     DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl) = 1;
 
diff --git a/gcc/testsuite/g++.dg/pr49718.C b/gcc/testsuite/g++.dg/pr49718.C
new file mode 100644
index 000..07cac8c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr49718.C
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -finstrument-functions" } */
+/* { dg-final { scan-assembler-times "__cyg_profile_func_enter" 1} } */
+
+#define NOINSTR __attribute__((no_instrument_function))
+
+struct t
+{
+   public:
+      /* Function code should be instrumented */
+      __attribute__((noinline)) t() {}
+
+      /* Function t::a() should not be instrumented */
+      NOINSTR void a(){
+      }
+      /* Function t::b() should not be instrumented */
+      void NOINSTR b(){
+      }
+      /* Function t::c() should not be instrumented */
+      void c() NOINSTR {
+      }
+      /* Function t::d() should not be instrumented */
+      void d() NOINSTR;
+};
+
+void t::d()
+{
+}
+
+/* Function call_all_functions() should not be instrumented */
+struct t call_all_functions() __attribute__((no_instrument_function));
+struct t call_all_functions()
+{
+   struct t a;     /* Constructor not inlined */
+   a.a();          /* Inlined t::a() should not be instrumented */
+   a.b();          /* Inlined t::b() should not be instrumented */
+   a.c();          /* Inlined t::c() should not be instrumented */
+   a.d();          /* Inlined t::d() should not be instrumented */
+   return a;
+}
+
-- 
1.8.4.1
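For readers who have not used the attribute, here is a small stand-alone sketch of what the testcase exercises. It is not part of the patch. It assumes a GCC-compatible compiler that accepts `__attribute__((no_instrument_function))` on an in-class member definition; the names `Counter` and `g_entries` are made up for illustration. The entry/exit hooks only fire when the file is built with `-finstrument-functions`; without that flag the code still compiles and behaves the same.

```cpp
#include <cassert>

#define NOINSTR __attribute__((no_instrument_function))

// Number of times the entry hook ran. Stays 0 unless the file is
// compiled with -finstrument-functions.
static int g_entries = 0;

extern "C" {
// The hooks themselves must not be instrumented, otherwise they
// would recurse into themselves.
NOINSTR void __cyg_profile_func_enter(void* fn, void* call_site) {
    (void)fn;
    (void)call_site;
    ++g_entries;
}
NOINSTR void __cyg_profile_func_exit(void* fn, void* call_site) {
    (void)fn;
    (void)call_site;
}
}

// Hypothetical class, for illustration only.
struct Counter {
    int value = 0;

    // Ordinary member: instrumented when -finstrument-functions is on.
    void bump() { ++value; }

    // Attribute on an in-class definition, the form PR49718 is about.
    NOINSTR int get() const { return value; }
};
```

Built with something like `g++ -finstrument-functions file.cc`, calls to `Counter::bump` should go through the hooks while `Counter::get` should not.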
Re: [Patch] Regex bracket matcher cache optimization
On 7 January 2014 19:36, Tim Shen wrote:

I didn't notice that it's so time consuming. I think reducing the compile time is possible (by templating several member functions instead of the whole _Compiler class).

Ouch! Yes, that's quite a bit slower, and this code is already very slow to compile.

I haven't looked at the code recently, but another option that sometimes helps is to have a base class that implements the common functionality and then derive four classes from it, but minimise the amount of code in the derived classes.

Thanks for the patches!
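The base-class idea can be sketched roughly as follows. This is only an illustration of the general technique and not the libstdc++ regex code: `MatcherBase`, `Matcher`, and the `Icase`/`Negate` flags are hypothetical names. Everything that does not depend on the template flags lives in a non-template base that is compiled once, and each of the four instantiations adds only a few lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Non-template base: holds all code that is independent of the flags.
class MatcherBase {
public:
    MatcherBase(std::string chars, bool fold) : chars_(std::move(chars)) {
        if (fold)
            for (char& c : chars_) c = fold_case(c);
    }
    std::size_t size() const { return chars_.size(); }

protected:
    bool contains(char c) const {
        return chars_.find(c) != std::string::npos;
    }
    // ASCII-only case folding, enough for a sketch.
    static char fold_case(char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }

private:
    std::string chars_;
};

// Thin template layer: four possible instantiations, each tiny.
template <bool Icase, bool Negate>
class Matcher : public MatcherBase {
public:
    explicit Matcher(std::string chars)
        : MatcherBase(std::move(chars), Icase) {}

    bool operator()(char c) const {
        const bool hit = contains(Icase ? fold_case(c) : c);
        return Negate ? !hit : hit;
    }
};
```

The trade-off is one extra level of inheritance in exchange for instantiating the bulk of the code only once.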
[PATCH] Fix get_mode_bounds for BImode (PR rtl-optimization/59649)
Hi!

The recent change to get_mode_bounds for partial mode, where GET_MODE_PRECISION instead of GET_MODE_SIZE is now used, has broken ia64 bootstrap. The problem is that BImode is special cased in various places, e.g. trunc_int_for_mode, so the two values of the mode are 0 and STORE_FLAG_VALUE (which is sometimes -1, sometimes (ia64 case) 1).

Now, two of the 3 get_mode_bounds callers use the same mode == target_mode, and when called with BImode, true, BImode, ... min_val is -1 and max_val is 0, but given the weirdo trunc_int_for_mode behavior which returns STORE_FLAG_VALUE for a value with the low bit set and 0 otherwise, get_mode_bounds actually returns min_rtx (const_int 1) and max_rtx (const_int 0). This confuses the callers (in this case simplify-rtx.c), which then compare the trueop1 value against the bounds and miscompile the code.

This patch fixes this by special casing BImode, so that we get the bounds in the right order for BImode, ?, BImode, and even for the case where target_mode is wider it ignores sign and returns 0, STORE_FLAG_VALUE or vice versa in the right order.

Eric has kindly tested this on ia64. Ok for trunk?

2014-01-08  Jakub Jelinek  ja...@redhat.com

	PR rtl-optimization/59649
	* stor-layout.c (get_mode_bounds): For BImode return
	0 and STORE_FLAG_VALUE.

--- gcc/stor-layout.c.jj	2014-01-03 11:40:57.0 +0100
+++ gcc/stor-layout.c	2014-01-07 18:59:39.056846684 +0100
@@ -2821,7 +2821,21 @@ get_mode_bounds (enum machine_mode mode,
 
   gcc_assert (size <= HOST_BITS_PER_WIDE_INT);
 
-  if (sign)
+  /* Special case BImode, which has values 0 and STORE_FLAG_VALUE.  */
+  if (mode == BImode)
+    {
+      if (STORE_FLAG_VALUE < 0)
+        {
+          min_val = STORE_FLAG_VALUE;
+          max_val = 0;
+        }
+      else
+        {
+          min_val = 0;
+          max_val = STORE_FLAG_VALUE;
+        }
+    }
+  else if (sign)
     {
       min_val = -((unsigned HOST_WIDE_INT) 1 << (size - 1));
       max_val = ((unsigned HOST_WIDE_INT) 1 << (size - 1)) - 1;

	Jakub
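To make the ordering problem concrete outside of GCC, here is a toy model of the fixed logic. It is a sketch under stated assumptions and not the stor-layout.c code: `ToyMode`, `toy_mode_bounds`, and the `store_flag_value` parameter are invented stand-ins for the machine mode, get_mode_bounds, and STORE_FLAG_VALUE. The point it illustrates is that the two-valued boolean mode needs its own branch so that the returned pair always satisfies min <= max, whichever of 1 or -1 the target uses as "true".

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy stand-in for a few machine modes (not GCC's enum).
enum class ToyMode { Bool, Int8, Int16 };

// Precision in bits of each toy mode.
constexpr int precision(ToyMode m) {
    return m == ToyMode::Bool ? 1 : (m == ToyMode::Int8 ? 8 : 16);
}

// Returns {min, max}. store_flag_value plays the role of
// STORE_FLAG_VALUE and is assumed to be either 1 or -1.
std::pair<std::int64_t, std::int64_t>
toy_mode_bounds(ToyMode mode, bool is_signed, std::int64_t store_flag_value) {
    // Special case: the boolean mode only ever holds 0 or
    // store_flag_value, so order those two values explicitly.
    if (mode == ToyMode::Bool) {
        if (store_flag_value < 0)
            return {store_flag_value, 0};
        return {0, store_flag_value};
    }

    const int bits = precision(mode);
    if (is_signed) {
        const std::int64_t half = std::int64_t{1} << (bits - 1);
        return {-half, half - 1};
    }
    return {0, (std::int64_t{1} << bits) - 1};
}
```

With `store_flag_value` equal to 1 (the ia64 situation described above) the boolean mode yields the pair 0, 1 even when signed bounds are requested, instead of an inverted pair.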
Re: [PATCH] [doc] Update plugin doc
Joey Ye joey...@arm.com wrote: ChangeLog: 2014-01-08 Joey Ye joey...@arm.com doc/plugin.texi (Building GCC plugins): Update to C++. OK to trunk? Okay unless anyone raises concrete issues in the next couple of days (or approves directly, of course). Thanks, Gerald
Re: [PATCH] Fix get_mode_bounds for BImode (PR rtl-optimization/59649)
On Wed, 8 Jan 2014, Jakub Jelinek wrote:

Hi!

The recent change to get_mode_bounds for partial mode, where GET_MODE_PRECISION instead of GET_MODE_SIZE is now used, has broken ia64 bootstrap. The problem is that BImode is special cased in various places, e.g. trunc_int_for_mode, so the two values of the mode are 0 and STORE_FLAG_VALUE (which is sometimes -1, sometimes (ia64 case) 1).

Now, two of the 3 get_mode_bounds callers use the same mode == target_mode, and when called with BImode, true, BImode, ... min_val is -1 and max_val is 0, but given the weirdo trunc_int_for_mode behavior which returns STORE_FLAG_VALUE for a value with the low bit set and 0 otherwise, get_mode_bounds actually returns min_rtx (const_int 1) and max_rtx (const_int 0). This confuses the callers (in this case simplify-rtx.c), which then compare the trueop1 value against the bounds and miscompile the code.

This patch fixes this by special casing BImode, so that we get the bounds in the right order for BImode, ?, BImode, and even for the case where target_mode is wider it ignores sign and returns 0, STORE_FLAG_VALUE or vice versa in the right order.

Eric has kindly tested this on ia64. Ok for trunk?

Ok.

Thanks,
Richard.

2014-01-08  Jakub Jelinek  ja...@redhat.com

	PR rtl-optimization/59649
	* stor-layout.c (get_mode_bounds): For BImode return
	0 and STORE_FLAG_VALUE.

--- gcc/stor-layout.c.jj	2014-01-03 11:40:57.0 +0100
+++ gcc/stor-layout.c	2014-01-07 18:59:39.056846684 +0100
@@ -2821,7 +2821,21 @@ get_mode_bounds (enum machine_mode mode,
 
   gcc_assert (size <= HOST_BITS_PER_WIDE_INT);
 
-  if (sign)
+  /* Special case BImode, which has values 0 and STORE_FLAG_VALUE.  */
+  if (mode == BImode)
+    {
+      if (STORE_FLAG_VALUE < 0)
+        {
+          min_val = STORE_FLAG_VALUE;
+          max_val = 0;
+        }
+      else
+        {
+          min_val = 0;
+          max_val = STORE_FLAG_VALUE;
+        }
+    }
+  else if (sign)
     {
       min_val = -((unsigned HOST_WIDE_INT) 1 << (size - 1));
       max_val = ((unsigned HOST_WIDE_INT) 1 << (size - 1)) - 1;

	Jakub

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: std::vector move assign patch
On 27 December 2013 18:27, François Dumont wrote:

Hi

Here is a patch to fix an issue in normal mode during the move assignment. The destination vector allocator instance is moved too during the assignment, which is wrong.

Thanks for your patience, the normal-mode fix is definitely correct, and I've finished reviewing the other parts and they look good too.

As I discovered this problem while working on issues with management of safe iterators during move operations, this patch also fixes those issues in the debug mode for the vector container. Fixes for other containers in debug mode will come later.

OK, great.

In the new test you have:

+ VERIFY( it == v1.begin() ); // Error, it singular

Please change this to "Error, it is singular".

2013-12-27  François Dumont  fdum...@gcc.gnu.org

	* include/bits/stl_vector.h (std::vector::_M_move_assign): Pass
	*this allocator instance when building temporary vector instance
	so that *this allocator do not get moved.

Please change this to "does not get moved".

	* include/debug/safe_base.h
	(_Safe_sequence_base(_Safe_sequence_base&&)): New.
	* include/debug/vector (__gnu_debug::vector(vector&&)): Use latter.

I don't think "latter" is clear here, please say something like "Use new move constructor for base class" or "... for _Safe_sequence_base".

This is OK for trunk, thanks very much.

We might also want to fix just the normal-mode part on the 4.8 branch, I'll think about that.
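The normal-mode problem can be modelled with a toy container. The sketch below is loosely based on the description in the mail and the ChangeLog entry, not on the actual stl_vector.h code: `ToyAlloc`, `ToyVec`, and the two `move_assign_*` members are hypothetical. The "buggy" variant builds its temporary by moving from `*this`, which also moves the destination's allocator; the "fixed" variant builds the temporary from a copy of the allocator and then swaps the data.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy allocator handle that records whether it has been moved from.
struct ToyAlloc {
    int id;
    bool moved_from = false;
    explicit ToyAlloc(int i) : id(i) {}
    ToyAlloc(const ToyAlloc& o) : id(o.id) {}
    ToyAlloc(ToyAlloc&& o) noexcept : id(o.id) { o.moved_from = true; }
    ToyAlloc& operator=(const ToyAlloc&) = default;
};

// Toy container: `data` stands in for the raw storage pointers.
struct ToyVec {
    ToyAlloc alloc;
    std::vector<int> data;

    explicit ToyVec(ToyAlloc a, std::vector<int> d = {})
        : alloc(std::move(a)), data(std::move(d)) {}

    // Temporary takes over *this wholesale, allocator included.
    void move_assign_buggy(ToyVec&& other) {
        ToyVec tmp(std::move(*this));   // this->alloc is now moved-from
        data.swap(other.data);          // take the source's elements
    }                                   // tmp releases the old elements

    // Temporary is built from a copy of the allocator only.
    void move_assign_fixed(ToyVec&& other) {
        ToyVec tmp(alloc);              // copies alloc, leaves it intact
        data.swap(tmp.data);            // old elements go to tmp
        data.swap(other.data);          // take the source's elements
    }
};
```

In both variants the destination ends up with the source's elements; only the fixed one leaves the destination's allocator untouched.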
Re: [Patch] Regex bracket matcher cache optimization
Hi,

On 01/08/2014 10:24 AM, Jonathan Wakely wrote:
On 7 January 2014 19:36, Tim Shen wrote:

I didn't notice that it's so time consuming. I think reducing the compile time is possible (by templating several member functions instead of the whole _Compiler class).

Ouch! Yes, that's quite a bit slower, and this code is already very slow to compile. I haven't looked at the code recently, but another option that sometimes helps is to have a base class that implements the common functionality and then derive four classes from it, but minimise the amount of code in the derived classes.

I only want to add that, besides keeping compile-time under control for 4.9.0 - please investigate a bit more along the mentioned lines - we should also start experimenting with exporting the instantiations. I don't know what the other implementations are doing, but in general it definitely makes sense, for compile-time performance too. I think we already said that some time ago, but the issue seems more important now. Maybe it's really unavoidable if we need template complexity for first-class run-time performance.

Paolo.
[patch] [plugin] Fix PR 59335 plugin build
Fix trunk plugin build by adding missing headers and removing headers that no longer exist.

Test passed:
- arm-none-eabi build --enable-plugins
- build test plugin
- x86_64 bootstrap --enable-plugins

OK to trunk?

ChangeLog.gcc

2013-11-19  Joey Ye  joey...@arm.com

	PR plugin/59335
	* Makefile.in (tree-cfg.h, tree-into-ssa.h, fold-const.h,
	gimple-ssa.h, gimple-iterator.h, varasm.h, context.h):
	Add missing headers for plugin.
	(tree-flow.h, tree-flow-inline.h): Remove as they no longer exist.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 459b1ba..55f1ace 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -882,13 +882,14 @@ TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
 	$(VEC_H) treestruct.def $(HASHTAB_H) \
 	double-int.h alias.h $(SYMTAB_H) $(FLAGS_H) \
 	$(REAL_H) $(FIXED_VALUE_H)
-TREE_H = tree.h $(TREE_CORE_H) tree-check.h
+TREE_H = tree.h $(TREE_CORE_H) tree-check.h tree-cfg.h tree-into-ssa.h
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \
 	cfg-flags.def cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
 	$(GGC_H) $(BASIC_BLOCK_H) $(TREE_H) tree-ssa-operands.h \
-	tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H) is-a.h
+	tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H) is-a.h \
+	gimple-ssa.h gimple-iterator.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 RECOG_H = recog.h
 EMIT_RTL_H = emit-rtl.h
@@ -929,7 +930,7 @@ CPP_ID_DATA_H = $(CPPLIB_H) $(srcdir)/../libcpp/include/cpp-id-data.h
 CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h $(CPP_ID_DATA_H)
 TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
 TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
-TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \
+TREE_FLOW_H = tree-ssa-operands.h \
 		$(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
 		$(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \
 		tree-ssa-alias.h
@@ -3119,7 +3120,7 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   cppdefault.h flags.h $(MD5_H) params.def params.h prefix.h tree-inline.h \
   $(GIMPLE_PRETTY_PRINT_H) realmpfr.h \
   $(IPA_PROP_H) $(TARGET_H) $(RTL_H) $(TM_P_H) $(CFGLOOP_H) $(EMIT_RTL_H) \
-  version.h stringpool.h
+  version.h stringpool.h varasm.h fold-const.h $(CONTEXT_H)
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile
Re: reload autoinc fix
On 07/01/14 21:06, Andrew Pinski wrote: On Tue, Jan 7, 2014 at 12:55 PM, Jeff Law l...@redhat.com wrote: On 01/07/14 09:16, Bernd Schmidt wrote: This is PR56791. The address inside of an autoinc is reloaded, and the autoinc is reloaded, but the reload insns are emitted in the wrong order. As far as I can tell, this is because find_reloads_address_1 has two methods of pushing a reload for an autoinc, one of them using the previously identified type, and the other (better one) using RELOAD_OTHER. If we previously reloaded an inner part of the address, the use of RELOAD_OTHER is mismatched and leads to the wrong order of insns. This patch just remembers if we've pushed a reload, and forces the optimization to be skipped in that case. Bootstrapped and tested on x86_64-linux (with lra_p disabled but still somewhat pointlessly); John Anglin said in the PR that it tests ok on PA. Will commit in a few days if no objections. No objections to the substance of the patch, though I think the comment could be clearer. Though my question is what target does this matter since ARM has moved away from reload and other targets should do the same? There's still the chance we will have to move back for this release when building Thumb1. Only if we can iron out enough of the bugs/size regressions will we stick with LRA for that permutation. R.
Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP
On Wed, 8 Jan 2014, Kugan wrote:
On 07/01/14 23:23, Richard Biener wrote:
On Tue, 7 Jan 2014, Kugan wrote:

[snip]

Note that VIEW_CONVERT_EXPR is wrong here. I think you are handling this wrong still. From a quick look you want to avoid the actual promotion for reg_1 = when reg_1 is promoted and thus the target is (subreg:XX N). The RHS has been expanded in XXmode. Dependent on the value-range of reg_1 you want to set N to a paradoxical subreg of the expanded result. You can always do that if the reg is zero-extended and else if the MSB is not set for any of the values of reg_1.

Thanks Richard for the explanation. I just want to double confirm I understand you correctly before I attempt to fix it. So let me try this for the following example, for a gimple stmt of the following form:

unsigned short _5;
short int _6;
_6 = (short int)_5;

;; _6 = (short int) _5;
target = (subreg/s/u:HI (reg:SI 110 [ D.4144 ]) 0)
temp = (subreg:HI (reg:SI 118) 0)

So, I must generate the following if it satisfies the other conditions.

(set (reg:SI 110 [ D.4144 ]) (subreg:SI temp ))

Is my understanding correct?

I'm no RTL expert in this particular area but yes, I think so. Not sure what paradoxical subregs are valid, so somebody else should comment here. You could even generate (set (reg:SI 110) (reg:SI 118)) iff temp is a SUBREG of a promoted var, as you require that for the destination as well.

I don't see how is_assigned_exp_fit_type reflects this in any way.

What I tried doing with the patch is:

(insn 13 12 0 (set (reg:SI 110 [ D.4144 ]) (zero_extend:SI (subreg:HI (reg:SI 118) 0))) c5.c:8 -1 (nil))

If the values in register (reg:SI 118) fit HI mode (without overflowing), I assume that it is not necessary to drop the higher bits and zero_extend as done above, and that we can generate the following instead.

(insn 13 12 0 (set (reg:SI 110 [ D.4144 ]) (((reg:SI 118) 0))) c5.c:8 -1 (nil))

is_assigned_exp_fit_type just checks if the range fits (in the above case, the value in reg:SI 118 fits HI mode) and the checks before emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp)); check that the modes match. Is this wrong or am I missing the whole point?

is_assigned_exp_fit_type is weird - it looks at the value-range of _5, but as you want to elide the extension from _6 to SImode you want to look at the value-range of _6.

So, breaking it down and applying the promotion to GIMPLE it would look like

unsigned short _5;
short int _6;
_6 = (short int)_5;
_6_7 = (int) _6;

where you want to remove the last line representing the assignment to (subreg:HI (reg:SI 110)). Whether you can do that depends on the value-range of _6, not on the value-range of _5. It's also completely independent of the operation performed on the RHS.

Well. As far as I understand at least.

Richard.
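The range question being discussed can be stated as a small predicate. The following is a toy model and an assumption-laden sketch, not the patch's is_assigned_exp_fit_type: `ValueRange` and `extension_is_redundant` are hypothetical names, and the range is taken to describe the value being extended (the `_6` in the GIMPLE above), read as a signed integer. Re-extending from `bits` is a no-op exactly when every value in the range already survives the extension unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Closed interval of values a variable may take (toy model).
struct ValueRange {
    std::int64_t min;
    std::int64_t max;
};

// True if extending from `bits` bits (1 <= bits <= 62) cannot change
// any value in `r`, so the extension could be elided.
bool extension_is_redundant(ValueRange r, int bits, bool zero_extend) {
    if (zero_extend) {
        // Zero extension keeps v only when 0 <= v <= 2^bits - 1.
        const std::int64_t hi = (std::int64_t{1} << bits) - 1;
        return r.min >= 0 && r.max <= hi;
    }
    // Sign extension keeps v only when -2^(bits-1) <= v <= 2^(bits-1) - 1.
    const std::int64_t half = std::int64_t{1} << (bits - 1);
    return r.min >= -half && r.max <= half - 1;
}
```

For example, a range of 0 to 40000 makes a 16-bit zero extension redundant but not a 16-bit sign extension, because values above 32767 have the 16-bit MSB set.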
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Tue, 7 Jan 2014, Jakub Jelinek wrote:

Hi!

On Fri, Jan 03, 2014 at 11:33:50AM +0100, Jakub Jelinek wrote:
On Fri, Jan 03, 2014 at 11:24:53AM +0100, Richard Biener wrote:

Anyway, back to the original patch, so do you prefer something like this instead? I.e. handle only __builtin_unreachable and __cxa_pure_virtual specially, and not devirt for fold_stmt_inplace?

2014-01-07  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/59622
	* gimple-fold.c (gimple_fold_call): Fix a typo in message.  Handle
	__cxa_pure_virtual similarly to __builtin_unreachable.  Don't
	devirtualize for inplace at all.

	* g++.dg/opt/pr59622-2.C: New test.

--- gcc/gimple-fold.c.jj	2014-01-03 11:40:57.247320424 +0100
+++ gcc/gimple-fold.c	2014-01-07 18:15:00.352601812 +0100
@@ -1167,7 +1167,7 @@ gimple_fold_call (gimple_stmt_iterator *
                                                  (OBJ_TYPE_REF_EXPR (callee)))))
             {
               fprintf (dump_file,
-                       "Type inheritnace inconsistent devirtualization of ");
+                       "Type inheritance inconsistent devirtualization of ");
               print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
               fprintf (dump_file, " to ");
               print_generic_expr (dump_file, callee, TDF_SLIM);
@@ -1177,26 +1177,35 @@ gimple_fold_call (gimple_stmt_iterator *
           gimple_call_set_fn (stmt, OBJ_TYPE_REF_EXPR (callee));
           changed = true;
         }
-      else if (flag_devirtualize && virtual_method_call_p (callee))
+      else if (flag_devirtualize && !inplace && virtual_method_call_p (callee))
         {
           bool final;
           vec <cgraph_node *>targets
             = possible_polymorphic_call_targets (callee, &final);
           if (final && targets.length () <= 1)
             {
+              tree fndecl;
               if (targets.length () == 1)
+                fndecl = targets[0]->decl;
+              else
+                fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
+
+              /* If fndecl (like __builtin_unreachable or
+                 __cxa_pure_virtual) takes no arguments, doesn't have
+                 return value and is noreturn, just add the call before
+                 stmt and DCE will do it's job later on.  */
+              if (TREE_THIS_VOLATILE (fndecl)
+                  && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
+                  && TYPE_ARG_TYPES (TREE_TYPE (fndecl)) == void_list_node)
                 {
-                  gimple_call_set_fndecl (stmt, targets[0]->decl);
-                  changed = true;
-                }
-              else if (!inplace)
-                {
-                  tree fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
                   gimple new_stmt = gimple_build_call (fndecl, 0);
                   gimple_set_location (new_stmt, gimple_location (stmt));
                   gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
                   return true;
                 }
+
+              gimple_call_set_fndecl (stmt, fndecl);

I prefer to always do this, not do the fancy insertion-before. That would do repeated folding for

fold_stmt (gsi); fold_stmt (gsi); fold_stmt (gsi);

where the last two should be a no-op.

Richard.

+              changed = true;
             }
         }
     }
--- gcc/testsuite/g++.dg/opt/pr59622-2.C.jj	2014-01-07 18:10:45.435904909 +0100
+++ gcc/testsuite/g++.dg/opt/pr59622-2.C	2014-01-07 18:10:45.435904909 +0100
@@ -0,0 +1,21 @@
+// PR tree-optimization/59622
+// { dg-do compile }
+// { dg-options "-O2" }
+
+namespace
+{
+  struct A
+  {
+    A () {}
+    virtual A *bar (int) = 0;
+    A *baz (int x) { return bar (x); }
+  };
+}
+
+A *a;
+
+void
+foo ()
+{
+  a->baz (0);
+}

	Jakub

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [Patch AArch64] Implement Vector Permute Support
On Wed, Jan 08, 2014 at 12:10:13AM +, Andrew Pinski wrote: On Tue, Jan 7, 2014 at 4:05 PM, Marcus Shawcroft marcus.shawcr...@arm.com wrote: Andrew, We know that there are numerous issues with aarch64 BE advsimd support in GCC. The aarch64_be support is very much a work in progress. Tejas sorted out a number of fundamentals with a series of patches in November, notably in PCS conformance. There is more to come. However, aarch64_be-* support in gcc 4.9 is not going to match the level of quality for the aarch64-* port. Yes but should not introduce an ICE while GCC is in stage3. This was working before due not having a vec_perm before. I am going to request this to be reverted soon if it is not fixed (the GCC rules are clear here). Hi Andrew, I am confused, are you also proposing to revert this patch on 4.8 branch? The code has been sitting with that assert in place on trunk for well over a year (note that December 2012 was during 4.8's stage 3, not 4.9) there is no regression here. But, that doesn't absolve me of the fact that this is broken in a stupid way for big-endian AArch64. The band-aid, which I can prepare, would be to turn off vec_perm for BYTES_BIG_ENDIAN targets on the 4.9 and 4.8 branches. This is the most sensible thing to do in the short term. Naturally, you will lose vectorization of permute operations, but at least you won't get the ICE or wrong code generation. This is what the ARM back-end (from which I ported the vec_perm code) does. In the longer term you would want to audit the lane-numbering discrepancies between GCC and our architectural lane-numbers. We are some way towards that after Tejas' PCS conformance fix, but as Marcus has said, there is more to come. I should imagine that in this case you will need to provide a run-time transformation between the permute mask and an appropriate mask for tbl. To reiterate, this does not need reverted, we'll get a fix out disabling vec_perm for BYTES_BIG_ENDIAN on 4.8 branch and 4.9. Thanks, James
Re: [Patch] Regex bracket matcher cache optimization
On 01/07/2014 08:36 PM, Tim Shen wrote: On Tue, Jan 7, 2014 at 4:02 AM, Paolo Carlini paolo.carl...@oracle.com wrote: Ideally, I would suggest committing first the improvements in your previous patch (by the way, thanks for the numbers!) + the pure bug fixes and separate the further performance improvements which have compile-time performance implications (how big?), see if, eg, Jon has something to recommend. Can we do that? First patch committed. I later found that the second patch b.diff is based on the committed version (the attach, which fixed the problem); Not sure I'm following all the past and present tenses ;) but in my old message I proposed to commit now the *correctness* fixes too, which, I suppose, are fixes which don't have compile-time performance implications. Paolo.
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote:

I prefer to always do this, not do the fancy insertion-before. That would do repeated folding for fold_stmt (gsi); fold_stmt (gsi); fold_stmt (gsi); where the last two should be a no-op.

I don't see how that is possible, at least for the __builtin_unreachable case, because by just setting the fndecl to __builtin_unreachable and keeping the incompatible fntype and bogus arguments for it, all the predicates that check whether it is a valid/suitable builtin call will fail, and we don't have a __builtin_unreachable function you could call. So at least for the builtin we want to make sure it has the right parameters. If the lhs is something we can just initialize to zero, we can replace the call with zeroing the lhs, but that is not always the case.

For __cxa_pure_virtual we could just keep the code as is (just with the !inplace addition and spelling fix?), but would need to fix up whatever ICEs during checking on it to honor fntype rather than decl's type.

	Jakub
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Wed, 8 Jan 2014, Jakub Jelinek wrote: On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote: I prefer to always do this, not do the fancy insertion-before. That would do repeated folding for fold_stmt (gsi); fold_stmt (gsi); fold_stmt (gsi); where the last two should be a no-op. I don't see how is that possible, at least for the __builtin_unreachable case, because by just setting the fndecl to __builtin_unreachable and keeping the incompatible fntype and bogus arguments for it all the predicates whether it is a valid/suitable builtin call will fail and we don't have a __builtin_unreachable function you could call. Well, that just means we need two sets of predicates to check for a builtin call. The __builtin_unreachable code wants to know what the callee is, not if that's a valid call to it. But yeah - this starts to get confusing :/ So at least for builtin we want to make sure it has the right parameters. If the lhs is something we can just initialize to zero, we can replace the call with zeroing the lhs, but that is no the case always. I start to think this is a too complex transform for stmt folding ... For __cxa_pure_virtual we could just keep the code as is (just with the !inplace addition and spelling fix?), but would need to fix up whatever ICEs during checking on it to honor fntype rather than decl's type. Yes. So a patch just keeping the targets.length () == 1 case in folding with just replacing the fndecl of the call is ok. Thanks, Richard.
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Wed, 8 Jan 2014, Richard Biener wrote: On Wed, 8 Jan 2014, Jakub Jelinek wrote: On Wed, Jan 08, 2014 at 11:45:28AM +0100, Richard Biener wrote: I prefer to always do this, not do the fancy insertion-before. That would do repeated folding for fold_stmt (gsi); fold_stmt (gsi); fold_stmt (gsi); where the last two should be a no-op. I don't see how is that possible, at least for the __builtin_unreachable case, because by just setting the fndecl to __builtin_unreachable and keeping the incompatible fntype and bogus arguments for it all the predicates whether it is a valid/suitable builtin call will fail and we don't have a __builtin_unreachable function you could call. Well, that just means we need two sets of predicates to check for a builtin call. The __builtin_unreachable code wants to know what the callee is, not if that's a valid call to it. But yeah - this starts to get confusing :/ So at least for builtin we want to make sure it has the right parameters. If the lhs is something we can just initialize to zero, we can replace the call with zeroing the lhs, but that is no the case always. I start to think this is a too complex transform for stmt folding ... Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def (cfun, create_tmp_var (TREE_TYPE (lhs. For __cxa_pure_virtual we could just keep the code as is (just with the !inplace addition and spelling fix?), but would need to fix up whatever ICEs during checking on it to honor fntype rather than decl's type. Yes. So a patch just keeping the targets.length () == 1 case in folding with just replacing the fndecl of the call is ok. Thanks, Richard. -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Wed, Jan 08, 2014 at 12:15:40PM +0100, Richard Biener wrote: I start to think this is a too complex transform for stmt folding ... Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def (cfun, create_tmp_var (TREE_TYPE (lhs. The lhs might not be is_gimple_reg_type though. What to do in that case? Jakub
Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622, take 3)
On Wed, 8 Jan 2014, Jakub Jelinek wrote: On Wed, Jan 08, 2014 at 12:15:40PM +0100, Richard Biener wrote: I start to think this is a too complex transform for stmt folding ... Alternatively do update_call_from_tree (gsi, get_or_create_ssa_default_def (cfun, create_tmp_var (TREE_TYPE (lhs. The lhs might not be is_gimple_reg_type though. What to do in that case? In that case you can remove the stmt. Richard.
Re: [PING] [REPOST] Invalid Code when reading from unaligned zero-sized array
On Tue, Jan 7, 2014 at 5:31 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hello, Ping... We still need a decision how to fix this. There are two alternative patches: 1. My latest proposal: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01675.html 2. Eric's latest proposal: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01667.html Let's go with 1., your patch adjusting how we recurse in expand. That seems safer to eventually backport. For 4.10 we should re-visit this and fix all the backends for those ABI issues with modes ... Richard. Thanks Bernd.
Re: [PATCH, 4.8, PR 59610] More optimize guards in ipa-prop.c
On Tue, Jan 7, 2014 at 7:27 PM, Martin Jambor mjam...@suse.cz wrote:

Hi,

I forgot to put the optimize test to the ipa_compute_jump_functions when fixing PR 57358 which is where it is most necessary. This patch adds it there and to parm_preserved_before_stmt_p which is also reachable through ipa_load_from_parm_agg_1 that is also called from outside of jump function computations.

I'm currently bootstrapping and testing the following on x86_64-linux. OK for the branch if it passes? And the testcase for trunk?

Ok.

Thanks,
Richard.

Thanks,
Martin

2014-01-07  Martin Jambor  mjam...@suse.cz

    PR ipa/59610
    * ipa-prop.c (ipa_compute_jump_functions): Bail out if not optimizing.
    (parm_preserved_before_stmt_p): Assume modification present when not
    optimizing.

testsuite/
    * gcc.dg/ipa/pr59610.c: New test.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 47d487d..3788a11 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -623,16 +623,22 @@ parm_preserved_before_stmt_p (struct param_analysis_info *parm_ainfo,
   if (parm_ainfo && parm_ainfo->parm_modified)
     return false;
 
-  gcc_checking_assert (gimple_vuse (stmt) != NULL_TREE);
-  ao_ref_init (&refd, parm_load);
-  /* We can cache visited statements only when parm_ainfo is available and when
-     we are looking at a naked load of the whole parameter.  */
-  if (!parm_ainfo || TREE_CODE (parm_load) != PARM_DECL)
-    visited_stmts = NULL;
+  if (optimize)
+    {
+      gcc_checking_assert (gimple_vuse (stmt) != NULL_TREE);
+      ao_ref_init (&refd, parm_load);
+      /* We can cache visited statements only when parm_ainfo is available and
+         when we are looking at a naked load of the whole parameter.  */
+      if (!parm_ainfo || TREE_CODE (parm_load) != PARM_DECL)
+        visited_stmts = NULL;
+      else
+        visited_stmts = &parm_ainfo->parm_visited_statements;
+      walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified, &modified,
+                          visited_stmts);
+    }
   else
-    visited_stmts = &parm_ainfo->parm_visited_statements;
-  walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified, &modified,
-                      visited_stmts);
+    modified = true;
+
   if (parm_ainfo && modified)
     parm_ainfo->parm_modified = true;
   return !modified;
@@ -1466,6 +1472,9 @@ ipa_compute_jump_functions (struct cgraph_node *node,
 {
   struct cgraph_edge *cs;
 
+  if (!optimize)
+    return;
+
   for (cs = node->callees; cs; cs = cs->next_callee)
     {
       struct cgraph_node *callee = cgraph_function_or_thunk_node (cs->callee,
diff --git a/gcc/testsuite/gcc.dg/ipa/pr59610.c b/gcc/testsuite/gcc.dg/ipa/pr59610.c
new file mode 100644
index 0000000..fc09334
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr59610.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct A { int a; };
+extern void *y;
+
+__attribute__((optimize (0))) void
+foo (void *p, struct A x)
+{
+  foo (y, x);
+}
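For readers skimming the diff, the guard follows a familiar pattern: when the analysis that could prove a parameter unmodified is not run (here, when not optimizing), the answer defaults to the conservative one. Below is a minimal standalone sketch of that pattern. The names `param_info`, `expensive_alias_walk` and `param_preserved_p` are hypothetical stand-ins and are not GCC APIs; the real function works on GIMPLE statements and alias information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-parameter analysis state; not a GCC type.  */
struct param_info
{
  bool modified;  /* Cached result: parameter known to be modified.  */
  int walks;      /* Counts how often the costly analysis ran.  */
};

/* Hypothetical stand-in for the alias walk.  Returns true if a
   modification was found; REALLY_MODIFIED simulates the outcome.  */
static bool
expensive_alias_walk (struct param_info *info, bool really_modified)
{
  if (info)
    info->walks++;
  return really_modified;
}

/* Return true only if the parameter is known to be preserved.  When not
   optimizing, skip the analysis and conservatively assume a modification.  */
static bool
param_preserved_p (int optimize, struct param_info *info,
                   bool really_modified)
{
  bool modified;

  /* A cached "modified" answer short-circuits everything.  */
  if (info && info->modified)
    return false;

  if (optimize)
    modified = expensive_alias_walk (info, really_modified);
  else
    modified = true;

  if (info && modified)
    info->modified = true;
  return !modified;
}
```

Calling `param_preserved_p (0, &info, false)` returns false without running the walk, and it also records the pessimistic answer in the cache, so later queries on the same `info` stay conservative.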
Re: [PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115)
On Tue, Jan 7, 2014 at 8:39 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Jan 06, 2014 at 10:27:06AM +, Richard Sandiford wrote: Of course, IMO, the cleanest fix would be to use switchable targets for i386... The following patch does that, bootstrapped/regtested on x86_64-linux and i686-linux. The only problem with the patch is PCH, +FAIL: 17_intro/headers/c++200x/stdc++.cc (test for excess errors) +FAIL: 17_intro/headers/c++200x/stdc++_multiple_inclusion.cc (test for excess errors) (both 32-bit and 64-bit regtests), where it ICEs. I guess the problem is that the target globals are allocated partly in GC, partly in heap and even if they were allocated completely in GC and GTY(()) marked fully all the individual pointed structures, we IMNSHO still don't want it to be saved during PCH and restored later, what we have is basically just a cache of the target globals. Dunno what is the best way to handle that though. Either before writing PCH c-common.c could call some tree.c routine that would traverse the cl_option_hash_table hash table and for every TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS. Or perhaps some gengtype extension to run some routine before PCH saving on the tree_target_option structs and clear the globals field in there. Or use GTY((user)) on tree_target_option, but then dunno how we'd handle the marking of the embedded opts field (and common). Any ideas? Well, a GTY((skip_pch)) would probably work. Or move the thing out-of GC land (thus make cl_option_hash_table persistant) and simply GTY((skip)) the pointer completely. Not sure if we ever collect from it. Richard. 2014-01-07 Jakub Jelinek ja...@redhat.com PR target/58115 * tree-core.h (struct target_globals): New forward declaration. (struct tree_target_option): Add globals field. * tree.h (TREE_TARGET_GLOBALS): Define. * target-globals.h (struct target_globals): Define even if !SWITCHABLE_TARGET. * config/i386/i386.h (SWITCHABLE_TARGET): Define. 
* config/i386/i386.c: Include target-globals.h. (ix86_set_current_function): Instead of doing target_reinit unconditionally, use save_target_globals_default_opts and restore_target_globals. --- gcc/tree-core.h.jj 2014-01-07 08:47:24.0 +0100 +++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100 @@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option struct target_optabs *GTY ((skip)) base_optabs; }; +/* Forward declaration, defined in target-globals.h. */ + +struct GTY(()) target_globals; + /* Target options used by a function. */ struct GTY(()) tree_target_option { struct tree_common common; + /* Target globals for the corresponding target option. */ + struct target_globals *globals; + /* The optimization options used by the user. */ struct cl_target_option opts; }; --- gcc/tree.h.jj 2014-01-03 11:40:33.0 +0100 +++ gcc/tree.h 2014-01-07 12:55:39.137295100 +0100 @@ -2695,6 +2695,9 @@ extern tree build_optimization_node (str #define TREE_TARGET_OPTION(NODE) \ (TARGET_OPTION_NODE_CHECK (NODE)-target_option.opts) +#define TREE_TARGET_GLOBALS(NODE) \ + (TARGET_OPTION_NODE_CHECK (NODE)-target_option.globals) + /* Return a tree node that encapsulates the target options in OPTS. 
*/ extern tree build_target_option_node (struct gcc_options *opts); --- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100 +++ gcc/target-globals.h2014-01-07 17:08:51.113880947 +0100 @@ -37,6 +37,7 @@ extern struct target_builtins *this_targ extern struct target_gcse *this_target_gcse; extern struct target_bb_reorder *this_target_bb_reorder; extern struct target_lower_subreg *this_target_lower_subreg; +#endif struct GTY(()) target_globals { struct target_flag_state *GTY((skip)) flag_state; @@ -57,6 +58,7 @@ struct GTY(()) target_globals { struct target_lower_subreg *GTY((skip)) lower_subreg; }; +#if SWITCHABLE_TARGET extern struct target_globals default_target_globals; extern struct target_globals *save_target_globals (void); --- gcc/config/i386/i386.h.jj 2014-01-06 22:37:19.0 +0100 +++ gcc/config/i386/i386.h 2014-01-07 12:13:06.480486755 +0100 @@ -2510,6 +2510,9 @@ extern void debug_dispatch_window (int); #define IX86_HLE_ACQUIRE (1 16) #define IX86_HLE_RELEASE (1 17) +/* For switching between functions with different target attributes. */ +#define SWITCHABLE_TARGET 1 + /* Local variables: version-control: t --- gcc/config/i386/i386.c.jj 2014-01-06 22:37:19.0 +0100 +++ gcc/config/i386/i386.c 2014-01-07 16:52:32.597904760 +0100 @@ -80,6 +80,7 @@ along with GCC; see the file COPYING3. #include tree-pass.h #include context.h #include pass_manager.h +#include target-globals.h static rtx legitimize_dllimport_symbol (rtx, bool); static rtx
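The ChangeLog above replaces an unconditional reinitialization with a saved and restored set of globals attached to each target-option node. The general shape of that idea, caching an expensive-to-build state object on the key that selects it, might look roughly like the sketch below. Everything in it (`option_node`, `build_state`, `switch_to`, the `isa_level` field) is a hypothetical stand-in chosen for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the expensive per-target state.  */
struct state
{
  int isa_level;
};

/* Hypothetical stand-in for a target-option node carrying a cache slot.  */
struct option_node
{
  int isa_level;         /* The options themselves.  */
  struct state *cached;  /* Lazily built state, or NULL.  */
};

static struct state *current_state;  /* State currently in use.  */
static int builds;                   /* Number of expensive rebuilds.  */

/* Expensive initialization, done at most once per node.  Aborts on
   allocation failure.  */
static struct state *
build_state (const struct option_node *node)
{
  struct state *s = malloc (sizeof *s);
  if (!s)
    abort ();
  s->isa_level = node->isa_level;
  builds++;
  return s;
}

/* Switch to NODE's state, building it on first use and reusing the
   cached copy afterwards.  */
static void
switch_to (struct option_node *node)
{
  if (!node->cached)
    node->cached = build_state (node);
  current_state = node->cached;
}
```

Alternating between two nodes many times then costs two builds in total rather than one per switch, which is consistent with the kind of speedup reported for functions with differing target attributes.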
Re: [Patch,testsuite] Fix testcases that use bind_pic_locally
On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote: On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote: bind_pic_locally is broken for targets that doesn't pass -fPIC/-fpic by default [1][2]. Let's give Jakub 2 days to weigh in? If no objections, Ok, though, do see about adding documentation for it. Sure. I didn't respin the patch with documentation since I wanted to know if the solution is acceptable. If this patch is OK, I'll respin with the documentation for bind_pic_locally_ok. I kinda would like a simpler interface for these two, but? that can be follow on work, if someone has a bright idea and some time to implement it. Could you explain what do you mean by simpler interface here? Cheers VP.
Re: Rb tree node recycling patch
On 27 December 2013 18:30, François Dumont wrote: Hi Here is a patch to add recycling of Rb tree nodes when possible. The change looks good, but it is not a bug fix, so I don't think it's suitable for Stage 3. Please re-submit this after 4.9 is released when we are in Stage 1 again, thanks.
Re: [Patch,testsuite] Fix testcases that use bind_pic_locally
On Wed, Jan 08, 2014 at 11:49:08AM +, Vidya Praveen wrote:
On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote:
On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote:

bind_pic_locally is broken for targets that doesn't pass -fPIC/-fpic by default [1][2].

Let's give Jakub 2 days to weigh in? If no objections, Ok, though, do see about adding documentation for it.

Sure. I didn't respin the patch with documentation since I wanted to know if the solution is acceptable. If this patch is OK, I'll respin with the documentation for bind_pic_locally_ok.

I kinda would like a simpler interface for these two, but... that can be follow on work, if someone has a bright idea and some time to implement it.

Could you explain what do you mean by simpler interface here?

The simpler interface, as I said earlier, would be just to make sure /* { dg-add-options bind_pic_locally } */ does the right thing, I really don't believe you've tried hard enough. It is true dejagnu's default_target_compile has:

    if {[board_info $dest exists multilib_flags]} {
        append add_flags [board_info $dest multilib_flags]
    }

last (before just adding -o $destfile; is multilib_flags where the -fpic/-fPIC comes in, right?), but if say dg-add-options bind_pic_locally adds the necessary options not to dg-extra-tools-flags, but to some other variable and say gcc_target_compile (and g++_target_compile) around the [target_compile ...] invocation e.g. temporarily append that other variable (if not empty) to board_info's multilib_flags and afterwards remove it, I don't see why it wouldn't work. Tcl is quite flexible in this.

Jakub
[PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115, take 2)
On Wed, Jan 08, 2014 at 12:32:59PM +0100, Richard Biener wrote: Either before writing PCH c-common.c could call some tree.c routine that would traverse the cl_option_hash_table hash table and for every TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS. Or perhaps some gengtype extension to run some routine before PCH saving on the tree_target_option structs and clear the globals field in there. Or use GTY((user)) on tree_target_option, but then dunno how we'd handle the marking of the embedded opts field (and common). Any ideas? Well, a GTY((skip_pch)) would probably work. Or move the thing out-of GC land (thus make cl_option_hash_table persistant) and simply GTY((skip)) the pointer completely. Not sure if we ever collect from it. Even if the pointer was out of GCC land and GTY((skip)), we'd need to clear it somewhere during PCH saving, as the containing structure is GC allocated. I've already implemented in the mean time the variant with the htab_traverse, all still reachable TARGET_OPTION_NODE trees should be in that hash table. Bootstrapped/regtested on x86_64-linux and i686-linux (in both cases with --enable-checking=yes,rtl and --enable-checking=release, for the i686-linux/release checking I had to fix an unrelated compare debug issue I'll post when I manage to reduce testcase). I'd like to get rid of all the XCNEW calls in target-globals.c as a follow-up. As for performance, for --enable-checking=release from very rough check on make -j48 bootstrap and make -j48 check times the patch is compile time neutral, on e.g. declare-simd-1.C testcase g++ is twice as fast with the patch though (~ 0.8 sec without the patch, ~ 0.3 sec with the patch, both for x86_64 and i686). Ok for trunk? 2014-01-07 Jakub Jelinek ja...@redhat.com PR target/58115 * tree-core.h (struct target_globals): New forward declaration. (struct tree_target_option): Add globals field. * tree.h (TREE_TARGET_GLOBALS): Define. (prepare_target_option_nodes_for_pch): New prototype. 
* target-globals.h (struct target_globals): Define even if !SWITCHABLE_TARGET. * tree.c (prepare_target_option_node_for_pch, prepare_target_option_nodes_for_pch): New functions. * config/i386/i386.h (SWITCHABLE_TARGET): Define. * config/i386/i386.c: Include target-globals.h. (ix86_set_current_function): Instead of doing target_reinit unconditionally, use save_target_globals_default_opts and restore_target_globals. c-family/ * c-pch.c (c_common_write_pch): Call prepare_target_option_nodes_for_pch. --- gcc/tree-core.h.jj 2014-01-07 08:47:24.0 +0100 +++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100 @@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option struct target_optabs *GTY ((skip)) base_optabs; }; +/* Forward declaration, defined in target-globals.h. */ + +struct GTY(()) target_globals; + /* Target options used by a function. */ struct GTY(()) tree_target_option { struct tree_common common; + /* Target globals for the corresponding target option. */ + struct target_globals *globals; + /* The optimization options used by the user. */ struct cl_target_option opts; }; --- gcc/tree.h.jj 2014-01-03 11:40:33.0 +0100 +++ gcc/tree.h 2014-01-07 21:28:15.038061120 +0100 @@ -2695,9 +2695,14 @@ extern tree build_optimization_node (str #define TREE_TARGET_OPTION(NODE) \ (TARGET_OPTION_NODE_CHECK (NODE)-target_option.opts) +#define TREE_TARGET_GLOBALS(NODE) \ + (TARGET_OPTION_NODE_CHECK (NODE)-target_option.globals) + /* Return a tree node that encapsulates the target options in OPTS. 
*/ extern tree build_target_option_node (struct gcc_options *opts); +extern void prepare_target_option_nodes_for_pch (void); + #if defined ENABLE_TREE_CHECKING (GCC_VERSION = 2007) inline tree --- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100 +++ gcc/target-globals.h2014-01-07 17:08:51.113880947 +0100 @@ -37,6 +37,7 @@ extern struct target_builtins *this_targ extern struct target_gcse *this_target_gcse; extern struct target_bb_reorder *this_target_bb_reorder; extern struct target_lower_subreg *this_target_lower_subreg; +#endif struct GTY(()) target_globals { struct target_flag_state *GTY((skip)) flag_state; @@ -57,6 +58,7 @@ struct GTY(()) target_globals { struct target_lower_subreg *GTY((skip)) lower_subreg; }; +#if SWITCHABLE_TARGET extern struct target_globals default_target_globals; extern struct target_globals *save_target_globals (void); --- gcc/tree.c.jj 2014-01-03 11:40:33.0 +0100 +++ gcc/tree.c 2014-01-07 21:27:35.590268195 +0100 @@ -11527,6 +11527,28 @@ build_target_option_node (struct gcc_opt return t; } +/* Reset TREE_TARGET_GLOBALS cache for TARGET_OPTION_NODE. + Called through htab_traverse. */ + +static int +prepare_target_option_node_for_pch (void **slot,
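The approach described in this message, walking every live option node and resetting its cached globals before the PCH is written, can be illustrated with a small standalone sketch. This is a toy under stated assumptions: a plain array stands in for the hash table, and `node`, `clear_cache`, `for_each_node` and `prepare_nodes_for_save` are hypothetical names, not the functions from the patch (whose body is cut off above). The "return nonzero to keep going" callback convention is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: OPTS is persistent data that should be saved,
   CACHE is derived data that must not be written out.  */
struct node
{
  int opts;
  void *cache;
};

/* Traversal callback: reset one node's cache and count the visit.
   Returns nonzero so that the traversal continues.  */
static int
clear_cache (struct node *n, void *data)
{
  n->cache = NULL;
  ++*(size_t *) data;
  return 1;
}

/* Minimal traversal helper: call FN on each node until it returns 0.  */
static void
for_each_node (struct node *table, size_t count,
               int (*fn) (struct node *, void *), void *data)
{
  for (size_t i = 0; i < count; i++)
    if (!fn (&table[i], data))
      break;
}

/* Drop every cache pointer before the table is serialized.  Returns
   the number of nodes visited.  */
static size_t
prepare_nodes_for_save (struct node *table, size_t count)
{
  size_t visited = 0;
  for_each_node (table, count, clear_cache, &visited);
  return visited;
}
```

Because the cache is only derived data, dropping it loses nothing that cannot be rebuilt on demand after the saved image is loaded again; only `opts` has to survive the round trip.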
Re: Rb tree node recycling patch
Hi, On 12/27/2013 07:30 PM, François Dumont wrote: Note that this patch contains also a cleanup of a useless template parameter _Is_pod_comparator on _Rb_tree_impl. The useless parameter is a remnant of an attempt at exploiting the EBO for _Rb_tree_impl. At some point Benjamin got a patch from a contributor but then had to quickly revert it just in time for the ABI freeze because it didn't work. Everything is recorded in the mailing list. Anyway, whatever we do now (more exactly, post 4.9) let's make sure we don't break the ABI inadvertently, or, if we actually decide to do that, we should reconsider the EBO. About the node recycling idea itself, we got a closely related Bugzilla. Is it *exactly* the same issue, or not? Please double check. Paolo.
Re: Rb tree node recycling patch
On 01/08/2014 02:34 PM, Paolo Carlini wrote: Hi, On 12/27/2013 07:30 PM, François Dumont wrote: Note that this patch contains also a cleanup of a useless template parameter _Is_pod_comparator on _Rb_tree_impl. The useless parameter is a remnant of an attempt at exploiting the EBO for _Rb_tree_impl. At some point Benjamin got a patch from a contributor but then had to quickly revert it just in time for the ABI freeze because it didn't work. Everything is recorded in the mailing list. Anyway, whatever we do now (more exactly, post 4.9) let's make sure we don't break the ABI inadvertently, or, if we actually decide to do that, we should reconsider the EBO. This ChangeLog entry: 2004-03-25 Dhruv Matani dhruvb...@gmx.net * include/bits/stl_tree.h: Introduced a new class _Rb_tree_impl, ... has the original EBO idea, which in fact we didn't deliver. Paolo.
Re: [RFA][PATCH][middle-end/53623] Improve extension elimination
On 01/08/14 01:14, Eric Botcazou wrote: Committed after private email approval from Jakub. I made one additional trivial change (missing whitespace in a comment). This breaks bootstrap with RTL checking enabled: [ ... ] Thanks. I'm on it. jeff
Re: [Patch] libgcov.c re-factoring
On Mon, Jan 6, 2014 at 9:49 AM, Teresa Johnson tejohn...@google.com wrote:
On Sun, Jan 5, 2014 at 12:08 PM, Jan Hubicka hubi...@ucw.cz wrote:

2014-01-03  Rong Xu  x...@google.com

    * gcc/gcov-io.c (gcov_var): Move from gcov-io.h.
    (gcov_position): Ditto.
    (gcov_is_error): Ditto.
    (gcov_rewrite): Ditto.
    * gcc/gcov-io.h: Refactor. Move gcov_var to gcov-io.h, and libgcov
    only part to libgcc/libgcov.h.
    * libgcc/libgcov-driver.c: Use libgcov.h.
    (buffer_fn_data): Use xmalloc instead of malloc.
    (gcov_exit_merge_gcda): Ditto.
    * libgcc/libgcov-driver-system.c (allocate_filename_struct): Ditto.
    * libgcc/libgcov.h: New common header files for libgcov-*.h.
    * libgcc/libgcov-interface.c: Use libgcov.h
    * libgcc/libgcov-merge.c: Ditto.
    * libgcc/libgcov-profiler.c: Ditto.
    * libgcc/Makefile.in: Add dependence to libgcov.h

OK, with the licence changes and...

Index: gcc/gcov-io.c
===
--- gcc/gcov-io.c (revision 206100)
+++ gcc/gcov-io.c (working copy)
@@ -36,6 +36,61 @@ static const gcov_unsigned_t *gcov_read_words (uns
 static void gcov_allocate (unsigned);
 #endif
 
+/* Optimum number of gcov_unsigned_t's read from or written to disk.  */
+#define GCOV_BLOCK_SIZE (1 << 10)
+
+GCOV_LINKAGE struct gcov_var
+{
+  FILE *file;
+  gcov_position_t start;  /* Position of first byte of block */
+  unsigned offset;        /* Read/write position within the block.  */
+  unsigned length;        /* Read limit in the block.  */
+  unsigned overread;      /* Number of words overread.  */
+  int error;              /* < 0 overflow, > 0 disk error.  */
+  int mode;               /* < 0 writing, > 0 reading */
+#if IN_LIBGCOV
+  /* Holds one block plus 4 bytes, thus all coverage reads & writes
+     fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
+     to and from the disk. libgcov never backtracks and only writes 4
+     or 8 byte objects.  */
+  gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1];
+#else
+  int endian;             /* Swap endianness.  */
+  /* Holds a variable length block, as the compiler can write
+     strings and needs to backtrack.  */
+  size_t alloc;
+  gcov_unsigned_t *buffer;
+#endif
+} gcov_var;
+
+/* Save the current position in the gcov file.  */
+static inline gcov_position_t
+gcov_position (void)
+{
+  gcc_assert (gcov_var.mode > 0);
+  return gcov_var.start + gcov_var.offset;
+}
+
+/* Return nonzero if the error flag is set.  */
+static inline int
+gcov_is_error (void)
+{
+  return gcov_var.file ? gcov_var.error : 1;
+}
+
+#if IN_LIBGCOV
+/* Move to beginning of file and initialize for writing.  */
+GCOV_LINKAGE inline void
+gcov_rewrite (void)
+{
+  gcc_assert (gcov_var.mode > 0);

I would turn those two asserts into checking asserts so they do not bloat the runtime lib.

Ok, but note that there are a number of other gcc_assert already in gcov-io.c (these were the only 2 in gcov-io.h, now moved here). Should I go ahead and change all of them in gcov-io.c?

Actually, I tried changing these two, but gcc_checking_assert is undefined in libgcov.a. Ok to commit without this change?

Teresa

Thanks, Teresa

Thanks, Honza

-- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Workaround PR59584 on 4.8 Fix use of stack-pointer-register as a temporary for CRIS
From: Hans-Peter Nilsson h...@axis.com Date: Mon, 23 Dec 2013 23:34:02 +0100 Just as previously done on trunk, I'm going to cover up PR59584 (which was fixed and then exposed on the 4.8 branch) by applying commit r206187 from trunk below. Again, the PR bug is an ICE caused by the combination of expr.c:find_args_size_adjust and expr.c:fixup_args_size_notes not able to handle a define_split matching for the stack-adjustment assignment instruction emitted by __builtin_stack_restore (the insn that gets the REG_ARGS_SIZE note). *This* bug is slightly different but the fix happens to cover up that bug by not matching the splitter for the stack-pointer; the destination is used as a temporary, so sp is set to something unusable as a stack-pointer, ungood. Tested cris-elf, makes gcc.dg/pr50251.c pass again, will commit to the 4.8 branch. PR middle-end/59584 * config/cris/predicates.md (cris_nonsp_register_operand): New define_predicate. * config/cris/cris.md: Replace register_operand with cris_nonsp_register_operand for destinations in all define_splits where a register is set more than once. 
Index: gcc/config/cris/cris.md === --- gcc/config/cris/cris.md (revision 206176) +++ gcc/config/cris/cris.md (working copy) @@ -758,7 +758,7 @@ (define_split (match_operand:SI 1 const_int_operand )) (match_operand:SI 2 register_operand ))]) (match_operand 3 register_operand )) - (set (match_operand:SI 4 register_operand ) + (set (match_operand:SI 4 cris_nonsp_register_operand ) (plus:SI (mult:SI (match_dup 0) (match_dup 1)) (match_dup 2)))])] @@ -859,7 +859,7 @@ (define_split (match_operand:SI 0 cris_bdap_operand ) (match_operand:SI 1 cris_bdap_operand ))]) (match_operand 2 register_operand )) - (set (match_operand:SI 3 register_operand ) + (set (match_operand:SI 3 cris_nonsp_register_operand ) (plus:SI (match_dup 0) (match_dup 1)))])] reload_completed reg_overlap_mentioned_p (operands[3], operands[2]) [(set (match_dup 4) (match_dup 2)) @@ -3960,7 +3960,7 @@ (define_expand casesi ;; up. (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 4 cris_operand_extend_operator [(match_operand 1 register_operand ) @@ -3990,7 +3990,7 @@ (define_split ;; Call this op-extend-split-rx=rz (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 4 cris_plus_or_bound_operator [(match_operand 1 register_operand ) @@ -4018,7 +4018,7 @@ (define_split ;; Call this op-extend-split-swapped (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 4 cris_plus_or_bound_operator [(match_operator @@ -4044,7 +4044,7 @@ (define_split ;; bound. Call this op-extend-split-swapped-rx=rz. (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 4 cris_plus_or_bound_operator [(match_operator @@ -4075,7 +4075,7 @@ (define_split ;; Call this op-extend. 
(define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 3 cris_orthogonal_operator [(match_operand 1 register_operand ) @@ -4099,7 +4099,7 @@ (define_split ;; Call this op-split-rx=rz (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 3 cris_commutative_orth_op [(match_operand 2 memory_operand ) @@ -4123,7 +4123,7 @@ (define_split ;; Call this op-split-swapped. (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 3 cris_commutative_orth_op [(match_operand 1 register_operand ) @@ -4146,7 +4146,7 @@ (define_split ;; Call this op-split-swapped-rx=rz. (define_split - [(set (match_operand 0 register_operand ) + [(set (match_operand 0 cris_nonsp_register_operand ) (match_operator 3 cris_orthogonal_operator [(match_operand 2 memory_operand ) @@ -4555,10 +4555,11 @@ (define_split ;; We're not allowed to generate copies of registers with different mode ;; until after reload; copying pseudos upsets reload. CVS as of ;; 2001-08-24, unwind-dw2-fde.c, _Unwind_Find_FDE ICE in -;; cselib_invalidate_regno. +;; cselib_invalidate_regno. Also, don't do this for the stack-pointer, +;; as we don't want it set temporarily to an invalid value.
Re: [PATCH] Change i?86/x86_64 into SWITCHABLE_TARGET (PR58115, take 2)
On Wed, Jan 8, 2014 at 1:45 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Wed, Jan 08, 2014 at 12:32:59PM +0100, Richard Biener wrote:
>>> Either before writing PCH c-common.c could call some tree.c routine
>>> that would traverse the cl_option_hash_table hash table and for every
>>> TARGET_OPTION_NODE in the hash table clear TREE_TARGET_GLOBALS.
>>> Or perhaps some gengtype extension to run some routine before PCH saving
>>> on the tree_target_option structs and clear the globals field in there.
>>> Or use GTY((user)) on tree_target_option, but then dunno how we'd handle
>>> the marking of the embedded opts field (and common).
>>> Any ideas?
>>
>> Well, a GTY((skip_pch)) would probably work. Or move the thing
>> out of GC land (thus make cl_option_hash_table persistent) and
>> simply GTY((skip)) the pointer completely. Not sure if we ever
>> collect from it.
>
> Even if the pointer was out of GC land and GTY((skip)), we'd need to
> clear it somewhere during PCH saving, as the containing structure is
> GC allocated.
>
> I've already implemented in the meantime the variant with the
> htab_traverse; all still reachable TARGET_OPTION_NODE trees should be
> in that hash table.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (in both cases
> with --enable-checking=yes,rtl and --enable-checking=release; for the
> i686-linux/release checking I had to fix an unrelated compare debug
> issue I'll post when I manage to reduce the testcase).
>
> I'd like to get rid of all the XCNEW calls in target-globals.c as a
> follow-up.
>
> As for performance, for --enable-checking=release from a very rough
> check on make -j48 bootstrap and make -j48 check times the patch is
> compile time neutral; on e.g. the declare-simd-1.C testcase g++ is twice
> as fast with the patch though (~ 0.8 sec without the patch, ~ 0.3 sec
> with the patch, both for x86_64 and i686).
>
> Ok for trunk?

Works for me. Wait a bit for others to comment though.

Thanks,
Richard.

> 2014-01-07  Jakub Jelinek  ja...@redhat.com
>
>     PR target/58115
>     * tree-core.h (struct target_globals): New forward declaration.
>     (struct tree_target_option): Add globals field.
>     * tree.h (TREE_TARGET_GLOBALS): Define.
>     (prepare_target_option_nodes_for_pch): New prototype.
>     * target-globals.h (struct target_globals): Define even if
>     !SWITCHABLE_TARGET.
>     * tree.c (prepare_target_option_node_for_pch,
>     prepare_target_option_nodes_for_pch): New functions.
>     * config/i386/i386.h (SWITCHABLE_TARGET): Define.
>     * config/i386/i386.c: Include target-globals.h.
>     (ix86_set_current_function): Instead of doing target_reinit
>     unconditionally, use save_target_globals_default_opts and
>     restore_target_globals.
> c-family/
>     * c-pch.c (c_common_write_pch): Call
>     prepare_target_option_nodes_for_pch.
>
> --- gcc/tree-core.h.jj 2014-01-07 08:47:24.0 +0100
> +++ gcc/tree-core.h 2014-01-07 16:44:35.591358235 +0100
> @@ -1557,11 +1557,18 @@ struct GTY(()) tree_optimization_option
>    struct target_optabs *GTY ((skip)) base_optabs;
>  };
>  
> +/* Forward declaration, defined in target-globals.h.  */
> +
> +struct GTY(()) target_globals;
> +
>  /* Target options used by a function.  */
>  
>  struct GTY(()) tree_target_option {
>    struct tree_common common;
>  
> +  /* Target globals for the corresponding target option.  */
> +  struct target_globals *globals;
> +
>    /* The optimization options used by the user.  */
>    struct cl_target_option opts;
>  };
> --- gcc/tree.h.jj 2014-01-03 11:40:33.0 +0100
> +++ gcc/tree.h 2014-01-07 21:28:15.038061120 +0100
> @@ -2695,9 +2695,14 @@ extern tree build_optimization_node (str
>  #define TREE_TARGET_OPTION(NODE) \
>    (TARGET_OPTION_NODE_CHECK (NODE)->target_option.opts)
>  
> +#define TREE_TARGET_GLOBALS(NODE) \
> +  (TARGET_OPTION_NODE_CHECK (NODE)->target_option.globals)
> +
>  /* Return a tree node that encapsulates the target options in OPTS.  */
>  extern tree build_target_option_node (struct gcc_options *opts);
>  
> +extern void prepare_target_option_nodes_for_pch (void);
> +
>  #if defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 2007)
>  
>  inline tree
> --- gcc/target-globals.h.jj 2014-01-03 11:40:46.0 +0100
> +++ gcc/target-globals.h 2014-01-07 17:08:51.113880947 +0100
> @@ -37,6 +37,7 @@ extern struct target_builtins *this_targ
>  extern struct target_gcse *this_target_gcse;
>  extern struct target_bb_reorder *this_target_bb_reorder;
>  extern struct target_lower_subreg *this_target_lower_subreg;
> +#endif
>  
>  struct GTY(()) target_globals {
>    struct target_flag_state *GTY((skip)) flag_state;
> @@ -57,6 +58,7 @@ struct GTY(()) target_globals {
>    struct target_lower_subreg *GTY((skip)) lower_subreg;
>  };
>  
> +#if SWITCHABLE_TARGET
>  extern struct target_globals default_target_globals;
>  
>  extern struct target_globals *save_target_globals (void);
> --- gcc/tree.c.jj 2014-01-03 11:40:33.0 +0100
> +++ gcc/tree.c 2014-01-07 21:27:35.590268195 +0100
> @@ -11527,6 +11527,28
Re: [Patch] libgcov.c re-factoring
> Actually, I tried changing these two, but gcc_checking_assert is
> undefined in libgcov.a. Ok to commit without this change?

OK. Incrementally, can you please define gcov_nonruntime_assert that will expand into gcc_assert for code within gcc/coverage tools and into nothing for the libgcov runtime, so we can change those offenders to that?

Honza

> Thanks,
> Teresa
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
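The macro requested above could look roughly like the following. This is only a sketch, not the committed patch: it assumes `IN_LIBGCOV` is the preprocessor symbol that marks the runtime build, and it uses the standard `assert` as a stand-in for `gcc_assert` so that it compiles outside the GCC tree. The helper `checked_counter_count` is hypothetical and exists only to show a call site.

```c
#include <assert.h>

/* Stand-in for GCC's gcc_assert so this sketch builds on its own.  */
#ifndef gcc_assert
#define gcc_assert(EXPR) assert (EXPR)
#endif

/* Assumption: IN_LIBGCOV is defined only when building the libgcov
   runtime.  */
#ifdef IN_LIBGCOV
/* Runtime build: no check is performed, but EXPR is still parsed, so it
   cannot silently rot.  The short-circuit keeps it unevaluated.  */
#define gcov_nonruntime_assert(EXPR) ((void) (0 && (EXPR)))
#else
/* Compiler and coverage tools: a real assertion.  */
#define gcov_nonruntime_assert(EXPR) gcc_assert (EXPR)
#endif

/* Hypothetical caller: validates a counter count before using it.  */
static unsigned
checked_counter_count (unsigned n_counts, unsigned limit)
{
  gcov_nonruntime_assert (n_counts <= limit);
  return n_counts;
}
```

Keeping the expression inside `0 && (EXPR)` rather than dropping it entirely means a misspelled variable in an assertion still fails to compile in the runtime build.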
Re: Extend -fstack-protector-strong to cover calls with return slot
On 01/07/2014 02:37 PM, Jakub Jelinek wrote:
> On Tue, Jan 07, 2014 at 02:27:04PM +0100, Florian Weimer wrote:
>> gimplify_modify_expr_rhs, in the CALL_EXPR case:
>>
>>   if (use_target)
>>     {
>>       CALL_EXPR_RETURN_SLOT_OPT (*from_p) = 1;
>>       mark_addressable (*to_p);
>>     }
>
> Yeah, that sets it in some cases too, not in other testcases.
> Just look at how the flag is used when actually expanding it:
>
>   if (target && MEM_P (target) && CALL_EXPR_RETURN_SLOT_OPT (exp))
>     structure_value_addr = XEXP (target, 0);
>   else
>     {
>       /* For variable-sized objects, we must be called with a target
>          specified.  If we were to allocate space on the stack here,
>          we would have no way of knowing when to free it.  */
>       rtx d = assign_temp (rettype, 1, 1);
>       structure_value_addr = XEXP (d, 0);
>       target = 0;
>     }

Okay, I'm beginning to understand. I tried to actually reach the second branch, and ended up with PR59711. :) foo12 in the new C testcase covers it in part without a variable-sized object.

> so, if it is set, the address of the var on the LHS is passed to the
> function as hidden argument, if it is not set, we pass address of
> a stack temporary instead. Both the automatic var and the stack
> temporary can overflow, if the callee does something wrong.

What about the attached version? It still does not exactly match your original suggestion because gimple_call_lhs (stmt) can be NULL_TREE if the result is ignored and this case needs instrumentation, as you explained, so I use the function return type in the aggregate_value_p check.

Testing is still under way, but looks good so far. I'm bootstrapping with BOOT_CFLAGS="-O2 -g -fstack-protector-strong" with Ada enabled, for additional coverage.

--
Florian Weimer / Red Hat Product Security Team

gcc/

2014-01-08  Florian Weimer  fwei...@redhat.com

    * cfgexpand.c (stack_protect_decl_p): New function, extracted
    from expand_used_vars.
    (stack_protect_return_slot_p): New function.
    (expand_used_vars): Call stack_protect_decl_p and
    stack_protect_return_slot_p for -fstack-protector-strong.

gcc/testsuite/

2014-01-08  Florian Weimer  fwei...@redhat.com

    * gcc.dg/fstack-protector-strong.c: Add coverage for return slots.
    * g++.dg/fstack-protector-strong.C: Likewise.
    * gcc.target/i386/ssp-strong-reg.c: New file.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 206311)
+++ gcc/cfgexpand.c (working copy)
@@ -1599,6 +1599,52 @@
   return 0;
 }
 
+/* Check if the current function has local referenced variables that
+   have their addresses taken, contain an array, or are arrays.  */
+
+static bool
+stack_protect_decl_p ()
+{
+  unsigned i;
+  tree var;
+
+  FOR_EACH_LOCAL_DECL (cfun, i, var)
+    if (!is_global_var (var))
+      {
+        tree var_type = TREE_TYPE (var);
+        if (TREE_CODE (var) == VAR_DECL
+            && (TREE_CODE (var_type) == ARRAY_TYPE
+                || TREE_ADDRESSABLE (var)
+                || (RECORD_OR_UNION_TYPE_P (var_type)
+                    && record_or_union_type_has_array_p (var_type))))
+          return true;
+      }
+  return false;
+}
+
+/* Check if the current function has calls that use a return slot.  */
+
+static bool
+stack_protect_return_slot_p ()
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, cfun)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+         !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+        gimple stmt = gsi_stmt (gsi);
+        /* This assumes that calls to internal-only functions never
+           use a return slot.  */
+        if (is_gimple_call (stmt)
+            && !gimple_call_internal_p (stmt)
+            && aggregate_value_p (TREE_TYPE (gimple_call_fntype (stmt)),
+                                  gimple_call_fndecl (stmt)))
+          return true;
+      }
+  return false;
+}
+
 /* Expand all variables used in the function.  */
 
 static rtx
@@ -1669,22 +1715,8 @@
     pointer_map_destroy (ssa_name_decls);
 
   if (flag_stack_protect == SPCT_FLAG_STRONG)
-    FOR_EACH_LOCAL_DECL (cfun, i, var)
-      if (!is_global_var (var))
-        {
-          tree var_type = TREE_TYPE (var);
-          /* Examine local referenced variables that have their addresses taken,
-             contain an array, or are arrays.  */
-          if (TREE_CODE (var) == VAR_DECL
-              && (TREE_CODE (var_type) == ARRAY_TYPE
-                  || TREE_ADDRESSABLE (var)
-                  || (RECORD_OR_UNION_TYPE_P (var_type)
-                      && record_or_union_type_has_array_p (var_type))))
-            {
-              gen_stack_protect_signal = true;
-              break;
-            }
-        }
+    gen_stack_protect_signal
+      = stack_protect_decl_p () || stack_protect_return_slot_p ();
 
   /* At this point all variables on the local_decls with TREE_USED
      set are not associated with any block scope.  Lay them out.  */
Index: gcc/testsuite/g++.dg/fstack-protector-strong.C
===================================================================
--- gcc/testsuite/g++.dg/fstack-protector-strong.C (revision 206311)
+++
[Patch,ARM] crypto intrinsics in AArch32 testsuite fix
Hi,

Commit 206131 introduced check_effective_target_arm_crypto_ok in lib/target-supports.exp, to check that the target supports -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp.

However, when GCC is configured for target arm-none-linux-gnueabihf, I can see all the new tests fail:

sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:29: fatal error: gnu/stubs-soft.h: No such file or directory

(stubs.h is included via arm_neon.h)

This is because the check_effective_target_arm_crypto_ok sample test is too simple. Making it include arm_neon.h does the trick (and makes the tests UNSUPPORTED rather than FAIL).

OK?

Christophe.

2014-01-08  Christophe Lyon  christophe.l...@linaro.org

    * lib/target-supports.exp (check_effective_target_arm_crypto_ok):
    Include arm_neon.h in sample test.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index a8910bb..cc10936 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-01-08  Christophe Lyon  christophe.l...@linaro.org
+
+    * lib/target-supports.exp (check_effective_target_arm_crypto_ok):
+    Include arm_neon.h in sample test.
+
 2014-01-07  Paolo Carlini  paolo.carl...@oracle.com
 
     * g++.dg/ext/is_base_of_incomplete-2.C: New.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5166679..7b40ccd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2305,6 +2305,7 @@ proc check_effective_target_arm_unaligned { } {
 proc check_effective_target_arm_crypto_ok {} {
     if { [check_effective_target_arm32] } {
         return [check_no_compiler_messages arm_crypto_ok object {
+            #include "arm_neon.h"
             int foo (void)
             {
                 __asm__ volatile ("aese.8 q0, q0");
[PATCH] Don't segv in omp-low.c (PR middle-end/59669)
We can also get NULL for the default definition, so we need to handle that before calling has_zero_uses on it.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-01-08  Marek Polacek  pola...@redhat.com

    PR middle-end/59669
    * omp-low.c (simd_clone_adjust): Don't crash if def is NULL.

testsuite/
    * gcc.dg/gomp/pr59669.c: New test.

--- gcc/omp-low.c.mp 2014-01-08 13:48:40.353624984 +0100
+++ gcc/omp-low.c 2014-01-08 13:48:47.780656551 +0100
@@ -11587,7 +11587,7 @@ simd_clone_adjust (struct cgraph_node *n
         tree def = ssa_default_def (cfun, orig_arg);
         gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
                     || POINTER_TYPE_P (TREE_TYPE (orig_arg)));
-        if (!has_zero_uses (def))
+        if (def && !has_zero_uses (def))
           {
             iter1 = make_ssa_name (orig_arg, NULL);
             iter2 = make_ssa_name (orig_arg, NULL);
--- gcc/testsuite/gcc.dg/gomp/pr59669.c.mp 2014-01-08 13:50:23.710492087 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669.c 2014-01-08 13:50:54.339622411 +0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd linear(a)
+void
+foo (int a)
+{
+}

    Marek
Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)
On Wed, Jan 08, 2014 at 04:09:08PM +0100, Marek Polacek wrote:
> We can also get NULL for the default definition, so we need to handle
> that before calling has_zero_uses on it.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Looks ok, but there is similar code a few lines above, can you please fix it up and add it to the testcase? I'd think

#pragma omp declare simd uniform(a) aligned(a:32)
void
bar (int *a)
{
}

could hit the other spot.

    Jakub
Re: [Patch,ARM] crypto intrinsics in AArch32 testsuite fix
On 08/01/14 15:00, Christophe Lyon wrote:
> Hi,
>
> Commit 206131 introduced check_effective_target_arm_crypto_ok in
> lib/target-supports.exp, to check that the target supports
> -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp.
>
> However, when GCC is configured for target arm-none-linux-gnueabihf,
> I can see all the new tests fail:
>
> sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:29: fatal
> error: gnu/stubs-soft.h: No such file or directory
>
> (stubs.h is included via arm_neon.h)
>
> This is because check_effective_target_arm_crypto_ok sample test is
> too simple. Making it include arm_neon.h does the trick (and makes the
> tests UNSUPPORTED rather than FAIL).
>
> OK?
>
> Christophe.

Hi Christophe,

I believe the best solution here is to work out the best combination of -mfloat-abi and -mfpu options, as we do for the NEON options (look for example at check_effective_target_arm_neon_ok_nocache in target-supports.exp). That way these tests will not add -mfloat-abi=softfp to an arm-none-linux-gnueabihf target (which is the root of the problem) and they will PASS instead of being just UNSUPPORTED.

I have a patch for that in testing.

Thanks,
Kyrill
Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)
On Wed, Jan 08, 2014 at 04:14:06PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 08, 2014 at 04:09:08PM +0100, Marek Polacek wrote:
>> We can also get NULL for the default definition, so we need to handle
>> that before calling has_zero_uses on it.
>>
>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> Looks ok, but there is similar code a few lines above, can you please
> fix it up and add it to the testcase? I'd think
>
> #pragma omp declare simd uniform(a) aligned(a:32)
> void
> bar (int *a)
> {
> }
>
> could hit the other spot.

Indeed it does. So like this?

2014-01-08  Marek Polacek  pola...@redhat.com

    PR middle-end/59669
    * omp-low.c (simd_clone_adjust): Don't crash if def is NULL.

testsuite/
    * gcc.dg/gomp/pr59669-1.c: New test.
    * gcc.dg/gomp/pr59669-2.c: New test.

--- gcc/omp-low.c.mp 2014-01-08 13:48:40.353624984 +0100
+++ gcc/omp-low.c 2014-01-08 16:21:06.247268557 +0100
@@ -11537,7 +11537,7 @@ simd_clone_adjust (struct cgraph_node *n
           unsigned int alignment = node->simdclone->args[i].alignment;
           tree orig_arg = node->simdclone->args[i].orig_arg;
           tree def = ssa_default_def (cfun, orig_arg);
-          if (!has_zero_uses (def))
+          if (def && !has_zero_uses (def))
             {
               tree fn = builtin_decl_explicit (BUILT_IN_ASSUME_ALIGNED);
               gimple_seq seq = NULL;
@@ -11587,7 +11587,7 @@ simd_clone_adjust (struct cgraph_node *n
         tree def = ssa_default_def (cfun, orig_arg);
         gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
                     || POINTER_TYPE_P (TREE_TYPE (orig_arg)));
-        if (!has_zero_uses (def))
+        if (def && !has_zero_uses (def))
           {
             iter1 = make_ssa_name (orig_arg, NULL);
             iter2 = make_ssa_name (orig_arg, NULL);
--- gcc/testsuite/gcc.dg/gomp/pr59669-1.c.mp 2014-01-08 13:50:23.710492087 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669-1.c 2014-01-08 13:50:54.339622411 +0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd linear(a)
+void
+foo (int a)
+{
+}
--- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.mp 2014-01-08 16:20:35.553121408 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c 2014-01-08 16:20:54.099210269 +0100
@@ -0,0 +1,9 @@
+/* PR middle-end/59669 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd uniform(a) aligned(a:32)
+void
+bar (int *a)
+{
+}

    Marek
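The shape of the fix can be shown outside the compiler. The sketch below is illustrative only: `struct ssa_name_sketch`, `has_zero_uses_sketch` and `needs_adjustment` are hypothetical stand-ins, not GCC's `tree`, `has_zero_uses` or anything in omp-low.c. It assumes, as the patch description says, that the lookup can return NULL when a parameter has no default definition, and that the use-count query dereferences its argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA name; not GCC's tree type.  */
struct ssa_name_sketch
{
  unsigned num_uses;
};

/* Plays the role of has_zero_uses: it dereferences NAME, so passing
   NULL would crash.  */
static bool
has_zero_uses_sketch (const struct ssa_name_sketch *name)
{
  return name->num_uses == 0;
}

/* Returns true when adjustment code must be emitted for the argument.
   DEF may be NULL when the parameter is never read; the short-circuit
   && keeps the NULL pointer away from has_zero_uses_sketch.  */
static bool
needs_adjustment (const struct ssa_name_sketch *def)
{
  return def && !has_zero_uses_sketch (def);
}
```

A missing definition and a definition with zero uses lead to the same outcome here: nothing is emitted for that argument.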
Re: [PATCH] Don't segv in omp-low.c (PR middle-end/59669)
On Wed, Jan 08, 2014 at 04:25:47PM +0100, Marek Polacek wrote:
> Indeed it does. So like this?
>
> 2014-01-08  Marek Polacek  pola...@redhat.com
>
>     PR middle-end/59669
>     * omp-low.c (simd_clone_adjust): Don't crash if def is NULL.
>
> testsuite/
>     * gcc.dg/gomp/pr59669-1.c: New test.
>     * gcc.dg/gomp/pr59669-2.c: New test.

Yep, thanks.

    Jakub
[Patch, Fortran] PR 58182: [4.9 Regression] ICE with global binding name used as a FUNCTION
Hi all,

I just committed an 'obvious' patch for an ICE-on-invalid regression on trunk:

http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=206429

Cheers,
Janus
[PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling,

This patch implements zero-overhead looping for xtensa backend using hw-doloop facility. If OK for trunk, please apply it for me. Thanks.

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 206431)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,18 @@
+2014-01-08  Felix Yang  fei.yang0...@gmail.com
+
+    * config/xtensa/xtensa.c (xtensa_reorg): New.
+    (xtensa_reorg_loops): New.
+    (xtensa_can_use_doloop_p): New.
+    (xtensa_invalid_within_doloop): New.
+    (hwloop_optimize): New.
+    (hwloop_fail): New.
+    (hwloop_pattern_reg): New.
+    (xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label.
+    (xtensa_doloop_hooks): Define.
+    * config/xtensa/xtensa.md (doloop_end): New.
+    (zero_cost_loop_start): Rewritten.
+    (zero_cost_loop_end): Rewritten.
+
 2014-01-08  Marek Polacek  pola...@redhat.com
 
     PR middle-end/59669
Index: gcc/config/xtensa/xtensa.md
===================================================================
--- gcc/config/xtensa/xtensa.md (revision 206431)
+++ gcc/config/xtensa/xtensa.md (working copy)
@@ -35,6 +35,8 @@
   (UNSPEC_TLS_CALL      9)
   (UNSPEC_TP            10)
   (UNSPEC_MEMW          11)
+  (UNSPEC_LSETUP_START  12)
+  (UNSPEC_LSETUP_END    13)
 
   (UNSPECV_SET_FP       1)
   (UNSPECV_ENTRY        2)
@@ -1289,6 +1291,8 @@
    (set_attr "length"   "3")])
 
 
+;; Hardware loop support.
+
 ;; Define the loop insns used by bct optimization to represent the
 ;; start and end of a zero-overhead loop (in loop.c).  This start
 ;; template generates the loop insn; the end template doesn't generate
@@ -1296,34 +1300,58 @@
 
 (define_insn "zero_cost_loop_start"
   [(set (pc)
-        (if_then_else (eq (match_operand:SI 0 "register_operand" "a")
-                          (const_int 0))
-                      (label_ref (match_operand 1 "" ""))
-                      (pc)))
-   (set (reg:SI 19)
-        (plus:SI (match_dup 0) (const_int -1)))]
+        (if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0")
+                          (const_int 1))
+                      (label_ref (match_operand 1 "" ""))
+                      (pc)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "=a")
+        (plus (match_dup 2)
+              (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_START)]
   ""
-  "loopnez\t%0, %l1"
+  "loop\t%0, %l1_LEND"
   [(set_attr "type"     "jump")
    (set_attr "mode"     "none")
    (set_attr "length"   "3")])
 
 (define_insn "zero_cost_loop_end"
   [(set (pc)
-        (if_then_else (ne (reg:SI 19) (const_int 0))
-                      (label_ref (match_operand 0 "" ""))
-                      (pc)))
-   (set (reg:SI 19)
-        (plus:SI (reg:SI 19) (const_int -1)))]
+        (if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0")
+                          (const_int 1))
+                      (label_ref (match_operand 1 "" ""))
+                      (pc)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "=a")
+        (plus (match_dup 2)
+              (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)]
   ""
 {
-    xtensa_emit_loop_end (insn, operands);
-    return "";
+  xtensa_emit_loop_end (insn, operands);
+  return "";
 }
   [(set_attr "type"     "jump")
    (set_attr "mode"     "none")
    (set_attr "length"   "0")])
 
+; operand 0 is the loop count pseudo register
+; operand 1 is the label to jump to at the top of the loop
+(define_expand "doloop_end"
+  [(parallel [(set (pc) (if_then_else
+                          (ne (match_operand:SI 0 "" "")
+                              (const_int 1))
+                          (label_ref (match_operand 1 "" ""))
+                          (pc)))
+              (set (match_dup 0)
+                   (plus:SI (match_dup 0)
+                            (const_int -1)))
+              (unspec [(const_int 0)] UNSPEC_LSETUP_END)])]
+  ""
+{
+  /* The loop optimizer doesn't check the predicates... */
+  if (GET_MODE (operands[0]) != SImode)
+    FAIL;
+})
+
 
 ;; Setting a register from a comparison.
 
Index: gcc/config/xtensa/xtensa.c
===================================================================
--- gcc/config/xtensa/xtensa.c (revision 206431)
+++ gcc/config/xtensa/xtensa.c (working copy)
@@ -1,6 +1,7 @@
 /* Subroutines for insn-output.c for Tensilica's Xtensa architecture.
    Copyright (C) 2001-2014 Free Software Foundation, Inc.
    Contributed by Bob Wilson (bwil...@tensilica.com) at Tensilica.
+   Zero-overhead looping support by Felix Yang (felix.yang0...@gmail.com).
 
 This file is part of GCC.
 
@@ -61,8 +62,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "gimplify.h"
 #include "df.h"
+#include "hw-doloop.h"
+#include "dumpfile.h"
 
-
 /* Enumeration for all of the relational tests, so that we can build
    arrays indexed by the test type, and not worry about the order
    of EQ, NE, etc.  */
@@ -186,6 +188,10 @@ static reg_class_t xtensa_secondary_reload (bool,
 static bool constantpool_address_p (const_rtx addr);
 static bool xtensa_legitimate_constant_p (enum
Re: [Patch] libgcov.c re-factoring
On Wed, Jan 8, 2014 at 6:34 AM, Jan Hubicka hubi...@ucw.cz wrote:
>> Actually, I tried changing these two, but gcc_checking_assert is
>> undefined in libgcov.a. Ok to commit without this change?
>
> OK. Incrementally, can you please define gcov_nonruntime_assert that
> will expand into gcc_assert for code within gcc/coverage tools and into
> nothing for the libgcov runtime, so we can change those offenders to
> that?

Ok, committed as r206435. Will send the assert patch in a follow-up later this week.

Teresa

> Honza

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.
Hi Renlin,

The new test you added introduces 2 new FAILs when the target is arm-none-linux-gnueabi (as opposed to arm-none-linux-gnueabihf).

Christophe.

On 24 December 2013 15:46, Renlin Li renlin...@arm.com wrote:
> Hi,
>
> I just updated my patch according to your suggestion. Thank you for
> committing it for me!
>
> All you guys have a nice Xmas break!
>
> Kind regards,
> Renlin Li
>
> On 04/12/13 11:23, Ramana Radhakrishnan wrote:
>> Sorry about the slow response. Been on holiday.
>>
>> On 20/11/13 16:27, Renlin Li wrote:
>>> Hi all,
>>>
>>> This patch will make the arm back-end use vcvt for float to fixed
>>> point conversions when applicable.
>>>
>>> Test on arm-none-linux-gnueabi has been done on the model.
>>> Okay for trunk?
>>
>>> + (define_insn "*combine_vcvtf2i"
>>> +   [(set (match_operand:SI 0 "s_register_operand" "=r")
>>> +     (fix:SI (fix:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t")
>>> +                              (match_operand 2
>>> +                              "const_double_vcvt_power_of_two" "Dp")))))]
>>> +   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
>>> +   "vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1"
>>> +   [(set_attr "predicable" "yes")
>>> +    (set_attr "predicable_short_it" "no")
>>> +    (set_attr "ce_count" "2")
>>> +    (set_attr "type" "f_cvtf2i")]
>>> +  )
>>> +
>>
>> You need to set length to 8.
>>
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
>>> @@ -0,0 +1,15 @@
>>> +/* Check that vcvt is used for fixed and float data conversions.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O1 -mfpu=vfp3" } */
>>> +/* { dg-require-effective-target arm_vfp_ok } */
>>> +float fixed_to_float(int i)
>>> +{
>>> +    return ((float)i / (1 << 16));
>>> +}
>>> +
>>> +int float_to_fixed(float f)
>>> +{
>>> +    return ((int)(f*(1 << 16)));
>>> +}
>>> +/* { dg-final { scan-assembler "vcvt.f32.s32" } } */
>>> +/* { dg-final { scan-assembler "vcvt.s32.f32" } } */
>>
>> GNU coding style for functions.
>>
>> Ok with those changes.
>>
>> regards
>> Ramana
>>
>>> Kind regards,
>>> Renlin Li
>>>
>>> gcc/ChangeLog:
>>>
>>> 2013-11-20  Renlin Li  renlin...@arm.com
>>>
>>>     * config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
>>>     * config/arm/constraints.md (Dp): Define new constraint.
>>>     * config/arm/predicates.md (const_double_vcvt_power_of_two):
>>>     Define new predicate.
>>>     * config/arm/arm.c (arm_print_operand): Add print for new function.
>>>     (vfp3_const_double_for_bits): New function.
>>>     * config/arm/vfp.md (combine_vcvtf2i): Define new instruction.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2013-11-20  Renlin Li  renlin...@arm.com
>>>
>>>     * gcc.target/arm/fixed_float_conversion.c: New test case.
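For readers unfamiliar with the conversions the quoted test exercises, here is a small portable sketch of them. It mirrors the two functions in the test but is not the test file itself; `FRAC_BITS` is a name introduced here, and the value 16 is taken from the `1 << 16` scale in the test. On a target where the new pattern applies, the multiply or divide by a power of two plus the conversion can presumably be folded into a single vcvt with a fraction-bits operand, which is what the scan-assembler lines look for.

```c
#include <stdint.h>

/* Number of fractional bits; 16 matches the (1 << 16) scale used in
   the quoted test case.  */
#define FRAC_BITS 16

/* Interpret I as a fixed-point value with FRAC_BITS fractional bits.  */
static float
fixed_to_float (int32_t i)
{
  return (float) i / (1 << FRAC_BITS);
}

/* Convert F to fixed point, truncating toward zero as a C cast does.  */
static int32_t
float_to_fixed (float f)
{
  return (int32_t) (f * (1 << FRAC_BITS));
}
```

With 16 fractional bits, 1.5 is stored as 98304 (1.5 * 65536), and the round trip is exact for values like this that fit in a float's mantissa.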
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Wed, Jan 8, 2014 at 8:27 AM, Felix Yang fei.yang0...@gmail.com wrote:
> Hi Sterling,
>
> This patch implements zero-overhead looping for xtensa backend using
> hw-doloop facility. If OK for trunk, please apply it for me. Thanks.

Hi Felix,

I last worked on zero-overhead loops for Xtensa in the gcc 4.3 timeframe, but when I did, I ran into several problems related to later optimizations rearranging the code, which I didn't have time to address. I'm sure much of that experience is completely stale now, but I would appreciate a detailed description of the testing you have done with this patch (in particular, the different xtensa configurations you tested it against, especially the ones with and without loop instructions) before I approve it. Please be sure the assembler can relax the loops it generates as well.

I don't see any particular problem, but there are many, many gotchas when dealing with xtensa loop instructions.

It also appears that Tensilica has stopped posting test results for Xtensa, which makes it difficult to evaluate the quality of this patch.

Thanks,
Sterling
Re: [PATCH] Fix ifcvt (PR rtl-optimization/58668)
Hello!

>> So like this instead? Bootstrapped/regtested on x86_64-linux and
>> i686-linux. For 4.8 I'd still prefer the earlier patch though.
>>
>> 2013-12-18  Jakub Jelinek  ja...@redhat.com
>>
>>     PR rtl-optimization/58668
>>     * cfgcleanup.c (flow_find_cross_jump): Don't count
>>     any jumps if dir_p is NULL. Remove p1 variable, use active_insn_p
>>     to determine what is counted.
>>     (flow_find_head_matching_sequence): Use active_insn_p to determine
>>     what is counted.
>>     (try_head_merge_bb): Adjust for the flow_find_head_matching_sequence
>>     counting change.
>>     * ifcvt.c (count_bb_insns): Use active_insn_p && !JUMP_P to
>>     determine what is counted.
>>
>>     * gcc.dg/pr58668.c: New test.
>
> This is fine for the trunk. Release manager's call for what they'd
> prefer on the 4.8 branch.

This caused PR59724 on alpha:

20021116-1.c: In function ‘foo’:
20021116-1.c:31:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 9
 }
 ^
20021116-1.c:31:1: error: insn outside basic block
(jump_insn 94 52 93 9 (return) 20021116-1.c:31 -1
     (nil)
 -> return)

Uros.
FW: [PATCH] Fix PR 59631
A small but major typo. The second sentence should read "...usage of _Cilk_spawn [and _Cilk_sync] *without* -fcilkplus..." instead of "...with -fcilkplus..."

I am sorry about this.

Sincerely,

Balaji V. Iyer.

-----Original Message-----
From: Iyer, Balaji V
Sent: Tuesday, January 7, 2014 10:15 AM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR 59631

Hello Everyone,

The attached patch will fix the issue reported in PR 59631. The main issue was the usage of Cilk spawn [and _Cilk_sync] with -fcilkplus caused an ICE. This patch should fix that. The issue was only reported for C++ but the issue exists in C compiler also. This patch fixes both C and C++. A test case is also included.

Is this Ok for trunk?

Here are the ChangeLog entries:

+++ gcc/c/ChangeLog
+2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
+
+    PR c++/59631
+    * c-parser.c (c_parser_postfix_expression): Replaced consecutive if
+    statements with if-elseif statements.

+++ gcc/testsuite/ChangeLog
+2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
+
+    PR c++/59631
+    * gcc.dg/cilk-plus/cilk-plus.exp: Removed -fcilkplus from flags list.
+    * g++.dg/cilk-plus/cilk-plus.exp: Likewise.
+    * c-c++-common/cilk-plus/CK/spawnee_inline.c: Replaced second
+    dg-option with dg-additional-options.
+    * c-c++-common/cilk-plus/CK/varargs_test.c: Likewise.
+    * c-c++-common/cilk-plus/CK/steal_check.c: Likewise.
+    * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise.
+    * c-c++-common/cilk-plus/CK/spawning_arg.c: Likewise.
+    * c-c++-common/cilk-plus/CK/invalid_spawns.c: Added a dg-options tag.
+    * c-c++-common/cilk-plus/CK/pr59631.c: New testcase.

+++ gcc/cp/ChangeLog
+2014-01-07  Balaji V. Iyer  balaji.v.i...@intel.com
+
+    PR c++/59631
+    * parser.c (cp_parser_postfix_expression): Added a new if-statement
+    and replaced an existing if-statement with else-if statement.
+    Changed an existing error message wording to match the one from the C
+    parser.

Thanks,

Balaji V. Iyer.

Index: gcc/c/c-parser.c
===================================================================
--- gcc/c/c-parser.c (revision 206392)
+++ gcc/c/c-parser.c (working copy)
@@ -7500,7 +7500,7 @@
           expr = c_parser_postfix_expression (parser);
           expr.value = error_mark_node;
         }
-      if (c_parser_peek_token (parser)->keyword == RID_CILK_SPAWN)
+      else if (c_parser_peek_token (parser)->keyword == RID_CILK_SPAWN)
         {
           error_at (loc, "consecutive %<_Cilk_spawn%> keywords are not "
                     "permitted");
Index: gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp
===================================================================
--- gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp (revision 206392)
+++ gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp (working copy)
@@ -51,13 +51,13 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -fcilkplus -O3 -std=c99
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -fcilkplus -g -O0 -std=c99
 
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -fcilkplus
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1 -fcilkplus
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O2 -std=c99 -fcilkplus
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O2 -ftree-vectorize -fcilkplus
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -g -fcilkplus
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O2 -std=c99
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O2 -ftree-vectorize
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -g
 if { [check_effective_target_lto] } {
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -flto -g -fcilkplus
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -flto -g
 }
 
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/SE/*.c]] -g
Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
===================================================================
--- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp (revision 206392)
+++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp (working copy)
@@ -74,12 +74,12 @@
 dg-finish
 
 dg-init
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -fcilkplus
-dg-runtest [lsort [glob -nocomplain
Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.
Hi Christophe,

There is a minor issue with this test case. It requires the `float-abi` of your target to be either `softfp` or `hard` (to use the floating-point hardware). Could you please check whether this solves the problem? I should add it to the `dg-options` section of the test case, and a patch is on the way.

Thank you for letting me know!

Kind regards,
Renlin Li

On 08/01/14 16:43, Christophe Lyon wrote:
> Hi Renlin,
>
> The new test you added introduces 2 new FAILs when the target is
> arm-none-linux-gnueabi (as opposed to arm-none-linux-gnueabihf).
>
> Christophe.
>
> On 24 December 2013 15:46, Renlin Li renlin...@arm.com wrote:
>> Hi,
>>
>> I just updated my patch according to your suggestion. Thank you for
>> committing it for me!
>>
>> All you guys have a nice Xmas break!
>>
>> Kind regards,
>> Renlin Li
>>
>> On 04/12/13 11:23, Ramana Radhakrishnan wrote:
>>> Sorry about the slow response. Been on holiday.
>>>
>>> On 20/11/13 16:27, Renlin Li wrote:
>>>> Hi all,
>>>>
>>>> This patch will make the arm back-end use vcvt for float to fixed
>>>> point conversions when applicable.
>>>>
>>>> Test on arm-none-linux-gnueabi has been done on the model.
>>>> Okay for trunk?
>>>
>>>> + (define_insn "*combine_vcvtf2i"
>>>> +   [(set (match_operand:SI 0 "s_register_operand" "=r")
>>>> +     (fix:SI (fix:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t")
>>>> +                              (match_operand 2
>>>> +                              "const_double_vcvt_power_of_two" "Dp")))))]
>>>> +   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
>>>> +   "vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1"
>>>> +   [(set_attr "predicable" "yes")
>>>> +    (set_attr "predicable_short_it" "no")
>>>> +    (set_attr "ce_count" "2")
>>>> +    (set_attr "type" "f_cvtf2i")]
>>>> +  )
>>>> +
>>>
>>> You need to set length to 8.
>>>
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
>>>> @@ -0,0 +1,15 @@
>>>> +/* Check that vcvt is used for fixed and float data conversions.  */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O1 -mfpu=vfp3" } */
>>>> +/* { dg-require-effective-target arm_vfp_ok } */
>>>> +float fixed_to_float(int i)
>>>> +{
>>>> +    return ((float)i / (1 << 16));
>>>> +}
>>>> +
>>>> +int float_to_fixed(float f)
>>>> +{
>>>> +    return ((int)(f*(1 << 16)));
>>>> +}
>>>> +/* { dg-final { scan-assembler "vcvt.f32.s32" } } */
>>>> +/* { dg-final { scan-assembler "vcvt.s32.f32" } } */
>>>
>>> GNU coding style for functions.
>>>
>>> Ok with those changes.
>>>
>>> regards
>>> Ramana
>>>
>>>> Kind regards,
>>>> Renlin Li
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2013-11-20  Renlin Li  renlin...@arm.com
>>>>
>>>>     * config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
>>>>     * config/arm/constraints.md (Dp): Define new constraint.
>>>>     * config/arm/predicates.md (const_double_vcvt_power_of_two):
>>>>     Define new predicate.
>>>>     * config/arm/arm.c (arm_print_operand): Add print for new function.
>>>>     (vfp3_const_double_for_bits): New function.
>>>>     * config/arm/vfp.md (combine_vcvtf2i): Define new instruction.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2013-11-20  Renlin Li  renlin...@arm.com
>>>>
>>>>     * gcc.target/arm/fixed_float_conversion.c: New test case.
Re: [PATCH] _Cilk_for for C and C++
On Tue, Jan 07, 2014 at 10:11:59PM +, Iyer, Balaji V wrote:
> I used a similar existing one (safelen). Attached, please find 2 fixed
> patches for C and C++ along with their changelogs.

But safelen is something completely different, while if I skim the _Cilk_for docs, the grain is really a chunk size, where the runtime library performs the scheduling of grain sized chunks, so using OMP_CLAUSE_SCHEDULE clause with

  OMP_CLAUSE_SCHEDULE_KIND (c) = OMP_CLAUSE_SCHEDULE_RUNTIME;
  OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = grain_expr;

sounds like what should be used. OMP_CLAUSE_SAFELEN says what is the minimal vectorization factor the compiler can assume is safe for a simd loop.

    Jakub
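To make the "grain is a chunk size" reading concrete, here is a toy sketch of splitting an iteration space into grain-sized chunks. It is not how the Cilk Plus or OpenMP runtimes are implemented: `body_fn`, `run_in_chunks` and `add_index` are hypothetical names, and the chunks run sequentially where a real runtime would hand each one to a worker.

```c
#include <stddef.h>

/* Hypothetical callback type for the loop body; not part of any Cilk
   or OpenMP API.  */
typedef void (*body_fn) (size_t index, void *data);

/* Runs BODY over [0, n) in chunks of at most GRAIN iterations and
   returns the number of chunks formed.  */
static size_t
run_in_chunks (size_t n, size_t grain, body_fn body, void *data)
{
  size_t chunks = 0;
  size_t start = 0;

  if (grain == 0)
    grain = 1;  /* Treat a zero grain as one iteration per chunk.  */

  while (start < n)
    {
      /* The last chunk may be shorter than GRAIN.  */
      size_t len = n - start < grain ? n - start : grain;
      for (size_t i = start; i < start + len; i++)
        body (i, data);
      start += len;
      chunks++;
    }
  return chunks;
}

/* Example body: accumulates the indices it is called with.  */
static void
add_index (size_t index, void *data)
{
  *(size_t *) data += index;
}
```

The grain only changes how the work is partitioned, never which iterations run, which is the property that separates it from safelen in the discussion above.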
Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.
On 8 January 2014 18:15, Renlin Li renlin...@arm.com wrote:
> Hi Christophe,
>
> There is a minor issue about this test case. It requires the
> `float-abi` of your target to be either `softfp` or `hard` (to utilize
> the floating point hardware). Could you please check whether this
> solves the problem or not?

Indeed I had tried with 'hard' and it's OK. (That's why I said arm-none-linux-gnueabi as opposed to arm-none-linux-gnueabihf, but I wasn't clear enough).

Thanks for your upcoming patch :-)

Christophe.
[PATCH, AArch64 2/6] aarch64: Add mulditi3 and umulditi3 patterns
    * config/aarch64/aarch64.md (<su_optab>mulditi3): New expander.
---
 gcc/config/aarch64/aarch64.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c4acdfc..0b3943d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2078,6 +2078,23 @@
   [(set_attr "type" "<su>mull")]
 )
 
+(define_expand "<su_optab>mulditi3"
+  [(set (match_operand:TI 0 "register_operand")
+        (mult:TI (ANY_EXTEND:TI (match_operand:DI 1 "register_operand"))
+                 (ANY_EXTEND:TI (match_operand:DI 2 "register_operand"))))]
+  ""
+{
+  rtx low = gen_reg_rtx (DImode);
+  emit_insn (gen_muldi3 (low, operands[1], operands[2]));
+
+  rtx high = gen_reg_rtx (DImode);
+  emit_insn (gen_<su>muldi3_highpart (high, operands[1], operands[2]));
+
+  emit_move_insn (gen_lowpart (DImode, operands[0]), low);
+  emit_move_insn (gen_highpart (DImode, operands[0]), high);
+  DONE;
+})
+
 (define_insn "<su>muldi3_highpart"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (truncate:DI
--
1.8.4.2
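The expander above builds a 128-bit product from two independent 64-bit results: an ordinary multiply for the low half and a high-part multiply for the high half. The same decomposition can be sketched in portable C for the unsigned case. The names `u128_sketch`, `umul_highpart` and `umulditi3_sketch` are made up for this illustration, and the high part is computed from 32-bit halves only because standard C has no 128-bit type; the hardware does it in one instruction.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: two 64-bit halves.  */
struct u128_sketch
{
  uint64_t low;
  uint64_t high;
};

/* High 64 bits of the unsigned product A * B, computed from 32-bit
   partial products.  Plays the role of the highpart pattern.  */
static uint64_t
umul_highpart (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  /* Sum of everything that lands in bits 32..63, kept so its carry
     into bit 64 is not lost.  This sum cannot overflow 64 bits.  */
  uint64_t cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;

  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Full 64x64->128 unsigned multiply, split as in the expander: the low
   half is the wrapping product, the high half the high-part product.  */
static struct u128_sketch
umulditi3_sketch (uint64_t a, uint64_t b)
{
  struct u128_sketch result;
  result.low = a * b;
  result.high = umul_highpart (a, b);
  return result;
}
```

Because the two halves do not depend on each other, neither multiply has to wait for the other, which is presumably part of the appeal of expanding the operation this way.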
[PATCH, AArch64 1/6] aarch64: Add addti3 and subti3 patterns
* config/aarch64/aarch64 (addti3, subti3): New expanders. (addGPI3_compare0): Remove leading * from name. (addGPI3_carryin): Likewise. (subGPI3_compare0): Likewise. (subGPI3_carryin): Likewise. --- gcc/config/aarch64/aarch64.md | 45 +++ 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4e838ee..c4acdfc 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1102,7 +1102,26 @@ (set_attr simd *,*,*,yes)] ) -(define_insn *addmode3_compare0 +(define_expand addti3 + [(set (match_operand:TI 0 register_operand ) + (plus:TI (match_operand:TI 1 register_operand ) +(match_operand:TI 2 register_operand )))] + +{ + rtx low = gen_reg_rtx (DImode); + emit_insn (gen_adddi3_compare0 (low, gen_lowpart (DImode, operands[1]), + gen_lowpart (DImode, operands[2]))); + + rtx high = gen_reg_rtx (DImode); + emit_insn (gen_adddi3_carryin (high, gen_highpart (DImode, operands[1]), +gen_highpart (DImode, operands[2]))); + + emit_move_insn (gen_lowpart (DImode, operands[0]), low); + emit_move_insn (gen_highpart (DImode, operands[0]), high); + DONE; +}) + +(define_insn addmode3_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ (plus:GPI (match_operand:GPI 1 register_operand %r,r,r) @@ -1386,7 +1405,7 @@ [(set_attr type alu_ext)] ) -(define_insn *addmode3_carryin +(define_insn addmode3_carryin [(set (match_operand:GPI 0 register_operand =r) (plus:GPI (geu:GPI (reg:CC CC_REGNUM) (const_int 0)) @@ -1554,8 +1573,26 @@ (set_attr simd *,yes)] ) +(define_expand subti3 + [(set (match_operand:TI 0 register_operand ) + (minus:TI (match_operand:TI 1 register_operand ) + (match_operand:TI 2 register_operand )))] + +{ + rtx low = gen_reg_rtx (DImode); + emit_insn (gen_subdi3_compare0 (low, gen_lowpart (DImode, operands[1]), + gen_lowpart (DImode, operands[2]))); + + rtx high = gen_reg_rtx (DImode); + emit_insn (gen_subdi3_carryin (high, gen_highpart (DImode, operands[1]), +gen_highpart (DImode, 
operands[2]))); + + emit_move_insn (gen_lowpart (DImode, operands[0]), low); + emit_move_insn (gen_highpart (DImode, operands[0]), high); + DONE; +}) -(define_insn *submode3_compare0 +(define_insn submode3_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ (minus:GPI (match_operand:GPI 1 register_operand r) (match_operand:GPI 2 register_operand r)) @@ -1702,7 +1739,7 @@ [(set_attr type alu_ext)] ) -(define_insn *submode3_carryin +(define_insn submode3_carryin [(set (match_operand:GPI 0 register_operand =r) (minus:GPI (minus:GPI -- 1.8.4.2
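The addti3 and subti3 expanders above split a 128-bit operation into a flag-setting operation on the low halves followed by a carry-consuming operation on the high halves. A minimal C sketch of that decomposition follows; the `u128_pair` type and the `add128`/`sub128`/`demo_add128` names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word representation of a 128-bit value.  */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Add the low halves first (the ADDS step), then add the high halves
   plus the carry out of the low word (the ADC step).  */
static u128_pair add128 (u128_pair a, u128_pair b)
{
  u128_pair r;
  r.lo = a.lo + b.lo;
  uint64_t carry = r.lo < a.lo;   /* 1 if the low add wrapped around */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* Subtract the low halves first, then the high halves minus the borrow.  */
static u128_pair sub128 (u128_pair a, u128_pair b)
{
  u128_pair r;
  r.lo = a.lo - b.lo;
  uint64_t borrow = a.lo < b.lo;  /* 1 if the low subtract wrapped around */
  r.hi = a.hi - b.hi - borrow;
  return r;
}

/* Small self-check: a carry and a borrow must cross the word boundary.  */
static void demo_add128 (void)
{
  u128_pair a = { UINT64_MAX, 1 }, b = { 1, 2 };
  u128_pair s = add128 (a, b);
  assert (s.lo == 0 && s.hi == 4);
  u128_pair d = sub128 (s, b);
  assert (d.lo == a.lo && d.hi == a.hi);
}
```

The expanders keep the carry in the condition-code register rather than in a general register; the explicit `carry` and `borrow` variables here only model that dependency.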
[PATCH, AArch64 3/6] aarch64: Add multi3 pattern
* config/aarch64/aarch64.md (multi3): New expander. (maddGPI): Remove leading * from name. --- gcc/config/aarch64/aarch64.md | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 0b3943d..0f76cd1 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1968,7 +1968,7 @@ [(set_attr type mul)] ) -(define_insn *maddmode +(define_insn maddmode [(set (match_operand:GPI 0 register_operand =r) (plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r) (match_operand:GPI 2 register_operand r)) @@ -2095,6 +2095,31 @@ DONE; }) +;; The default expansion of multi3 using umuldi3_highpart will perform +;; the additions in an order that fails to combine into two madd insns. +(define_expand multi3 + [(set (match_operand:TI 0 register_operand) + (mult:TI (match_operand:TI 1 register_operand) +(match_operand:TI 2 register_operand)))] + +{ + rtx l0 = gen_reg_rtx (DImode); + rtx l1 = gen_lowpart (DImode, operands[1]); + rtx l2 = gen_lowpart (DImode, operands[2]); + rtx h0 = gen_reg_rtx (DImode); + rtx h1 = gen_highpart (DImode, operands[1]); + rtx h2 = gen_highpart (DImode, operands[2]); + + emit_insn (gen_muldi3 (l0, l1, l2)); + emit_insn (gen_umuldi3_highpart (h0, l1, l2)); + emit_insn (gen_madddi (h0, h1, l2, h0)); + emit_insn (gen_madddi (h0, l1, h2, h0)); + + emit_move_insn (gen_lowpart (DImode, operands[0]), l0); + emit_move_insn (gen_highpart (DImode, operands[0]), h0); + DONE; +}) + (define_insn sumuldi3_highpart [(set (match_operand:DI 0 register_operand =r) (truncate:DI -- 1.8.4.2
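The multi3 expander computes a truncating 128-bit product from one low multiply, one high-part multiply and two multiply-adds. The sketch below mirrors that instruction order in C. It assumes a compiler that provides `unsigned __int128` (GCC and Clang do on 64-bit targets); the names `u128`, `umulh`, `mul128` and `demo_mul128` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* High 64 bits of the full 64x64 product, which is what UMULH computes.  */
static uint64_t umulh (uint64_t a, uint64_t b)
{
  return (uint64_t) (((u128) a * b) >> 64);
}

/* Truncating 128x128 multiply: the cross products only affect the high
   word, so they can be folded in with two multiply-adds.  */
static u128 mul128 (u128 x, u128 y)
{
  uint64_t l1 = (uint64_t) x, h1 = (uint64_t) (x >> 64);
  uint64_t l2 = (uint64_t) y, h2 = (uint64_t) (y >> 64);

  uint64_t l0 = l1 * l2;         /* mul   */
  uint64_t h0 = umulh (l1, l2);  /* umulh */
  h0 = h1 * l2 + h0;             /* madd  */
  h0 = l1 * h2 + h0;             /* madd  */

  return ((u128) h0 << 64) | l0;
}

/* Compare against the compiler's own 128-bit multiply.  */
static void demo_mul128 (void)
{
  u128 x = ((u128) 0x123456789abcdef0ULL << 64) | 0xfedcba9876543210ULL;
  u128 y = ((u128) 7 << 64) | UINT64_MAX;
  assert (mul128 (x, y) == x * y);
}
```

The `h1 * h2` term is absent because it only contributes to bits 128 and above, which a TImode result discards.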
[PATCH, AArch64 0/7] TImode and longlong.h improvements
The recent longlong.h patch

  http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00286.html

reminded me that the other common patterns really ought to be supported somehow. We had patterns defining ADDS, ADC, and UMULH, but we didn't have the proper expanders in place to make use of them.

The final longlong.h patch has nothing that's really aarch64 specific, but I chickened out in making the generic patterns use builtin double word arithmetic. Perhaps some define set in the cpu-specific portion of the file ought to select this from the final common portion, but that sort of thing begs the question of large-scale cleanup.

r~

Richard Henderson (6):
  aarch64: Add addti3 and subti3 patterns
  aarch64: Add mulditi3 and umulditi3 patterns
  aarch64: Add multi3 pattern
  soft-fp: Commonize creation of TImode types
  soft-fp: Define UDWtype for longlong.h
  aarch64: Define add_ssaaaa, sub_ddmmss, umul_ppmm

 gcc/config/aarch64/aarch64.md        | 89 ++--
 include/longlong.h                   | 28 +---
 libgcc/config/aarch64/sfp-machine.h  |  4 --
 libgcc/config/i386/64/sfp-machine.h  |  5 --
 libgcc/config/ia64/sfp-machine.h     |  5 --
 libgcc/config/tilegx/sfp-machine32.h |  5 --
 libgcc/config/tilegx/sfp-machine64.h |  5 --
 libgcc/soft-fp/soft-fp.h             | 14 ++
 8 files changed, 120 insertions(+), 35 deletions(-)

--
1.8.4.2
[PATCH, AArch64 6/6] aarch64: Define add_ssaaaa, sub_ddmmss, umul_ppmm
We have good support for TImode arithmetic, so no need to do anything with inline assembly.

include/
	* longlong.h [__aarch64__] (add_ssaaaa, sub_ddmmss, umul_ppmm): New.
	[__aarch64__] (COUNT_LEADING_ZEROS_0): Define in terms of W_TYPE_SIZE.
---
 include/longlong.h | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/longlong.h b/include/longlong.h
index b4c1f400..1b11fc7 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -123,19 +123,35 @@ extern const UQItype __clz_tab[256] attribute_hidden;
 #endif /* __GNUC__ < 2 */
 
 #if defined (__aarch64__)
+#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x += (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl); \
+    (sh) = __x >> W_TYPE_SIZE; \
+    (sl) = __x; \
+  } while (0)
+#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(ah) << 64 | (UWtype)(al); \
+    __x -= (UDWtype)(UWtype)(bh) << 64 | (UWtype)(bl); \
+    (sh) = __x >> W_TYPE_SIZE; \
+    (sl) = __x; \
+  } while (0)
+#define umul_ppmm(ph, pl, m0, m1) \
+  do { \
+    UDWtype __x = (UDWtype)(UWtype)(m0) * (UWtype)(m1); \
+    (ph) = __x >> W_TYPE_SIZE; \
+    (pl) = __x; \
+  } while (0)
+#define COUNT_LEADING_ZEROS_0 W_TYPE_SIZE
 #if W_TYPE_SIZE == 32
 #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clz (X))
 #define count_trailing_zeros(COUNT, X) ((COUNT) = __builtin_ctz (X))
-#define COUNT_LEADING_ZEROS_0 32
-#endif /* W_TYPE_SIZE == 32 */
-
-#if W_TYPE_SIZE == 64
+#elif W_TYPE_SIZE == 64
 #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clzll (X))
 #define count_trailing_zeros(COUNT, X) ((COUNT) = __builtin_ctzll (X))
-#define COUNT_LEADING_ZEROS_0 64
 #endif /* W_TYPE_SIZE == 64 */
-
 #endif /* __aarch64__ */
 
 #if defined (__alpha) && W_TYPE_SIZE == 64
--
1.8.4.2
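To see how the three macros behave outside of libgcc, here is a standalone sketch for a 64-bit word size. The typedefs and `W_TYPE_SIZE` are stand-ins for what the including file normally provides, `unsigned __int128` is assumed to be available, and the temporary is named `dw_` rather than `__x` to stay out of the reserved identifier space. It is an illustration, not the header itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the definitions longlong.h expects from its includer.  */
typedef uint64_t UWtype;
typedef unsigned __int128 UDWtype;
#define W_TYPE_SIZE 64

/* (sh:sl) = (ah:al) + (bh:bl), computed in one double-word addition.  */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                                 \
  do {                                                                     \
    UDWtype dw_ = (UDWtype) (UWtype) (ah) << W_TYPE_SIZE | (UWtype) (al);  \
    dw_ += (UDWtype) (UWtype) (bh) << W_TYPE_SIZE | (UWtype) (bl);         \
    (sh) = (UWtype) (dw_ >> W_TYPE_SIZE);                                  \
    (sl) = (UWtype) dw_;                                                   \
  } while (0)

/* (sh:sl) = (ah:al) - (bh:bl).  */
#define sub_ddmmss(sh, sl, ah, al, bh, bl)                                 \
  do {                                                                     \
    UDWtype dw_ = (UDWtype) (UWtype) (ah) << W_TYPE_SIZE | (UWtype) (al);  \
    dw_ -= (UDWtype) (UWtype) (bh) << W_TYPE_SIZE | (UWtype) (bl);         \
    (sh) = (UWtype) (dw_ >> W_TYPE_SIZE);                                  \
    (sl) = (UWtype) dw_;                                                   \
  } while (0)

/* (ph:pl) = m0 * m1, the full double-word product.  */
#define umul_ppmm(ph, pl, m0, m1)                                          \
  do {                                                                     \
    UDWtype dw_ = (UDWtype) (UWtype) (m0) * (UWtype) (m1);                 \
    (ph) = (UWtype) (dw_ >> W_TYPE_SIZE);                                  \
    (pl) = (UWtype) dw_;                                                   \
  } while (0)

/* Exercise each macro once with operands that cross the word boundary.  */
static void demo_longlong (void)
{
  UWtype hi, lo;
  add_ssaaaa (hi, lo, 1, UINT64_MAX, 2, 1);
  assert (hi == 4 && lo == 0);
  sub_ddmmss (hi, lo, 4, 0, 2, 1);
  assert (hi == 1 && lo == UINT64_MAX);
  umul_ppmm (hi, lo, UINT64_MAX, UINT64_MAX);
  assert (hi == UINT64_MAX - 1 && lo == 1);
}
```

Writing the arithmetic on the double-word type leaves the choice of instructions to the compiler, which is the point of the patch: with the TImode expanders in place, no inline assembly is needed.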
[PATCH, AArch64 4/6] soft-fp: Commonize creation of TImode types
No need to do this over and over for different 64-bit hosts. libgcc/ * config/soft-fp/soft-fp.h (TItype, UTItype, TI_BITS): New. * config/aarch64/sfp-machine.h (TItype, UTItype, TI_BITS): Remove. * config/i386/64/sfp-machine.h: Likewise. * config/ia64/sfp-machine.h: Likewise. * config/tilegx/sfp-machine32.h: Likewise. * config/tilegx/sfp-machine64.h: Likewise. --- libgcc/config/aarch64/sfp-machine.h | 4 libgcc/config/i386/64/sfp-machine.h | 5 - libgcc/config/ia64/sfp-machine.h | 5 - libgcc/config/tilegx/sfp-machine32.h | 5 - libgcc/config/tilegx/sfp-machine64.h | 5 - libgcc/soft-fp/soft-fp.h | 8 6 files changed, 8 insertions(+), 24 deletions(-) diff --git a/libgcc/config/aarch64/sfp-machine.h b/libgcc/config/aarch64/sfp-machine.h index 61b5f72..5e676be 100644 --- a/libgcc/config/aarch64/sfp-machine.h +++ b/libgcc/config/aarch64/sfp-machine.h @@ -28,10 +28,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define _FP_WS_TYPEsigned long long #define _FP_I_TYPE int -typedef int TItype __attribute__ ((mode (TI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); -#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype)) - /* The type of the result of a floating point comparison. This must match __libgcc_cmp_return__ in GCC for the target. 
*/ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h index 1ff94c2..8197536 100644 --- a/libgcc/config/i386/64/sfp-machine.h +++ b/libgcc/config/i386/64/sfp-machine.h @@ -3,11 +3,6 @@ #define _FP_WS_TYPEsigned long long #define _FP_I_TYPE long long -typedef int TItype __attribute__ ((mode (TI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); - -#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype)) - #define _FP_MUL_MEAT_Q(R,X,Y) \ _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm) diff --git a/libgcc/config/ia64/sfp-machine.h b/libgcc/config/ia64/sfp-machine.h index e06bc9a..f7dd928 100644 --- a/libgcc/config/ia64/sfp-machine.h +++ b/libgcc/config/ia64/sfp-machine.h @@ -3,11 +3,6 @@ #define _FP_WS_TYPEsigned long #define _FP_I_TYPE long -typedef int TItype __attribute__ ((mode (TI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); - -#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype)) - /* The type of the result of a floating point comparison. This must match `__libgcc_cmp_return__' in GCC for the target. */ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); diff --git a/libgcc/config/tilegx/sfp-machine32.h b/libgcc/config/tilegx/sfp-machine32.h index 31a2032..a921533 100644 --- a/libgcc/config/tilegx/sfp-machine32.h +++ b/libgcc/config/tilegx/sfp-machine32.h @@ -3,11 +3,6 @@ #define _FP_WS_TYPEsigned long #define _FP_I_TYPE long -typedef int TItype __attribute__ ((mode (TI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); - -#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype)) - /* The type of the result of a floating point comparison. This must match `__libgcc_cmp_return__' in GCC for the target. 
*/ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); diff --git a/libgcc/config/tilegx/sfp-machine64.h b/libgcc/config/tilegx/sfp-machine64.h index 7cf352e..2586dd5 100644 --- a/libgcc/config/tilegx/sfp-machine64.h +++ b/libgcc/config/tilegx/sfp-machine64.h @@ -3,11 +3,6 @@ #define _FP_WS_TYPEsigned long #define _FP_I_TYPE long -typedef int TItype __attribute__ ((mode (TI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); - -#define TI_BITS (__CHAR_BIT__ * (int)sizeof(TItype)) - /* The type of the result of a floating point comparison. This must match `__libgcc_cmp_return__' in GCC for the target. */ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h index 696fc86..b54b1ed 100644 --- a/libgcc/soft-fp/soft-fp.h +++ b/libgcc/soft-fp/soft-fp.h @@ -237,6 +237,11 @@ typedef int DItype __attribute__ ((mode (DI))); typedef unsigned int UQItype __attribute__ ((mode (QI))); typedef unsigned int USItype __attribute__ ((mode (SI))); typedef unsigned int UDItype __attribute__ ((mode (DI))); +#if _FP_W_TYPE_SIZE == 64 +typedef int TItype __attribute__ ((mode (TI))); +typedef unsigned int UTItype __attribute__ ((mode (TI))); +#endif + #if _FP_W_TYPE_SIZE == 32 typedef unsigned int UHWtype __attribute__ ((mode (HI))); #elif _FP_W_TYPE_SIZE == 64 @@ -249,6 +254,9 @@ typedef USItype UHWtype; #define SI_BITS(__CHAR_BIT__ * (int) sizeof (SItype)) #define DI_BITS(__CHAR_BIT__ * (int) sizeof (DItype)) +#if _FP_W_TYPE_SIZE == 64 +#
[PATCH, AArch64 5/6] soft-fp: Define UDWtype for longlong.h
The documentation for longlong.h says this type must be defined. We've gotten away with this because so far longlong.h hasn't actually used the type.

libgcc/
	* soft-fp/soft-fp.h (UDWtype): New define.
---
 libgcc/soft-fp/soft-fp.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h
index b54b1ed..8f80ea6 100644
--- a/libgcc/soft-fp/soft-fp.h
+++ b/libgcc/soft-fp/soft-fp.h
@@ -248,6 +248,12 @@ typedef unsigned int UHWtype __attribute__ ((mode (HI)));
 typedef USItype UHWtype;
 #endif
 
+#if _FP_W_TYPE_SIZE == 32
+# define UDWtype UDItype
+#elif _FP_W_TYPE_SIZE == 64
+# define UDWtype UTItype
+#endif
+
 #ifndef CMPtype
 # define CMPtype int
 #endif
--
1.8.4.2
[PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
On Wed, Jan 08, 2014 at 01:45:40PM +0100, Jakub Jelinek wrote:
> I'd like to get rid of all the XCNEW calls in target-globals.c as a
> follow-up.

Here it is. The rationale is both to avoid many separate heap allocations and if TARGET_OPTION_NODE is no longer needed (all FUNCTION_DECLs referencing it are e.g. optimized away, say static unused functions) to avoid leaking memory.

Bootstrapped/regtested on x86_64-linux and i686-linux (together with the i386 SWITCHABLE_TARGET patch).

Though, looking at the sizes, i686-linux allocates 0x67928 bytes which I think with ggc-page.c we allocate 0.5MB for it (acceptable), on x86_64-linux the allocation size is 0x83aa8 and thus only ~ 15KB over to fit into 0.5MB, thus I think we allocate 1MB. So, if we wanted to tune for x86_64, we could not allocate say target_flag_state (size 0x5008) in the big chunk, but instead make it GTY((atomic)) and allocate separately. Or perhaps do that for other very large structs? In any case, that doesn't look like something that probably would need to be retuned for every release.

The current sizes of the structs are (x86_64-linux, then i686-linux):

struct target_globals          0x80     0x40
struct target_flag_state       0x20     0x20
struct target_regs             0x5008   0x5008
struct target_hard_regs        0x35c8   0x33f8
struct target_reload           0xef70   0xef70
struct target_expmed           0x180b0  0xf4b0
struct target_optabs           0x4f0    0x4b9
struct target_cfgloop          0x1c     0x1c
struct target_ira              0x9628   0x9620
struct target_ira_int          0x3fca8  0x322e4
struct target_lra_int          0xa718   0x4e70
struct target_builtins         0x268    0x268
struct target_gcse             0x62     0x62
struct target_bb_reorder       0x4      0x4
struct target_lower_subreg     0x24c    0x18c

Perhaps use cut-off of 4KB with current sizes, anything below that would be allocated in the single block, anything above it separately. So 7 structs allocated together, 7 separately.

2014-01-08  Jakub Jelinek  ja...@redhat.com

	* target-globals.c (save_target_globals): Allocate most of the
	structs using GC in payload of target_globals struct instead
	of allocating them on the heap.

--- gcc/target-globals.c.jj 2014-01-08 10:23:22.0 +0100 +++ gcc/target-globals.c2014-01-08 14:00:13.183231122 +0100 @@ -68,24 +68,43 @@ struct target_globals * save_target_globals (void) { struct target_globals *g; - - g = ggc_alloc_target_globals (); - g-flag_state = XCNEW (struct target_flag_state); - g-regs = XCNEW (struct target_regs); + struct target_globals_extra { +struct target_globals g; +struct target_flag_state flag_state; +struct target_regs regs; +struct target_hard_regs hard_regs; +struct target_reload reload; +struct target_expmed expmed; +struct target_optabs optabs; +struct target_cfgloop cfgloop; +struct target_ira ira; +struct target_ira_int ira_int; +struct target_lra_int lra_int; +struct target_builtins builtins; +struct target_gcse gcse; +struct target_bb_reorder bb_reorder; +struct target_lower_subreg lower_subreg; + } *p; + p = (struct target_globals_extra *) + ggc_internal_cleared_alloc_stat (sizeof (struct target_globals_extra) + PASS_MEM_STAT); + g = (struct target_globals *) p; + g-flag_state = p-flag_state; + g-regs = p-regs; g-rtl = ggc_alloc_cleared_target_rtl (); - g-hard_regs = XCNEW (struct target_hard_regs); - g-reload = XCNEW (struct target_reload); - g-expmed = XCNEW (struct target_expmed); - g-optabs = XCNEW (struct target_optabs); + g-hard_regs = p-hard_regs; + g-reload = p-reload; + g-expmed = p-expmed; + g-optabs = p-optabs; g-libfuncs = ggc_alloc_cleared_target_libfuncs (); - g-cfgloop = XCNEW (struct target_cfgloop); - g-ira = XCNEW (struct target_ira); - g-ira_int = XCNEW (struct target_ira_int); - g-lra_int = XCNEW (struct target_lra_int); - g-builtins = XCNEW (struct target_builtins); - g-gcse = XCNEW (struct target_gcse); - g-bb_reorder = XCNEW (struct target_bb_reorder); - g-lower_subreg = XCNEW (struct target_lower_subreg); + g-cfgloop = p-cfgloop; + g-ira = p-ira; + g-ira_int = p-ira_int; + g-lra_int = p-lra_int; + g-builtins = p-builtins; + g-gcse = p-gcse; + g-bb_reorder = p-bb_reorder; + g-lower_subreg = 
p-lower_subreg; restore_target_globals (g); init_reg_sets (); target_reinit (); Jakub
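The idea in the patch above is to allocate the header struct and all of its sub-structures as one zeroed block and then point the header's fields into that block. A minimal sketch of the layout trick in plain C follows, with `calloc` standing in for the GC allocator. The struct names and sizes are invented for illustration and do not match GCC's.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented payload types standing in for target_regs, target_optabs, ...  */
struct regs_state   { int data[16]; };
struct optabs_state { char data[32]; };

/* The header that the rest of the program sees.  */
struct globals {
  struct regs_state *regs;
  struct optabs_state *optabs;
};

/* Header plus payload.  The header must stay the first member so that a
   pointer to it is also a pointer to the whole block.  */
struct globals_extra {
  struct globals g;
  struct regs_state regs;
  struct optabs_state optabs;
};

/* One cleared allocation instead of one per sub-structure.  */
static struct globals *save_globals (void)
{
  struct globals_extra *p = calloc (1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->g.regs = &p->regs;
  p->g.optabs = &p->optabs;
  return &p->g;
}

/* Releasing the header releases every payload struct with it.  */
static void free_globals (struct globals *g)
{
  free (g);
}

/* The payload pointers must land inside the single block.  */
static void demo_globals (void)
{
  struct globals *g = save_globals ();
  assert (g != NULL);
  assert ((char *) g->regs > (char *) g);
  assert ((char *) g->optabs
          < (char *) g + sizeof (struct globals_extra));
  assert (g->regs->data[0] == 0);
  free_globals (g);
}
```

In the patch the same property is what lets the garbage collector reclaim everything once the target_globals object is unreachable, instead of leaking the separately allocated pieces.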
[PATCH] Fix up ipa-prop caused -fcompare-debug failures (PR ipa/59722)
Hi!

The recent ipa_analyze_params_uses changes broke i686-linux bootstrap with --enable-checking=release, the reduced testcase below shows it. Obviously we need to ignore debug stmt uses during analysis.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as obvious.

2014-01-08  Jakub Jelinek  ja...@redhat.com

	PR ipa/59722
	* ipa-prop.c (ipa_analyze_params_uses): Ignore uses in debug stmts.

	* gcc.dg/pr59722.c: New test.

--- gcc/ipa-prop.c.jj 2014-01-06 22:32:17.101586391 +0100
+++ gcc/ipa-prop.c 2014-01-08 16:07:29.203641224 +0100
@@ -2127,8 +2127,11 @@ ipa_analyze_params_uses (struct cgraph_n
             FOR_EACH_IMM_USE_FAST (use_p, imm_iter, ddef)
               if (!is_gimple_call (USE_STMT (use_p)))
                 {
-                  controlled_uses = IPA_UNDESCRIBED_USE;
-                  break;
+                  if (!is_gimple_debug (USE_STMT (use_p)))
+                    {
+                      controlled_uses = IPA_UNDESCRIBED_USE;
+                      break;
+                    }
                 }
               else
                 controlled_uses++;
--- gcc/testsuite/gcc.dg/pr59722.c.jj 2014-01-08 16:06:34.325960016 +0100
+++ gcc/testsuite/gcc.dg/pr59722.c 2014-01-08 16:06:03.0 +0100
@@ -0,0 +1,36 @@
+/* PR ipa/59722 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+extern void abrt (const char *, int) __attribute__((noreturn));
+void baz (int *, int *);
+
+static inline int
+bar (void)
+{
+  return 1;
+}
+
+static inline void
+foo (int *x, int y (void))
+{
+  while (1)
+    {
+      int a = 0;
+      if (*x)
+        {
+          baz (x, &a);
+          while (a && !y ())
+            ;
+          break;
+        }
+      abrt ("", 1);
+    }
+}
+
+void
+test (int x)
+{
+  foo (&x, bar);
+  foo (&x, bar);
+}

	Jakub
C++ PATCH for c++/59614 (compile hog with lots of templates)
I was forgetting that recursing into template arguments would in turn recurse into their template arguments, leading to quadratic behavior. So, look at template arguments only once and add any inherited tags to the instantiated type. Tested x86_64-pc-linux-gnu, applying to trunk. commit f97c952a82d54a4cf0fc4583560de78589fa5664 Author: Jason Merrill ja...@redhat.com Date: Tue Jan 7 17:19:20 2014 -0500 PR c++/59614 * class.c (abi_tag_data): Add tags field. (check_abi_tags): Initialize it. (find_abi_tags_r): Support collecting missing tags. (mark_type_abi_tags): Don't look at template args. (inherit_targ_abi_tags): New. (check_bases_and_members): Use it. * cp-tree.h (ABI_TAG_IMPLICIT): New. * mangle.c (write_abi_tags): Check it. diff --git a/gcc/cp/class.c b/gcc/cp/class.c index c961b22..0c3ce47 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -1340,14 +1340,20 @@ struct abi_tag_data { tree t; tree subob; + // error_mark_node to get diagnostics; otherwise collect missing tags here + tree tags; }; static tree -find_abi_tags_r (tree *tp, int */*walk_subtrees*/, void *data) +find_abi_tags_r (tree *tp, int *walk_subtrees, void *data) { if (!OVERLOAD_TYPE_P (*tp)) return NULL_TREE; + /* walk_tree shouldn't be walking into any subtrees of a RECORD_TYPE + anyway, but let's make sure of it. */ + *walk_subtrees = false; + if (tree attributes = lookup_attribute (abi_tag, TYPE_ATTRIBUTES (*tp))) { struct abi_tag_data *p = static_caststruct abi_tag_data*(data); @@ -1358,7 +1364,20 @@ find_abi_tags_r (tree *tp, int */*walk_subtrees*/, void *data) tree id = get_identifier (TREE_STRING_POINTER (tag)); if (!IDENTIFIER_MARKED (id)) { - if (TYPE_P (p-subob)) + if (p-tags != error_mark_node) + { + /* We're collecting tags from template arguments. */ + tree str = build_string (IDENTIFIER_LENGTH (id), + IDENTIFIER_POINTER (id)); + p-tags = tree_cons (NULL_TREE, str, p-tags); + ABI_TAG_IMPLICIT (p-tags) = true; + + /* Don't inherit this tag multiple times. 
*/ + IDENTIFIER_MARKED (id) = true; + } + + /* Otherwise we're diagnosing missing tags. */ + else if (TYPE_P (p-subob)) { warning (OPT_Wabi_tag, %qT does not have the %E abi tag that base %qT has, p-t, tag, p-subob); @@ -1397,22 +1416,6 @@ mark_type_abi_tags (tree t, bool val) IDENTIFIER_MARKED (id) = val; } } - - /* Also mark ABI tags from template arguments. */ - if (CLASSTYPE_TEMPLATE_INFO (t)) -{ - tree args = CLASSTYPE_TI_ARGS (t); - for (int i = 0; i TMPL_ARGS_DEPTH (args); ++i) - { - tree level = TMPL_ARGS_LEVEL (args, i+1); - for (int j = 0; j TREE_VEC_LENGTH (level); ++j) - { - tree arg = TREE_VEC_ELT (level, j); - if (CLASS_TYPE_P (arg)) - mark_type_abi_tags (arg, val); - } - } -} } /* Check that class T has all the abi tags that subobject SUBOB has, or @@ -1424,13 +1427,50 @@ check_abi_tags (tree t, tree subob) mark_type_abi_tags (t, true); tree subtype = TYPE_P (subob) ? subob : TREE_TYPE (subob); - struct abi_tag_data data = { t, subob }; + struct abi_tag_data data = { t, subob, error_mark_node }; cp_walk_tree_without_duplicates (subtype, find_abi_tags_r, data); mark_type_abi_tags (t, false); } +void +inherit_targ_abi_tags (tree t) +{ + if (CLASSTYPE_TEMPLATE_INFO (t) == NULL_TREE) +return; + + mark_type_abi_tags (t, true); + + tree args = CLASSTYPE_TI_ARGS (t); + struct abi_tag_data data = { t, NULL_TREE, NULL_TREE }; + for (int i = 0; i TMPL_ARGS_DEPTH (args); ++i) +{ + tree level = TMPL_ARGS_LEVEL (args, i+1); + for (int j = 0; j TREE_VEC_LENGTH (level); ++j) + { + tree arg = TREE_VEC_ELT (level, j); + data.subob = arg; + cp_walk_tree_without_duplicates (arg, find_abi_tags_r, data); + } +} + + // If we found some tags on our template arguments, add them to our + // abi_tag attribute. 
+ if (data.tags) +{ + tree attr = lookup_attribute (abi_tag, TYPE_ATTRIBUTES (t)); + if (attr) + TREE_VALUE (attr) = chainon (data.tags, TREE_VALUE (attr)); + else + TYPE_ATTRIBUTES (t) + = tree_cons (get_identifier (abi_tag), data.tags, + TYPE_ATTRIBUTES (t)); +} + + mark_type_abi_tags (t, false); +} + /* Run through the base classes of T, updating CANT_HAVE_CONST_CTOR_P, and NO_CONST_ASN_REF_P. Also set flag bits in T based on properties of the bases. */ @@ -5431,6 +5471,9 @@ check_bases_and_members (tree t) bool saved_nontrivial_dtor; tree fn; + /* Pick up any abi_tags from our template arguments before checking. */ + inherit_targ_abi_tags (t); + /* By default, we use const reference arguments and generate default constructors. */ cant_have_const_ctor = 0; diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index bdae500..96af562f 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -65,6 +65,7 @@
[GOOGLE] Remove mod_id_to_name map
This patch removes the mod_id_to_name map because the information is already available in module_infos. Also, AutoFDO doesn't have access to update this map because it's a file-static structure.

Bootstrapped and passed regression test.

OK for google branch?

Thanks,
Dehao

Index: gcc/coverage.c
===================================================================
--- gcc/coverage.c (revision 206366)
+++ gcc/coverage.c (working copy)
@@ -615,37 +615,17 @@ reorder_module_groups (const char *imports_file, u
   module_name_tab.dispose ();
 }
 
-typedef struct {
-  unsigned int mod_id;
-  const char *mod_name;
-} mod_id_to_name_t;
-
-static vec<mod_id_to_name_t> *mod_names;
-
-static void
-record_module_name (unsigned int mod_id, const char *name)
-{
-  mod_id_to_name_t t;
-
-  t.mod_id = mod_id;
-  t.mod_name = xstrdup (name);
-  if (!mod_names)
-    vec_alloc (mod_names, 10);
-  mod_names->safe_push (t);
-}
-
 /* Return the module name for module with MOD_ID.  */
 
 const char *
 get_module_name (unsigned int mod_id)
 {
   size_t i;
-  mod_id_to_name_t *elt;
 
-  for (i = 0; mod_names->iterate (i, &elt); i++)
+  for (i = 0; i < num_in_fnames; i++)
     {
-      if (elt->mod_id == mod_id)
-        return elt->mod_name;
+      if (module_infos[i]->ident == mod_id)
+        return lbasename (module_infos[i]->source_filename);
     }
 
   gcc_assert (0);
@@ -927,9 +907,6 @@ read_counts_file (const char *da_file_name, unsign
         }
     }
 
-  record_module_name (mod_info->ident,
-                      lbasename (mod_info->source_filename));
-
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
Re: PATCH: PR target/59587: cpu_names in i386.c is accessed with wrong index
On Wed, Dec 25, 2013 at 2:32 PM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Dec 25, 2013 at 10:31 PM, H.J. Lu hjl.to...@gmail.com wrote: cpu_names in i386.c is only used by ix86_function_specific_print which accesses it with enum processor_type index. But cpu_names is defined as array with enum target_cpu_default index. This patch adds processor names to processor_target_table and uses processor_target_table instead of cpu_names. It removes cpu_names and target_cpu_default. Tested on Linux/x86-64. OK to install? Wait a moment, it looks to me that TARGET_CPU_DEFAULT has to be synchronized with const processor_alias_table, so we are able to define various ISA extensions by selecting TARGET_CPU_*. The TARGET_CPU_DEFAULT can then TARGET_CPU_DEFAULT sets the default -mtune=, not -march=. be used to select extensions in the same way as PROCESSOR_* selects tuning for certain processor. It has been like this for a long time. For x86, TARGET_CPU_DEFAULT isn't defined no matter which configure options are used. We can change config.gcc to set TARGET_CPU_DEFAULT to proper PROCESSOR_XXX or set it to a string xxx for processor xxx. But GCC driver always passes -march=/-mtune= to toplev.c so that TARGET_CPU_DEFAULT is normally used. I meant to say TARGET_CPU_DEFAULT isn't normally used. Let me rethink this a bit, please do not commit the patch. TARGET_CPU_DEFAULT is left over for 32-bit target before --with-arch= and --with-cpu= were added. Today, -mtune=xxx -march=xxx are always passed to cc1 by GCC driver. If cc1 is run by hand and -mtune=xxx -march=xxx aren't passed to cc1, we should do 1. For 64-bit, it should be the same as -mtune=generic -march=x86_64 are passed. 2. For 32-bit, it should be the same as -mtune=cpu -march=cpu are passed, where cpu is the target cpu used to configure GCC, like i386 in i386-linux, i486 in i486-linux, But there is no i786 cpu. i786 is treated as i686. 
If SUBTARGET32_DEFAULT_CPU is defined, it should be the same -mtune=SUBTARGET32_DEFAULT_CPU -march=SUBTARGET32_DEFAULT_CPU. Here is the patch to implement this. Let's do one step at a time. So, let's split the patch back to target/59587 fix: 2013-12-25 H.J. Lu hongjiu...@intel.com PR target/59587 * config/i386/i386.c (struct ptt): Add a field for processor name. (processor_target_table): Sync with processor_type. Add processor names. (cpu_names): Removed. (ix86_option_override_internal): Default x_ix86_tune_string to processor_target_table[TARGET_CPU_DEFAULT].name. (ix86_function_specific_print): Assert arch and tune PROCESSOR_max. Use processor_target_table to print arch and tune names. * config/i386/i386.h (TARGET_CPU_DEFAULT): Default to PROCESSOR_GENERIC. (target_cpu_default): Removed. (processor_type): Reordered. OK for mainline and for 4.8 after a few days in mainline. Thanks, Uros. I am testing this patch. I will check it into 4.8 branch after finishing regression test. Thanks. -- H.J. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6493bb2..f17bf56 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,24 @@ +2014-01-08 H.J. Lu hongjiu...@intel.com + + Backport from mainline + 2013-12-25 H.J. Lu hongjiu...@intel.com + + PR target/59587 + * config/i386/i386.c (struct ptt): Add a field for processor + name. + (processor_target_table): Sync with processor_type. Add + processor names. + (cpu_names): Removed. + (ix86_option_override_internal): Default x_ix86_tune_string + to processor_target_table[TARGET_CPU_DEFAULT].name. + (ix86_function_specific_print): Assert arch and tune + PROCESSOR_max. Use processor_target_table to print arch and + tune names. + * config/i386/i386.h (TARGET_CPU_DEFAULT): Default to + PROCESSOR_GENERIC32. + (target_cpu_default): Removed. + (processor_type): Reordered. 
+ 2014-01-08 Uros Bizjak ubiz...@gmail.com Backport from mainline diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e03aa72..c06c220 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2409,6 +2409,7 @@ static tree ix86_veclibabi_acml (enum built_in_function, tree, tree); /* Processor target table, indexed by processor number */ struct ptt { + const char *const name; /* processor name */ const struct processor_costs *cost; /* Processor costs */ const int align_loop; /* Default alignments. */ const int align_loop_max_skip; @@ -2417,66 +2418,31 @@ struct ptt const int align_func; }; +/* This table must be in sync with enum processor_type in i386.h. */ static const struct ptt processor_target_table[PROCESSOR_max] = { - {i386_cost, 4, 3, 4, 3, 4}, - {i486_cost, 16, 15, 16, 15, 16}, - {pentium_cost, 16, 7, 16, 7, 16}, - {pentiumpro_cost, 16, 15, 16, 10, 16}, - {geode_cost, 0, 0,
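The bug in this thread came from a name array ordered by one enum being indexed with another. The patch fixes it by storing the name in processor_target_table and keeping that table in the same order as processor_type. The sketch below shows a variation on that idea rather than the patch's own approach: it uses C99 designated initializers so each row is tied to its enumerator. The reduced enum, the single tuning field, the "generic" row's value and the `processor_name` helper are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much reduced processor enumeration.  */
enum processor_type {
  PROCESSOR_GENERIC,
  PROCESSOR_I386,
  PROCESSOR_I486,
  PROCESSOR_max
};

struct ptt {
  const char *name;  /* processor name */
  int align_loop;    /* one example tuning field; values are illustrative */
};

/* Each row names its enumerator, so reordering the enum cannot silently
   attach a name to the wrong processor.  */
static const struct ptt processor_target_table[PROCESSOR_max] = {
  [PROCESSOR_GENERIC] = { "generic", 16 },
  [PROCESSOR_I386]    = { "i386", 4 },
  [PROCESSOR_I486]    = { "i486", 16 },
};

/* Look up a name, checking the index the way the patch asserts that
   arch and tune are below PROCESSOR_max.  */
static const char *processor_name (enum processor_type p)
{
  assert ((unsigned) p < PROCESSOR_max);
  return processor_target_table[p].name;
}

static void demo_ptt (void)
{
  assert (strcmp (processor_name (PROCESSOR_I386), "i386") == 0);
}
```

A positional table, as in i386.c, needs the "must be in sync" comment and reviewer discipline; the designated form trades a little verbosity for a compile-time tie between index and row.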
PR 59137: Incorrect liveness info during dbr_schedule
PR 59137 is another case where dbr_schedule gets confused about liveness. We start out with:

    A: $2 = x
    B: if $4 == $2 goto L1 [REG_DEAD: $2]
    C: if $4 0 goto L2
    ...
    L1:
    D: $2 = y
    E: goto L3
    L2:
    F: $2 = x
    G: goto L3
    ...
    L3:
    ...
    return $2

We fill G's delay slot in the obvious way:

    L2:
    G: goto L3
    F: $2 = x

Then we try to steal G's delay slot for C. F is obviously redundant with A in this context, so we drop it and end with a simple threaded branch to L3:

    A: $2 = x
    B: if $4 == $2 goto L1 [REG_DEAD: $2]
    C: if $4 0 goto L3

The problem is that the REG_DEAD note is no longer accurate, so when we go on to fill B's delay slot we mistakenly think that we can use D:

    A: $2 = x
    B: if $4 == $2 goto L3
    D: $2 = y
    C: if $4 0 goto L3

and so the return value for $4 0 changes from x to y.

reorg's mechanism for handling deleted redundant instructions seems to be update_block, which adds a USE containing the redundant instruction just before the place that it was supposed to occur. The patch therefore uses update_block in steal_delay_list_from_target.

I went through the other calls to redundant_insn and a few of them also seem to be missing an update_block. I don't have testcases for these though, so it's going to be a matter of opinion whether adding them or leaving them out is the defensive thing to do. I'm happy either way.

(redundant_insn is pretty conservative, so the branch whose delay slot we're trying to fill can never be the one that makes a delay slot redundant. It must always be an instruction from before the branch. So inserting the (use ...) immediately before the branch should be correct.)

Tested on mips64-linux-gnu. OK for trunk? OK for 4.8?

Thanks,
Richard

gcc/
	PR rtl-optimization/59137
	* reorg.c (steal_delay_list_from_target): Call update_block for
	elided insns.
	(steal_delay_list_from_fallthrough, relax_delay_slots): Likewise.

gcc/testsuite/
	PR rtl-optimization/59137
	* gcc.target/mips/pr59137.c: New test.

Index: gcc/reorg.c
===
--- gcc/reorg.c 2014-01-08 18:04:23.420954812 +
+++ gcc/reorg.c 2014-01-08 19:17:12.005446964 +
@@ -1093,6 +1093,7 @@ steal_delay_list_from_target (rtx insn,
   int used_annul = 0;
   int i;
   struct resources cc_set;
+  bool *redundant;
 
   /* We can't do anything if there are more delay slots in SEQ than we
      can handle, or if we don't know that it will be a taken branch.
@@ -1133,6 +1134,7 @@ steal_delay_list_from_target (rtx insn,
     return delay_list;
 #endif
 
+  redundant = XALLOCAVEC (bool, XVECLEN (seq, 0));
   for (i = 1; i < XVECLEN (seq, 0); i++)
     {
       rtx trial = XVECEXP (seq, 0, i);
@@ -1154,7 +1156,8 @@ steal_delay_list_from_target (rtx insn,
 
       /* If this insn was already done (usually in a previous delay slot),
          pretend we put it in our delay slot.  */
-      if (redundant_insn (trial, insn, new_delay_list))
+      redundant[i] = redundant_insn (trial, insn, new_delay_list);
+      if (redundant[i])
         continue;
 
       /* We will end up re-vectoring this branch, so compute flags
@@ -1187,6 +1190,12 @@ steal_delay_list_from_target (rtx insn,
         return delay_list;
     }
 
+  /* Record the effect of the instructions that were redundant and which
+     we therefore decided not to copy.  */
+  for (i = 1; i < XVECLEN (seq, 0); i++)
+    if (redundant[i])
+      update_block (XVECEXP (seq, 0, i), insn);
+
   /* Show the place to which we will be branching.  */
   *pnew_thread = first_active_target_insn (JUMP_LABEL (XVECEXP (seq, 0, 0)));
 
@@ -1250,6 +1259,7 @@ steal_delay_list_from_fallthrough (rtx i
       /* If this insn was already done, we don't need it.  */
       if (redundant_insn (trial, insn, delay_list))
         {
+          update_block (trial, insn);
           delete_from_delay_slot (trial);
           continue;
         }
@@ -3236,6 +3246,7 @@ relax_delay_slots (rtx first)
          to reprocess this insn.  */
       if (redundant_insn (XVECEXP (pat, 0, 1), delay_insn, 0))
         {
+          update_block (XVECEXP (pat, 0, 1), insn);
           delete_from_delay_slot (XVECEXP (pat, 0, 1));
           next = prev_active_insn (next);
           continue;
@@ -3355,6 +3366,7 @@ relax_delay_slots (rtx first)
           && redirect_with_delay_slots_safe_p (delay_insn, target_label,
                                                insn))
         {
+          update_block (XVECEXP (PATTERN (trial), 0, 1), insn);
           reorg_redirect_jump (delay_insn, target_label);
           next = insn;
           continue;
Index: gcc/testsuite/gcc.target/mips/pr59137.c
===
--- /dev/null 2013-12-26 20:29:50.272541227 +
+++ gcc/testsuite/gcc.target/mips/pr59137.c 2014-01-08 19:17:12.006448250
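
The wrong-code effect described for PR 59137 can be modelled in plain C. This is only a toy sketch of the pseudo-assembly in the mail, not code from reorg.c. The function names and the c_taken flag (standing in for the outcome of branch C) are made up for illustration, the code after the elided "..." is assumed to fall through to L3, and B's delay slot is assumed not to be annulled, which is what the reported result implies.

```c
#include <stdbool.h>

/* Behaviour of the sequence before B's delay slot is filled.  */
int
before_fill (int r4, int x, int y, bool c_taken)
{
  int r2 = x;                   /* A: $2 = x */
  if (r4 == r2)                 /* B: goto L1 */
    return y;                   /* D: $2 = y; E: goto L3 */
  if (c_taken)                  /* C: threaded branch to L3 */
    return r2;                  /* F was dropped as redundant with A */
  return r2;                    /* assumed fallthrough to L3 */
}

/* Behaviour once D has been moved into B's delay slot because the
   stale REG_DEAD note claimed that $2 was dead after B.  */
int
after_bad_fill (int r4, int x, int y, bool c_taken)
{
  int r2 = x;                   /* A: $2 = x */
  bool b_taken = (r4 == r2);    /* B reads $2 first ... */
  r2 = y;                       /* ... then D runs on both paths */
  if (b_taken)
    return r2;                  /* L3, still correct: y */
  if (c_taken)
    return r2;                  /* L3, now y although x was wanted */
  return r2;
}
```

With r4 = 1, x = 5, y = 9 and branch C taken, before_fill returns x and after_bad_fill returns y. The USE that update_block inserts before the branch is what keeps $2 live there, so that D is no longer considered for the slot.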
[MIPS, committed] Revert some Octeon BADDU patches
This patch just reverts some changes I'd made to the BADDU patterns for
the infamous (truncate:QI (plus:SI ...)) -> (plus:QI ...) simplification.
That simplification was limited to CISCy targets for PR 58295.

Tested on mips64-linux-gnu and applied.  It fixes the octeon-baddu-1.c
failures.

Thanks,
Richard


gcc/
    Revert:

    2012-10-07  Richard Sandiford  rdsandif...@googlemail.com

    * config/mips/mips.c (mips_truncated_op_cost): New function.
    (mips_rtx_costs): Adjust test for BADDU.
    * config/mips/mips.md (*baddu_dimode): Push truncates to operands.

    2012-10-02  Richard Sandiford  rdsandif...@googlemail.com

    * config/mips/mips.md (*baddu_si_eb, *baddu_si_el): Merge into...
    (*baddu_si): ...this new pattern.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c 2014-01-02 22:16:09.486330453 +
+++ gcc/config/mips/mips.c 2014-01-08 10:42:17.727013965 +
@@ -3634,17 +3634,6 @@ mips_set_reg_reg_cost (enum machine_mode
     }
 }
 
-/* Return the cost of an operand X that can be trucated for free.
-   SPEED says whether we're optimizing for size or speed.  */
-
-static int
-mips_truncated_op_cost (rtx x, bool speed)
-{
-  if (GET_CODE (x) == TRUNCATE)
-    x = XEXP (x, 0);
-  return set_src_cost (x, speed);
-}
-
 /* Implement TARGET_RTX_COSTS.
*/ static bool @@ -4037,13 +4026,12 @@ mips_rtx_costs (rtx x, int code, int out case ZERO_EXTEND: if (outer_code == SET ISA_HAS_BADDU + (GET_CODE (XEXP (x, 0)) == TRUNCATE + || GET_CODE (XEXP (x, 0)) == SUBREG) GET_MODE (XEXP (x, 0)) == QImode - GET_CODE (XEXP (x, 0)) == PLUS) + GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS) { - rtx plus = XEXP (x, 0); - *total = (COSTS_N_INSNS (1) - + mips_truncated_op_cost (XEXP (plus, 0), speed) - + mips_truncated_op_cost (XEXP (plus, 1), speed)); + *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed); return true; } *total = mips_zero_extend_cost (mode, XEXP (x, 0)); Index: gcc/config/mips/mips.md === --- gcc/config/mips/mips.md 2014-01-08 10:29:42.171963087 + +++ gcc/config/mips/mips.md 2014-01-08 10:38:05.799078793 + @@ -1312,20 +1312,32 @@ (define_insn_and_split *addsi3_extended ;; Combiner patterns for unsigned byte-add. -(define_insn *baddu_si +(define_insn *baddu_si_eb [(set (match_operand:SI 0 register_operand =d) (zero_extend:SI -(plus:QI (match_operand:QI 1 register_operand d) - (match_operand:QI 2 register_operand d] - ISA_HAS_BADDU +(subreg:QI + (plus:SI (match_operand:SI 1 register_operand d) + (match_operand:SI 2 register_operand d)) 3)))] + ISA_HAS_BADDU BYTES_BIG_ENDIAN + baddu\\t%0,%1,%2 + [(set_attr alu_type add)]) + +(define_insn *baddu_si_el + [(set (match_operand:SI 0 register_operand =d) +(zero_extend:SI +(subreg:QI + (plus:SI (match_operand:SI 1 register_operand d) + (match_operand:SI 2 register_operand d)) 0)))] + ISA_HAS_BADDU !BYTES_BIG_ENDIAN baddu\\t%0,%1,%2 [(set_attr alu_type add)]) (define_insn *baddu_dimode [(set (match_operand:GPR 0 register_operand =d) (zero_extend:GPR -(plus:QI (truncate:QI (match_operand:DI 1 register_operand d)) - (truncate:QI (match_operand:DI 2 register_operand d)] +(truncate:QI + (plus:DI (match_operand:DI 1 register_operand d) + (match_operand:DI 2 register_operand d)] ISA_HAS_BADDU TARGET_64BIT baddu\\t%0,%1,%2 [(set_attr alu_type add)])
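
The two RTL shapes that the BADDU patterns switch between compute the same value, which is why either form can describe the instruction. A minimal C sketch of that equivalence follows; the function names are invented for illustration, and the sketch assumes BADDU yields the low 8 bits of the sum, zero-extended.

```c
#include <stdint.h>

/* Shape restored by the revert:
   (zero_extend (truncate:QI (plus a b))), i.e. add wide, keep the low byte.  */
uint32_t
add_then_truncate (uint32_t a, uint32_t b)
{
  return (uint8_t) (a + b);
}

/* Shape being reverted:
   (zero_extend (plus:QI (truncate:QI a) (truncate:QI b))).  */
uint32_t
truncate_then_add (uint32_t a, uint32_t b)
{
  return (uint8_t) ((uint8_t) a + (uint8_t) b);
}
```

Both functions reduce the sum modulo 256, so they agree for all inputs. What differs is only which form the RTL simplifiers hand to the combiner on a given target, and hence which form the patterns and cost code have to match.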
Re: Drop -m32 from pr59099.c
Hello!

>> gcc.target/i386/pr59099.c fails on x86_64-redhat-linux-gnu with
>> --disable-multilib because linking -m32 code is not supported.  The
>> test case passes in 64-bit mode as well.
>>
>> The other -m32 tests do not use dg-do run, so they do not exhibit this
>> problem.
>>
>> Okay for trunk?
>
> No, this IMHO really should be:
> /* { dg-do run { target { ia32 && fpic } } } */
> /* { dg-options "-O2 -fPIC" } */
>
> All tests in gcc.target/i386 having -m32 (or -m64) in dg-options
> are buggy and should be fixed, either by adding { target ia32 } to their
> dg-do compile or whatever other dg-do they have, or adding
> /* { dg-require-effective-target ia32 } */ line and dropping the -m32
> from dg-options.

I have committed the following testsuite patch, which removes -m32 from
the options.  The patch also adds a check for the fpic effective target
where -fpic is used.

2014-01-08  Uros Bizjak  ubiz...@gmail.com

    * gcc.target/i386/asm-1.c: Remove dg-options.
    * gcc.target/i386/incoming-5.c (dg-options): Remove -m32.
    * gcc.target/i386/pr55433.c (dg-options): Ditto.
    * gcc.target/i386/pr57848.c (dg-options): Ditto.
    * gcc.target/i386/pr59099.c (dg-options): Ditto.  Require fpic
    effective target.
    * gcc.target/i386/pr56246.c (dg-do): Compile for fpic target only.

Tested on x86_64-pc-linux-gnu {,-m32}, will be committed to mainline in a
moment.

Uros.

Index: gcc.target/i386/asm-1.c
===
--- gcc.target/i386/asm-1.c (revision 206436)
+++ gcc.target/i386/asm-1.c (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-m32" } */
 
 register unsigned int EAX asm ("r14"); /* { dg-error "register name" } */
Index: gcc.target/i386/incoming-5.c
===
--- gcc.target/i386/incoming-5.c (revision 206436)
+++ gcc.target/i386/incoming-5.c (working copy)
@@ -1,6 +1,6 @@
 /* PR middle-end/37009 */
 /* { dg-do compile { target { { ! *-*-darwin* } && ia32 } } } */
-/* { dg-options "-m32 -mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */
+/* { dg-options "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */
 
 extern void bar (double *);
Index: gcc.target/i386/pr55433.c
===
--- gcc.target/i386/pr55433.c (revision 206436)
+++ gcc.target/i386/pr55433.c (working copy)
@@ -1,5 +1,5 @@
-/* { dg-do compile {target { *-*-darwin* } } } */
-/* { dg-options "-O1 -m32" } */
+/* { dg-do compile { target { *-*-darwin* } } } */
+/* { dg-options "-O1" } */
 
 typedef unsigned long long tick_t;
 extern int foo(void);
Index: gcc.target/i386/pr56246.c
===
--- gcc.target/i386/pr56246.c (revision 206436)
+++ gcc.target/i386/pr56246.c (working copy)
@@ -1,5 +1,5 @@
 /* PR target/56225 */
-/* { dg-do compile { target { ia32 } } } */
+/* { dg-do compile { target { ia32 && fpic } } } */
 /* { dg-options "-O2 -fno-omit-frame-pointer -march=i686 -fpic" } */
 
 void NoBarrier_AtomicExchange (long long *ptr) {
Index: gcc.target/i386/pr57848.c
===
--- gcc.target/i386/pr57848.c (revision 206436)
+++ gcc.target/i386/pr57848.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -m32" } */
+/* { dg-options "-O1" } */
 
 extern unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int);
 #pragma GCC target("sse4.2")
Index: gcc.target/i386/pr59099.c
===
--- gcc.target/i386/pr59099.c (revision 206436)
+++ gcc.target/i386/pr59099.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fPIC -m32" } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-O2 -fPIC" } */
 
 void (*pfn)(void);
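
For reference, a test written in the style recommended in this thread might look like the sketch below. The file is hypothetical: the function and its body are placeholders, and only the three directive lines illustrate the point, namely that the ia32 requirement goes into an effective-target check while -m32 stays out of dg-options.

```c
/* { dg-do compile } */
/* { dg-require-effective-target ia32 } */
/* { dg-options "-O1" } */

/* Placeholder body; a compile-only test does not need a main.  */
unsigned int
low_half (unsigned long long value)
{
  return (unsigned int) value;
}
```

On a 64-bit-only configuration such a test is reported as unsupported instead of failing, because the harness skips it when the ia32 check does not hold.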
Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
Jakub Jelinek ja...@redhat.com writes:
> 2014-01-08  Jakub Jelinek  ja...@redhat.com
>
>     * target-globals.c (save_target_globals): Allocate most of the
>     structs using GC in payload of target_globals struct instead
>     of allocating them on the heap.

Looks good to me FWIW.  I don't know either way about the one-big-blob thing.

Note that we'll still leak memory when deleting TARGET_OPTION_NODEs
because target_ira_int and target_lra_int have pointers to heap-allocated
storage.

Thanks,
Richard
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
On Tue, Jan 7, 2014 at 8:07 AM, Bernd Schmidt ber...@codesourcery.com wrote:
> On 01/05/2014 05:10 PM, Teresa Johnson wrote:
>> On Sun, Jan 5, 2014 at 3:39 AM, Bernd Schmidt ber...@codesourcery.com wrote:
>>> I have a different patch which I'll submit next week after some more
>>> testing.  The assert in cfgrtl is unnecessarily broad and really only
>>> needs to trigger if -freorder-blocks-and-partition; there's nothing
>>> wrong with entering cfglayout after normal bb-reorder.
>>
>> Currently -freorder-blocks-and-partition is the default for x86.  I
>> assume that hw-doloop is not enabled for any i386 targets, which is
>> why we haven't seen this?
>
> Precisely.
>
>> And will this mean that -freorder-blocks-and-partition cannot be used
>> for the targets that use hw-doloop?  If so, should
>> -freorder-blocks-and-partition be prevented with a warning for those
>> targets?
>
> If someone explicitly chooses that option we can turn off the reordering
> in hw-doloop.  That should happen sufficiently rarely that it isn't a
> problem.  That's what the patch below does - bootstrapped on
> x86_64-linux, tested there and with bfin-elf.  Ok?

Ok, looks good to me.

> I've also tested that Blackfin still benefits from the hw-doloop
> reordering code and generates more hardware loops if it's enabled.  So
> we want to be able to run it at -O2.
>
>> I looked at hw-doloop briefly and since it seems to be doing some
>> manual bb reordering I guess it can't simply be moved before bbro.  It
>> seems like a better long-term solution would be to make bbro
>> hw-doloop-aware as Felix suggested earlier.
>
> Maybe.  It could be argued that the code in hw-doloop is relevant only
> for a small class of targets so it should only be enabled for them.  In
> any case, that's not stage 3 material and two ports are broken...

Ok, that makes sense.

Thanks,
Teresa

> Bernd

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: PR 59137: Incorrect liveness info during dbr_schedule
On Wed, Jan 8, 2014 at 8:27 PM, Richard Sandiford wrote:
> gcc/
>     PR rtl-optimization/59137
>     * reorg.c (steal_delay_list_from_target): Call update_block for
>     elided insns.
>     (steal_delay_list_from_fallthrough, relax_delay_slots): Likewise.
>
> gcc/testsuite/
>     PR rtl-optimization/59137
>     * gcc.target/mips/pr59137.c: New test.

This is OK for trunk.  For release branches I'll defer to the RMs.

Ciao!
Steven
Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
On Wed, Jan 08, 2014 at 07:41:26PM +, Richard Sandiford wrote:
> Jakub Jelinek ja...@redhat.com writes:
>> 2014-01-08  Jakub Jelinek  ja...@redhat.com
>>
>>     * target-globals.c (save_target_globals): Allocate most of the
>>     structs using GC in payload of target_globals struct instead
>>     of allocating them on the heap.
>
> Looks good to me FWIW.  I don't know either way about the one-big-blob
> thing.
>
> Note that we'll still leak memory when deleting TARGET_OPTION_NODEs
> because target_ira_int and target_lra_int have pointers to
> heap-allocated storage.

Yeah, perhaps that is something to fix incrementally.  But, at least we
will not leak ~ 0.5MB per (unique) target attribute used on some unused
function.

    Jakub
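
The "one big blob" idea discussed above can be sketched in a few lines of C. This is an illustration only: the struct names are invented stand-ins for the real target structures, and calloc stands in for the garbage-collected allocation that the patch uses.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-target structures.  */
struct target_a { int regs[4]; };
struct target_b { double costs[2]; };

/* The header that the rest of the compiler sees.  */
struct globals
{
  struct target_a *a;
  struct target_b *b;
};

/* Header plus payload in one object, so that a single allocation
   (and a single release) covers everything.  */
struct globals_blob
{
  struct globals g;
  struct target_a a;
  struct target_b b;
};

struct globals *
save_globals (void)
{
  /* calloc is an assumption standing in for a GC allocator.  */
  struct globals_blob *blob = calloc (1, sizeof *blob);
  if (blob == NULL)
    return NULL;
  blob->g.a = &blob->a;
  blob->g.b = &blob->b;
  return &blob->g;
}
```

Because the header is the first member of the blob, releasing the returned pointer releases the payload as well. The leak mentioned above concerns members that point to separately heap-allocated storage, which this layout does not cover.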
PING: PATCH: PRs bootstrap/59580/59583: Improve x86 --with-arch/--with-cpu= configure handling
On Mon, Dec 23, 2013 at 6:14 AM, H.J. Lu hjl.to...@gmail.com wrote:
> On Sun, Dec 22, 2013 at 11:11:12PM +0100, Uros Bizjak wrote:
>>
>> Please get someone to review config.gcc changes.  They are OK as far
>> as x86 rename is concerned, but I can't review functional changes.
>>
>
> Hi Paolo,
>
> Can you review this config.gcc change?
>
>>> @@ -588,6 +588,22 @@ esac
>>>  # Common C libraries.
>>>  tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3"
>>>
>>> +# 32-bit x86 processors supported by --with-arch=.  Each processor
>>> +# MUST be separated by exactly one space.
>>> +x86_archs="athlon athlon-4 athlon-fx athlon-mp athlon-tbird \
>>> +athlon-xp k6 k6-2 k6-3 geode c3 c3-2 winchip-c6 winchip2 i386 i486 \
>>> +i586 i686 pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m \
>>> +pentium4 pentium4m pentiumpro prescott"
>>
>> Missing "native".
>>
>
> x86_archs contains 32-bit x86 processors.  "native" is allowed for
> 64-bit targets and is included in x86_64_archs.  64-bit processors
> can be used in --with-arch/--with-cpu= for 32-bit targets.
>
> Here is a patch to improve x86 --with-arch/--with-cpu= configure
> handling.  This patch defines 3 variables:
>
> 1. x86_archs: It contains 32-bit x86 processors supported by
> --with-arch=, which aren't allowed for 64-bit targets.
> 2. x86_64_archs: It contains 64-bit x86 processors supported by
> --with-arch=, which are allowed for both 32-bit and 64-bit targets.
> 3. x86_cpus: It contains x86 processors supported by --with-cpu=,
> which are allowed for both 32-bit and 64-bit targets.
>
> The processors in each of those 3 variables are separated by exactly
> one space.  Instead of checking if a value of --with-arch/--with-cpu=
> is valid in many different places with
>
>   case ${val} in
>   valid pattern list)
>     OK
>     ;;
>   *)
>     error
>     exit 1
>     ;;
>   esac
>
> and updating all pattern lists when adding a new processor, this patch
> uses
>
>   case "valid processor list separated by exactly one space" in
>   *" ${val} "*)
>     OK
>     ;;
>   *)
>     error
>     exit 1
>     ;;
>   esac
>
> "valid processor list separated by exactly one space" is a combination
> of the 3 processor variables.  It only needs a separate check for an
> empty value with
>
>   if test x${val} != x; then
>     $val isn't empty
>   else
>     $val is empty
>   fi
>
> With this approach, we only need to add new 32-bit processors to
> x86_archs and new 64-bit processors to x86_64_archs.  They will be
> supported by --with-arch/--with-cpu= automatically.  OK to install?
>
> Thanks.
>
>
> H.J.
> ---
> 2013-12-23  H.J. Lu  hongjiu...@intel.com
>
>     PR bootstrap/59580
>     PR bootstrap/59583
>     * config.gcc (x86_archs): New variable.
>     (x86_64_archs): Likewise.
>     (x86_cpus): Likewise.
>     Use $x86_archs, $x86_64_archs and $x86_cpus to check valid
>     --with-arch/--with-cpu= options.
>     Support --with-arch=/--with-cpu={nehalem,westmere,
>     sandybridge,ivybridge,haswell,broadwell,bonnell,silvermont}.
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 24dbaf9..51eb2b1 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -588,6 +588,22 @@ esac
>  # Common C libraries.
>  tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3"
>
> +# 32-bit x86 processors supported by --with-arch=.  Each processor
> +# MUST be separated by exactly one space.
> +x86_archs="athlon athlon-4 athlon-fx athlon-mp athlon-tbird \
> +athlon-xp k6 k6-2 k6-3 geode c3 c3-2 winchip-c6 winchip2 i386 i486 \
> +i586 i686 pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m \
> +pentium4 pentium4m pentiumpro prescott"
> +# 64-bit x86 processors supported by --with-arch=.  Each processor
> +# MUST be separated by exactly one space.
> +x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
> +bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
> +core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
> +sandybridge ivybridge haswell broadwell bonnell silvermont x86-64 native"
> +# Additional x86 processors supported by --with-cpu=.  Each processor
> +# MUST be separated by exactly one space.
> +x86_cpus="generic intel"
> +
>  # Common parts for widely ported systems.
case ${target} in *-*-darwin*) @@ -1392,20 +1408,21 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i done TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'` need_64bit_isa=yes - case X${with_cpu} in - Xgeneric|Xintel|Xatom|Xslm|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver4|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) - ;; - X) + if test x$with_cpu = x; then if test x$with_cpu_64 = x; then with_cpu_64=generic fi - ;; -
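
The membership test described in this mail relies on padding both the list and the candidate with a single space, so that a name only matches as a whole word. config.gcc does this in shell; the following is merely a C analogue of the same trick, with an invented function name and arbitrary buffer sizes, meant to show why "exactly one space" matters.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if NAME is one of the words in LIST, where the words are
   separated by exactly one space.  An empty NAME is rejected here; the
   caller is expected to treat "no value given" separately, just as the
   mail describes for the shell version.  */
bool
in_list (const char *list, const char *name)
{
  char padded_list[512];
  char padded_name[64];
  int n;

  if (*name == '\0' || strchr (name, ' ') != NULL)
    return false;

  n = snprintf (padded_list, sizeof padded_list, " %s ", list);
  if (n < 0 || (size_t) n >= sizeof padded_list)
    return false;

  n = snprintf (padded_name, sizeof padded_name, " %s ", name);
  if (n < 0 || (size_t) n >= sizeof padded_name)
    return false;

  /* " k6 " matches inside " k6 k6-2 ", but " athlon- " does not match
     inside " athlon athlon-4 ".  */
  return strstr (padded_list, padded_name) != NULL;
}
```

If two names were separated by a tab or a newline, the padded search would miss them, which is presumably why the comments in the patch insist on the single-space rule.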
[PATCH,rs6000,committed] Remove duplicates from altivec_overloaded_builtins
This patch removes a couple of redundant entries I noticed in
altivec_overloaded_builtins.  Identical entries occur nearby.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
regressions, applied as obvious.

Thanks,
Bill


2014-01-08  Bill Schmidt  wschm...@linux.vnet.ibm.com

    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
    two duplicate entries.


Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c (revision 206375)
+++ gcc/config/rs6000/rs6000-c.c (working copy)
@@ -608,10 +608,6 @@ const struct altivec_builtin_types altivec_overloa
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V8HI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
-    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
-    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,
     RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,
Re: [PATCH, AArch64 5/6] soft-fp: Define UDWtype for longlong.h
soft-fp patches should go first to glibc. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, AArch64 4/6] soft-fp: Commonize creation of TImode types
On Wed, 8 Jan 2014, Richard Henderson wrote:

> diff --git a/libgcc/soft-fp/soft-fp.h b/libgcc/soft-fp/soft-fp.h
> index 696fc86..b54b1ed 100644
> --- a/libgcc/soft-fp/soft-fp.h
> +++ b/libgcc/soft-fp/soft-fp.h
> @@ -237,6 +237,11 @@ typedef int DItype __attribute__ ((mode (DI)));
>  typedef unsigned int UQItype __attribute__ ((mode (QI)));
>  typedef unsigned int USItype __attribute__ ((mode (SI)));
>  typedef unsigned int UDItype __attribute__ ((mode (DI)));
> +#if _FP_W_TYPE_SIZE == 64
> +typedef int TItype __attribute__ ((mode (TI)));
> +typedef unsigned int UTItype __attribute__ ((mode (TI)));
> +#endif

This isn't the right conditional.  _FP_W_TYPE_SIZE is ultimately an
optimization choice and need not be related to whether any TImode
functions are being defined using soft-fp, or whether TImode is supported
at all.  I think the most you can do is have sfp-machine.h define a macro
to say that TImode should be supported in soft-fp, rather than actually
defining the types itself.  (If someone were to use soft-fp on hppa64,
then they might well use _FP_W_TYPE_SIZE == 64, but hppa64 doesn't support
TImode.)

--
Joseph S. Myers
jos...@codesourcery.com
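
One possible reading of that suggestion is sketched below. The macro name SFP_WANT_TIMODE is purely hypothetical (nothing in the thread names it), and keying it off __SIZEOF_INT128__ is an assumption made only so that the sketch builds on targets without TImode; a real port would set the macro in its own sfp-machine.h. The mode attribute is GCC-specific.

```c
/* Would live in a port's sfp-machine.h.  The macro name is invented,
   and the __SIZEOF_INT128__ test merely stands in for the port's own
   knowledge of whether it supports TImode.  */
#if defined (__SIZEOF_INT128__)
# define SFP_WANT_TIMODE 1
#endif

/* Would live in soft-fp.h: the types are keyed off the port's opt-in,
   not off _FP_W_TYPE_SIZE.  */
#ifdef SFP_WANT_TIMODE
typedef int TItype __attribute__ ((mode (TI)));
typedef unsigned int UTItype __attribute__ ((mode (TI)));
#endif

/* Helper for illustration: size of TItype in bytes, or 0 when the port
   did not ask for TImode.  */
int
titype_size (void)
{
#ifdef SFP_WANT_TIMODE
  return (int) sizeof (TItype);
#else
  return 0;
#endif
}
```

With this split, a port such as the hppa64 case mentioned above could keep a 64-bit word size for soft-fp without ever seeing the TImode typedefs.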
microMIPS jump instructions
Hi Richard,

It looks like the microMIPS implementation is missing support for the JRC instruction and also misses an opportunity to generate JALS. I've attached a patch, plus some new test cases to correct this. Does this look okay to commit? I'd like to get it in 4.9.

Thanks,
Catherine
[PATCH] Fix for PR 59524
Hello Everyone, Attached, please find a patch will fix the bug mentioned in PR 59524. The main issue was that Cilk keywords tests are running even when the user configured the compiler with --disable-libcilkrts. This patch should fix this issue for C and C++. This is tested on x86 and x86_64. Here are the ChangeLog entries gcc/testsuite/ChangeLog +2014-01-08 Balaji V. Iyer balaji.v.i...@intel.com + + PR testsuite/59524 + * gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests + are run only if the Cilk library is available/enabled. + * g++.dg/cilk-plus/cilk-plus.exp: Likewise. + * lib/target-supports.exp (check_libcilkrts_available): New function. + Is this Ok for trunk? Thanks, Balaji V. Iyer. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 519d472..e0a0e43 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,11 @@ +2014-01-08 Balaji V. Iyer balaji.v.i...@intel.com + + PR testsuite/59524 + * gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests + are run only if the Cilk library is available/enabled. + * g++.dg/cilk-plus/cilk-plus.exp: Likewise. + * lib/target-supports.exp (check_libcilkrts_available): New function. + 2014-01-07 Yufeng Zhang yufeng.zh...@arm.com * gcc.target/arm/neon/vst1Q_laneu64-1.c: New test. 
diff --git a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp index e201fd2..b08be25 100644 --- a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp +++ b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp @@ -47,9 +47,7 @@ dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -g dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -g -O2 -ftree-vectorize -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -g -O3 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]] -O3 -ftree-vectorize -fcilkplus -g -dg-finish -dg-init dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -O0 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -O1 -fcilkplus @@ -61,25 +59,17 @@ dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -g -O1 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -g -O2 -ftree-vectorize -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -g -O3 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -O3 -ftree-vectorize -fcilkplus -g -dg-finish -dg-init -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O1 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O3 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -O3 -fcilkplus -dg-finish +if { [check_libcilkrts_available] 
} { +dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O1 -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O3 -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -O2 -fcilkplus -dg-init -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -O3 -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1 -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -fcilkplus +dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -O2 -fcilkplus + } dg-finish unset TEST_EXTRA_LIBS diff --git
Re: microMIPS jump instructions
Moore, Catherine catherine_mo...@mentor.com writes:
> 2014-01-08  Catherine Moore  c...@codesourcery.com
>
> gcc/testsuite/
>     * gcc.target/mips/umips-branch-3.c: New test.
>     * gcc.target/mips/umips-branch-4.c: New test.
>
> gcc/
>     * config/mips/mips.md (simple_return): Attempt to use JRC
>     for microMIPS.
>     * config/mips/mips.h (MIPS_CALL): Attempt to use JALS for
>     microMIPS.

OK, thanks, but:

> Index: gcc/config/mips/mips.md
> ===
> --- gcc/config/mips/mips.md (revision 206407)
> +++ gcc/config/mips/mips.md (working copy)
> @@ -1,5 +1,5 @@
>  ;; Mips.md Machine Description for MIPS based processors
> -;; Copyright (C) 1989-2014 Free Software Foundation, Inc.
> +;; Copyright (C) 1989-2013 Free Software Foundation, Inc.
>  ;; Contributed by A. Lichnewsky, l...@inria.inria.fr
>  ;; Changes by Michael Meissner, meiss...@osf.org
>  ;; 64-bit r4000 support by Ian Lance Taylor, i...@cygnus.com, and

please drop this bit.

Richard
Re: [PATCH,rs6000,committed] Remove duplicates from altivec_overloaded_builtins
On Wed, Jan 8, 2014 at 3:15 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> This patch removes a couple of redundant entries I noticed in
> altivec_overloaded_builtins.  Identical entries occur nearby.
>
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> regressions, applied as obvious.
>
> Thanks,
> Bill
>
>
> 2014-01-08  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>     * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
>     two duplicate entries.

Okay, good catch.

Thanks,
David
Re: [PATCH,rs6000] Add -maltivec={le,be} options
On Tue, Jan 7, 2014 at 6:59 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> On Tue, 2014-01-07 at 22:18 +, Joseph S. Myers wrote:
>> On Tue, 7 Jan 2014, Bill Schmidt wrote:
>>
>>> Yes, sorry for not being more clear.  This is indeed for interpretation
>>> of element numbers in Altivec intrinsics such as vec_splat, vec_extract,
>>> vec_insert, and so forth.  By default these will match array element
>>> order for the target endianness.  But with -maltivec=be for a little
>>> endian target, we will force use of big-endian element order (matching
>>> the behavior of the underlying hardware instructions).
>>
>> Thanks for the explanation.  I think you should make the .texi
>> documentation say something more like this.
>
> Sure, I can wordsmith something along those lines.  Thanks for the
> feedback!

This patch is okay with the documentation clarification requested by Joseph.

I also would suggest removing "but may be enabled in the future" from
the "le" option and limiting the comment to "ignored on big-endian
targets".

Also, please add a comment to -maltivec that it defaults to the native
endian order.  And for -maltivec=be, please state that this is the
default for big-endian; for -maltivec=le, please state that this is
the default for little-endian.  It's important to be clear and
redundant in this type of documentation.

Thanks, David
[patch][i386] Remove code executed only if reload_in_progress (i.e. never)
Hello Uros, and everyone else, Now that LRA is always used for the i386 targets, reload_in_progress is never set so all code conditional on it is now dead. The attached patch removes this code. Sadly I'm having difficulty testing the patch because I have no access to a suitable x86_64 or ix86 box :-) I'll try to test the patch on a compile farm machine, but I'm already posting the patch to hear if this is still OK for this late stage of the development cycle. It's not as if we're going to go back to reload so the code really is dead AFAICT, but it's obviously not a bug fix. Ciao! Steven * i386/i386.c (legitimize_pic_address): Remove never-executed code, reload_in_progress is never set if LRA is used. (legitimize_tls_address): Likewise. (ix86_expand_move): Likewise. (ix86_expand_binary_operator): Likewise. (ix86_expand_unary_operator): Likewise. * i386/predicates.md (index_register_operand): Likewise. Index: config/i386/i386.c === --- config/i386/i386.c (revision 206444) +++ config/i386/i386.c (working copy) @@ -13013,11 +13013,7 @@ legitimize_pic_address (rtx orig, rtx reg) ix86_cmodel != CM_SMALL_PIC gotoff_operand (addr, Pmode)) { rtx tmpreg; - /* This symbol may be referenced via a displacement from the PIC -base address (@GOTOFF). */ - if (reload_in_progress) - df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true); if (GET_CODE (addr) == CONST) addr = XEXP (addr, 0); if (GET_CODE (addr) == PLUS) @@ -13046,11 +13042,6 @@ legitimize_pic_address (rtx orig, rtx reg) } else if (!TARGET_64BIT !TARGET_PECOFF gotoff_operand (addr, Pmode)) { - /* This symbol may be referenced via a displacement from the PIC -base address (@GOTOFF). */ - - if (reload_in_progress) - df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true); if (GET_CODE (addr) == CONST) addr = XEXP (addr, 0); if (GET_CODE (addr) == PLUS) @@ -13108,11 +13099,6 @@ legitimize_pic_address (rtx orig, rtx reg) } else { - /* This symbol must be referenced via a load from the -Global Offset Table (@GOT). 
*/ - - if (reload_in_progress) - df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true); new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT); new_rtx = gen_rtx_CONST (Pmode, new_rtx); if (TARGET_64BIT) @@ -13164,8 +13150,6 @@ legitimize_pic_address (rtx orig, rtx reg) { if (!TARGET_64BIT) { - if (reload_in_progress) - df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true); new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0), UNSPEC_GOTOFF); new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1); @@ -13453,8 +13437,6 @@ legitimize_tls_address (rtx x, enum tls_model mode } else if (flag_pic) { - if (reload_in_progress) - df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true); pic = pic_offset_table_rtx; type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF; } @@ -16644,10 +16626,8 @@ ix86_expand_move (enum machine_mode mode, rtx oper /* dynamic-no-pic */ if (MACHOPIC_INDIRECT) { - rtx temp = ((reload_in_progress - || ((op0 REG_P (op0)) - mode == Pmode)) - ? op0 : gen_reg_rtx (Pmode)); + rtx temp = (op0 REG_P (op0) mode == Pmode) + ? op0 : gen_reg_rtx (Pmode); op1 = machopic_indirect_data_reference (op1, temp); if (MACHOPIC_PURE) op1 = machopic_legitimize_pic_address (op1, mode, @@ -17318,16 +17298,9 @@ ix86_expand_binary_operator (enum rtx_code code, e /* Emit the instruction. */ op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_ee (code, mode, src1, src2)); - if (reload_in_progress) -{ - /* Reload doesn't know about the flags register, and doesn't know that - it doesn't want to clobber it. We can only do this with PLUS. */ - gcc_assert (code == PLUS); - emit_insn (op); -} - else if (reload_completed - code == PLUS - !rtx_equal_p (dst, src1)) + if (reload_completed + code == PLUS + !rtx_equal_p (dst, src1)) { /* This is going to be an LEA; avoid splitting it later. */ emit_insn (op); @@ -17494,13 +17467,8 @@ ix86_expand_unary_operator (enum rtx_code code, en /* Emit the instruction. 
*/ op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_e (code, mode, src)); - if (reload_in_progress || code == NOT) -{ - /* Reload doesn't know about the flags register, and doesn't know that -
Re: [PATCH,rs6000] Add -maltivec={le,be} options
On Wed, 2014-01-08 at 16:46 -0500, David Edelsohn wrote:
> On Tue, Jan 7, 2014 at 6:59 PM, Bill Schmidt
> wschm...@linux.vnet.ibm.com wrote:
>> On Tue, 2014-01-07 at 22:18 +, Joseph S. Myers wrote:
>>> On Tue, 7 Jan 2014, Bill Schmidt wrote:
>>>
>>>> Yes, sorry for not being more clear.  This is indeed for
>>>> interpretation of element numbers in Altivec intrinsics such as
>>>> vec_splat, vec_extract, vec_insert, and so forth.  By default these
>>>> will match array element order for the target endianness.  But with
>>>> -maltivec=be for a little endian target, we will force use of
>>>> big-endian element order (matching the behavior of the underlying
>>>> hardware instructions).
>>>
>>> Thanks for the explanation.  I think you should make the .texi
>>> documentation say something more like this.
>>
>> Sure, I can wordsmith something along those lines.  Thanks for the
>> feedback!
>
> This patch is okay with the documentation clarification requested by
> Joseph.
>
> I also would suggest removing "but may be enabled in the future" from
> the "le" option and limiting the comment to "ignored on big-endian
> targets".
>
> Also, please add a comment to -maltivec that it defaults to the native
> endian order.  And for -maltivec=be, please state that this is the
> default for big-endian; for -maltivec=le, please state that this is
> the default for little-endian.  It's important to be clear and
> redundant in this type of documentation.
>
> Thanks, David

OK, thanks very much for the review.  I'll clean up the documentation as
requested this evening.

Thanks,
Bill
Re: [patch][i386] Remove code executed only if reload_in_progress (i.e. never)
On Wed, Jan 08, 2014 at 10:51:53PM +0100, Steven Bosscher wrote:
> Hello Uros, and everyone else,
>
> Now that LRA is always used for the i386 targets, reload_in_progress
> is never set so all code conditional on it is now dead. The attached
> patch removes this code.
>
> Sadly I'm having difficulty testing the patch because I have no access
> to a suitable x86_64 or ix86 box :-) I'll try to test the patch on a
> compile farm machine, but I'm already posting the patch to hear if
> this is still OK for this late stage of the development cycle. It's
> not as if we're going to go back to reload so the code really is dead
> AFAICT, but it's obviously not a bug fix.

While LRA is always on, making it harder to test with reload doesn't seem
to be a good idea to me for 4.9.  When some RA issue is reported for
these architectures, often one just patches config/i386/i386.c by hand to
enable reload instead of LRA and tests it with that instead.  This patch
would mean we'd need to keep around a patchset to apply for those
purposes.

>     * i386/i386.c (legitimize_pic_address): Remove never-executed
>     code, reload_in_progress is never set if LRA is used.
>     (legitimize_tls_address): Likewise.
>     (ix86_expand_move): Likewise.
>     (ix86_expand_binary_operator): Likewise.
>     (ix86_expand_unary_operator): Likewise.
>     * i386/predicates.md (index_register_operand): Likewise.

A config/ prefix would be needed in the ChangeLog entries.

    Jakub
Re: [RFC] libgcov.c re-factoring and offline profile-tool
Here is the patch that addresses Honza's concern about bss increment. It just makes this_prg a local variable. Some comments are inlined.

On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka hubi...@ucw.cz wrote:

Do you know how the size of libgcov changed with your patch? Quick check of current mainline on compiling empty main gives:

jh@gcc10:~/trunk/build/gcc$ cat t.c
main() { }
jh@gcc10:~/trunk/build/gcc$ ./xgcc -B ./ -O2 -fprofile-generate -o a.out-new --static t.c
jh@gcc10:~/trunk/build/gcc$ gcc -O2 -fprofile-generate -o a.out-old --static t.c
jh@gcc10:~/trunk/build/gcc$ size a.out-old
   text    data     bss     dec     hex filename
 608141    3560   16728  628429   996cd a.out-old
jh@gcc10:~/trunk/build/gcc$ size a.out-new
   text    data     bss     dec     hex filename
 612621    3688   22880  639189   9c0d5 a.out-new

Without profiling I get:

jh@gcc10:~/trunk/build/gcc$ size a.out-new-no
jh@gcc10:~/trunk/build/gcc$ size a.out-old-no
   text    data     bss     dec     hex filename
 599719    3448   12568  615735   96537 a.out-old-no
   text    data     bss     dec     hex filename
 600247    3448   12568  616263   96747 a.out-new-no

Quite big for empty program, but mostly glibc fault, I suppose (that won't be an issue for embedded platforms). But anyway we increased text size overhead from 8k to 12k, BSS size overhead from 4k to 10k and data by another 1k.

I think it would be more fair to compare r204729 and r204730. Your comparison had some other changes in libgcov such as time_profiler and indirect_call_profiler_v2.
Using the same empty t.c, for r204729, we have

xur2%208:gcc ./xgcc -B ./ -O2 -fprofile-generate --static -o a.out-r204729 t.c
xur2%209:gcc size a.out-r204729
   text    data     bss     dec     hex filename
 803207    6352   15448  825007   c96af a.out-r204729
xur2%210:gcc ./xgcc -B ./ -O2 --static -o a.out-r204729-no t.c
xur2%211:gcc size a.out-r204729-no
   text    data     bss     dec     hex filename
 790337    6112   11336  807785   c5369 a.out-r204729-no

For r204730, we have

xur2%216:gcc ./xgcc -B ./ -O2 -fprofile-generate --static -o a.out-r204730 t.c
xur2%217:gcc size a.out-r204730
   text    data     bss     dec     hex filename
 802919    6384   21592  830895   cadaf a.out-r204730
xur2%218:gcc ./xgcc -B ./ -O2 --static -o a.out-r204730-no t.c
xur2%219:gcc size a.out-r204730-no
   text    data     bss     dec     hex filename
 790337    6112   11336  807785   c5369 a.out-r204730-no

r204730 actually has smaller text, data size with -fprofile-generate. You are right that there is 6kb more bss space due to the static variables introduced. It is mostly caused by the this_prg object.

With the attached trunk patch that localizes this_prg, we have

xur2%42:fdo size a.out-new
   text    data     bss     dec     hex filename
 803479    6456   15512  825447   c9867 a.out-new
xur2%43:fdo size a.out-new-no
   text    data     bss     dec     hex filename
 790545    6112   11368  808025   c5459 a.out-new-no

We are now using 64 more bytes in m64.
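The overhead figures being compared can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: `struct size_row` and `profiling_overhead` are names invented here, and the numbers fed to it are the `size` rows quoted above.

```c
#include <assert.h>

/* One row of `size` output, reduced to the three section sizes.  */
struct size_row
{
  long text;
  long data;
  long bss;
};

/* Per-section overhead of a binary built with -fprofile-generate over the
   same binary built without it.  */
struct size_row
profiling_overhead (struct size_row with_profile, struct size_row without)
{
  struct size_row delta;

  delta.text = with_profile.text - without.text;
  delta.data = with_profile.data - without.data;
  delta.bss = with_profile.bss - without.bss;
  return delta;
}
```

With the rows above, the bss overhead is 4112 bytes for r204729, 10256 bytes for r204730, and 4144 bytes with this_prg localized, which is where the roughly 6kb regression and its recovery show up.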
Objects size for r204730:

   text    data     bss     dec     hex filename
     57       0       0      57      39 _gcov_average_profiler.o
     66       0       0      66      42 _gcov_dump.o
    516       0       0     516     204 _gcov_execle.o
    476       0       0     476     1dc _gcov_execl.o
    476       0       0     476     1dc _gcov_execlp.o
    108       0       0     108      6c _gcov_execve.o
     98       0       0      98      62 _gcov_execv.o
     98       0       0      98      62 _gcov_execvp.o
    126       0      40     166      a6 _gcov_flush.o
    101       0       0     101      65 _gcov_fork.o
    122       0       0     122      7a _gcov_indirect_call_profiler.o
    178       0      16     194      c2 _gcov_indirect_call_profiler_v2.o
     89       0       0      89      59 _gcov_interval_profiler.o
     52       0       0      52      34 _gcov_ior_profiler.o
    126       0       0     126      7e _gcov_merge_add.o
    242       0       0     242      f2 _gcov_merge_delta.o
    126       0       0     126      7e _gcov_merge_ior.o
    251       0       0     251      fb _gcov_merge_single.o
    156       0       0     156      9c _gcov_merge_time_profile.o
   9252       0    6144   15396    3c24 _gcov.o
    115       0       0     115      73 _gcov_one_value_profiler.o
     69       0       0      69      45 _gcov_pow2_profiler.o
     66       0       0      66      42 _gcov_reset.o
     77       0       8      85      55 _gcov_time_profiler.o

Objects size for r204729:

   text    data     bss     dec     hex filename
     57       0       0      57      39 _gcov_average_profiler.o
     72       0       0      72      48 _gcov_dump.o
    516       0       0     516     204 _gcov_execle.o
    476       0       0     476     1dc _gcov_execl.o
    476       0       0     476     1dc _gcov_execlp.o
    108       0       0     108      6c _gcov_execve.o
     98       0       0      98      62 _gcov_execv.o
     98       0       0      98      62 _gcov_execvp.o
    101       0       0     101      65 _gcov_fork.o
    122       0       0     122      7a
Re: [Patch] Regex bracket matcher cache optimization
On Wed, Jan 8, 2014 at 5:20 AM, Paolo Carlini paolo.carl...@oracle.com wrote:
On 01/08/2014 10:24 AM, Jonathan Wakely wrote:

Ouch! Yes, that's quite a bit slower, and this code is already very slow to compile.

With this patch (which is based on a-fixed.diff, committed earlier), which uses templated member functions instead of templating the whole _Compiler, time consumption is:

g++ -g -Wall -std=c++11 -g -Wall -std=c++11 -O3 regextest.cc 3.79s user 0.14s system 98% cpu 3.981 total

Compared to 4.5s it's better and probably fine.

Booted and tested with -m32 and -m64 respectively.

I only want to add that, besides keeping compile-time under control for 4.9.0 - please investigate a bit more along the mentioned lines - we should also start experimenting with exporting the instantiations. I don't know what the other implementations are doing, but in general it definitely makes sense, for compile-time performance too. I think we already said that some time ago, but the issue seems more important now. Maybe it's really unavoidable if we need template complexity for first class run-time performance.

After this patch I plan to instantiate _Compiler and _Executor.

--
Regards, Tim Shen
[MIPS, committed] Fix all but one gcc.dg/tree-ssa failure
Some of the tests were failing due to the branch cost and some were failing due to !LOGICAL_OP_NON_SHORT_CIRCUIT. I just skipped the latter, as for ARM Cortex-M.

I'll look at the gcc.dg/tree-ssa/ssa-dom-thread-4.c failure separately.

Tested on mips64-linux-gnu and applied.

Thanks, Richard

gcc/testsuite/
        * gcc.dg/tree-ssa/reassoc-32.c, gcc.dg/tree-ssa/reassoc-33.c,
        gcc.dg/tree-ssa/reassoc-34.c, gcc.dg/tree-ssa/reassoc-35.c,
        gcc.dg/tree-ssa/reassoc-36.c: Extend -mbranch-cost handling to MIPS.
        * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c,
        gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c,
        gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c,
        gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c,
        gcc.dg/tree-ssa/vrp87.c, gcc.dg/tree-ssa/forwprop-28.c: Skip for MIPS.

Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c 2014-01-08 22:11:48.552943720 +
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c 2014-01-08 22:11:50.069956983 +
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
 /* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c 2014-01-08 22:11:48.553943729 +
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c 2014-01-08 22:11:50.070956992 +
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-*"} } } */
 /* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c 2014-01-08 22:11:48.552943720 +
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c 2014-01-08 22:11:50.070956992 +
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-*"} } } */
 /* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c 2014-01-08 22:11:48.553943729 +
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c 2014-01-08 22:11:50.070956992 +
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-*"} } } */
 /* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* } } */
 
 int test (unsigned int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c 2014-01-08 22:11:48.553943729 +
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c 2014-01-08 22:11:50.070956992 +
@@ -1,7 +1,7 @@
 /* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-*"} } } */
 /* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* } } */
 
 int test (int a, int b, int c)
 {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c 2014-01-08 22:11:48.552943720 +
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c 2014-01-08 22:11:50.070956992 +
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-*
Re: [RFC] libgcov.c re-factoring and offline profile-tool
On Wed, Dec 18, 2013 at 9:28 AM, Xinliang David Li davi...@google.com wrote:

 #ifdef L_gcov_merge_ior
 /* The profile merging function that just adds the counters.  It is given
-   an array COUNTERS of N_COUNTERS old counters and it reads the same number
-   of counters from the gcov file.  */
+   an array COUNTERS of N_COUNTERS old counters.
+   When SRC==NULL, it reads the same number of counters from the gcov file.
+   Otherwise, it reads from SRC array.  */
 void
-__gcov_merge_ior (gcov_type *counters, unsigned n_counters)
+__gcov_merge_ior (gcov_type *counters, unsigned n_counters,
+                  gcov_type *src, unsigned w __attribute__ ((unused)))

So the new in-memory variants are introduced for the merging tool, while libgcc uses the gcov_read_counter interface? Perhaps we can actually just duplicate the functions, so that the runtime does not have to do all the scaling and in_mem tests it won't need?

I thought about this one a little. How about making the interface change conditionally, but still share the implementation? The merge function bodies mostly remain unchanged and there is no runtime penalty for libgcov. The new macros can be shared across most of the mergers.

#ifdef IN_PROFILE_TOOL
#define GCOV_MERGE_EXTRA_ARGS gcov_type *src, unsigned w
#define GCOV_READ_COUNTER *(src++) * w
#else
#define GCOV_MERGE_EXTRA_ARGS
#define GCOV_READ_COUNTER gcov_read_counter ()
#endif

__gcov_merge_add (gcov_type *counters, unsigned n_counters, GCOV_MERGE_EXTRA_ARGS)
{
  for (; n_counters; counters++, n_counters--)
    {
      *counters += GCOV_READ_COUNTER ;
    }
}

thanks,

Personally I don't think the run time test of in_mem will cause any issue. This is in profile dumping, so why do we care about a few more cycles here? It won't pollute the profile.

If you really don't like that, we can use the above approach, or I can hide the logic in gcov_read_counter(), i.e. overload gcov_read_counter() in profile_tool. For that, I will need a new global variable SRC and set it before calling the merge function.

I would prefer to keep weight in the _gcov_merge_* argument list.

What do you think?

-Rong

David

I would suggest going with libgcov.h changes and cleanups first, with interface changes next, and the gcov-tool is probably quite obvious at the end? Do you think you can split the patch this way?

Thanks and sorry for taking long to review. I should have more time again now.

Honza
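For readers following the macro proposal, here is a compilable sketch of the same idea. It is an illustration under stated assumptions, not the libgcov code: `gcov_type` is a stand-in typedef, `merge_add` is a hypothetical name, and the tool variant is selected unconditionally by defining the macro at the top. One deliberate difference from the snippet quoted in the thread: the leading comma is folded into `GCOV_MERGE_EXTRA_ARGS`, since a parameter list ending in a bare comma would not compile when the macro expands to nothing.

```c
#include <assert.h>

typedef long long gcov_type;    /* stand-in for libgcov's counter type */

/* Hypothetical switch: defined here so the tool variant is built.
   Remove it to get the runtime variant.  */
#define IN_PROFILE_TOOL 1

#ifdef IN_PROFILE_TOOL
/* Tool variant: counters are read from memory and scaled by weight W.  */
#define GCOV_MERGE_EXTRA_ARGS , gcov_type *src, unsigned w
#define GCOV_READ_COUNTER (*(src++) * w)
#else
/* Runtime variant: counters come from the gcda file; the reader is only
   declared here and would be provided elsewhere.  */
gcov_type gcov_read_counter (void);
#define GCOV_MERGE_EXTRA_ARGS
#define GCOV_READ_COUNTER gcov_read_counter ()
#endif

/* Shared merge body: add each incoming counter to the existing one.  */
void
merge_add (gcov_type *counters, unsigned n_counters GCOV_MERGE_EXTRA_ARGS)
{
  for (; n_counters; counters++, n_counters--)
    *counters += GCOV_READ_COUNTER;
}
```

In the tool variant, merging `{10, 20, 30}` with weight 2 into `{1, 2, 3}` yields `{21, 42, 63}`; the runtime variant keeps the original two-argument signature and pays nothing for the extra parameters.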
Re: [MIPS, committed] Revert some Octeon BADDU patches
This patch just reverts some changes I'd made to the BADDU patterns for the infamous (truncate:QI (plus:SI ...)) - (plus:QI ...) simplification. That simplification was limited to CISCy targets for PR 58295. Tested on mips64-linux-gnu and applied. It fixes the octeon-baddu-1.c failures. You presumably need to apply it to the 4.8 branch as well. -- Eric Botcazou
Re: [RFC] libgcov.c re-factoring and offline profile-tool
On Fri, Dec 6, 2013 at 6:23 AM, Jan Hubicka hubi...@ucw.cz wrote:

@@ -325,6 +311,9 @@ static struct gcov_summary all_prg;
 #endif
 /* crc32 for this program.  */
 static gcov_unsigned_t crc32;
+/* Use this summary checksum rather the computed one if the value is
+ *non-zero.  */
+static gcov_unsigned_t saved_summary_checksum;

Why do you need to save the checksum? Won't it reset summary back with multiple streaming?

This was for the gcov_tool. The checksum will be recomputed in gcov_exit and the value will depend on the order of the gcov_info list (the order will be different after reading from gcda files to memory). The purpose was to have the same summary_checksum so that I can get identical gcov-dump output.

I would really like to avoid introducing those static vars that are used exclusively by gcov_exit. What about putting them into a gcov_context structure that is passed around the functions that were broken out?

With my recent patch that localizes this_prg, we only use 64 more bytes in bss. Do you still think we have to remove all these statics?
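As a rough illustration of the gcov_context suggestion, the sketch below bundles the two statics from the quoted hunk into a structure that the caller owns and passes down. Everything here is hypothetical: the thread only names the idea, so the layout of `struct gcov_context` and the helper `effective_summary_checksum` are assumptions made for this example, not libgcov code.

```c
#include <assert.h>

typedef unsigned gcov_unsigned_t;   /* stand-in for the libgcov typedef */

/* Hypothetical context: state that would otherwise be file-scope statics,
   so it occupies bss only if the caller chooses to make it static.  */
struct gcov_context
{
  gcov_unsigned_t crc32;
  gcov_unsigned_t saved_summary_checksum;
};

/* Use the saved summary checksum rather than the computed one if the saved
   value is non-zero, mirroring the comment in the quoted hunk.  */
gcov_unsigned_t
effective_summary_checksum (const struct gcov_context *ctx,
                            gcov_unsigned_t computed)
{
  return ctx->saved_summary_checksum != 0
         ? ctx->saved_summary_checksum
         : computed;
}
```

A caller such as the exit-time dump routine could declare the context as a local variable and hand its address to the helpers it was split into, which is the trade-off the thread is weighing against simply keeping the statics.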
Re: [PATCH] Fix PR59471
On Tue, Jan 07, 2014 at 03:54:56PM +0100, Richard Biener wrote:

2014-01-07  Richard Biener  rguent...@suse.de

        PR middle-end/59471
        * gimplify.c (gimplify_expr): Gimplify register-register type
        VIEW_CONVERT_EXPRs to separate stmts.

        * gcc.dg/pr59471.c: New testcase.

The testcase fails on i686-linux, because of the ABI warnings. I've verified following change ICEd without your fix and works with your fix, bootstrapped/regtested it on x86_64-linux and i686-linux and committed to trunk as obvious.

2014-01-08  Jakub Jelinek  ja...@redhat.com

        PR middle-end/59471
        * gcc.dg/pr59471.c (foo): Avoid vector type arguments or return type,
        use pointers to vector type instead.

--- gcc/testsuite/gcc.dg/pr59471.c.jj 2014-01-08 10:23:20.0 +0100
+++ gcc/testsuite/gcc.dg/pr59471.c 2014-01-08 17:52:42.0 +0100
@@ -9,8 +9,8 @@ __attribute__ ((__vector_size__ (16)));
 typedef unsigned int uint32x4_t
 __attribute__ ((__vector_size__ (16)));
 
-uint8x4_t
-foo (uint16x8_t x)
+void
+foo (uint16x8_t *x, uint8x4_t *y)
 {
-  return (uint8x4_t) ((uint32x4_t) x)[0];
+  *y = (uint8x4_t) ((uint32x4_t) (*x))[0];
 }

        Jakub
Re: [Patch] Regex bracket matcher cache optimization
Hi,

On 01/08/2014 11:11 PM, Tim Shen wrote:
On Wed, Jan 8, 2014 at 5:20 AM, Paolo Carlini paolo.carl...@oracle.com wrote:
On 01/08/2014 10:24 AM, Jonathan Wakely wrote:

Ouch! Yes, that's quite a bit slower, and this code is already very slow to compile.

With this patch (which is based on a-fixed.diff, committed earlier), which uses templated member functions instead of templating the whole _Compiler, time consumption is:

g++ -g -Wall -std=c++11 -g -Wall -std=c++11 -O3 regextest.cc 3.79s user 0.14s system 98% cpu 3.981 total

Compared to 4.5s it's better and probably fine.

Booted and tested with -m32 and -m64 respectively.

I agree, it's probably fine for now, but please actually attach the patch ;)

Paolo.
Fix segfault with weak external symbols
This is a regression present on the mainline for weak external symbols and languages with non-call exceptions:

0xb222df crash_signal
        /home/eric/svn/gcc/gcc/toplev.c:337
0x75ed9c symtab_alias_ultimate_target(symtab_node*, availability*)
        /home/eric/svn/gcc/gcc/symtab.c:989
0xb69a59 varpool_variable_node
        /home/eric/svn/gcc/gcc/cgraph.h:1430
0xb69a59 tree_could_trap_p(tree_node*)
        /home/eric/svn/gcc/gcc/tree-eh.c:2691
0xb6a85c stmt_could_throw_1_p
        /home/eric/svn/gcc/gcc/tree-eh.c:2751
0xb6a85c stmt_could_throw_p(gimple_statement_base*)
        /home/eric/svn/gcc/gcc/tree-eh.c:2780
0xb6d46f lower_eh_constructs_2
        /home/eric/svn/gcc/gcc/tree-eh.c:2028
0xb6d46f lower_eh_constructs_1
        /home/eric/svn/gcc/gcc/tree-eh.c:2123
0xb6f871 lower_eh_constructs
        /home/eric/svn/gcc/gcc/tree-eh.c:2141
0xb6f871 execute
        /home/eric/svn/gcc/gcc/tree-eh.c:2193
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

In tree_could_trap_p:

    case VAR_DECL:
      /* Assume that accesses to weak vars may trap, unless we know
         they are certainly defined in current TU or in some other
         LTO partition.  */
      if (DECL_WEAK (expr))
        {
          struct varpool_node *node;
          if (!DECL_EXTERNAL (expr))
            return false;
          node = varpool_variable_node (varpool_get_node (expr), NULL);
          if (node && node->symbol.in_other_partition)
            return false;
          return true;
        }
      return false;

The problem is that varpool_get_node returns NULL and varpool_variable_node (and its callee symtab_alias_ultimate_target) chokes on the NULL. This is a regression from the 4.8.x series, where the same NULL goes through the function without a hitch.

Tested on x86_64-suse-linux, applied on the mainline as obvious.

2014-01-08  Eric Botcazou  ebotca...@adacore.com

        * cgraph.h (varpool_variable_node): Do not choke on null node.

2014-01-08  Eric Botcazou  ebotca...@adacore.com

        * gnat.dg/weak2.ad[sb]: New test.
--
Eric Botcazou

Index: cgraph.h
===================================================================
--- cgraph.h (revision 206418)
+++ cgraph.h (working copy)
@@ -1426,8 +1426,12 @@ varpool_variable_node (varpool_node *nod
 {
   varpool_node *n;
 
-  n = dyn_cast <varpool_node> (symtab_alias_ultimate_target (node,
-                                                            availability));
+  if (node)
+    n = dyn_cast <varpool_node> (symtab_alias_ultimate_target (node,
+                                                              availability));
+  else
+    n = NULL;
+
   if (!n && availability)
     *availability = AVAIL_NOT_AVAILABLE;
   return n;

-- { dg-do compile }

package body Weak2 is

   function F return Integer is
   begin
      return Var;
   end;

end Weak2;

package Weak2 is

   Var : Integer;
   pragma Import (Ada, Var, "var_name");
   pragma Weak_External (Var);

   function F return Integer;

end Weak2;
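The shape of the fix (tolerate a null node in the wrapper, keep the callee strict) can be shown in isolation. The sketch below is a toy model in plain C: `struct node`, `ultimate_target` and `variable_node` are invented stand-ins for the symbol-table types and functions named in the patch, and the availability values are placeholders.

```c
#include <assert.h>
#include <stddef.h>

enum availability { AVAIL_UNSET, AVAIL_NOT_AVAILABLE, AVAIL_AVAILABLE };

/* Toy symbol-table node: an alias points at its target, a real
   definition has no target.  */
struct node
{
  struct node *alias_target;
};

/* Strict callee: follows the alias chain and, like the function that
   crashed in the backtrace, requires a non-null argument.  */
static struct node *
ultimate_target (struct node *n, enum availability *avail)
{
  while (n->alias_target)
    n = n->alias_target;
  if (avail)
    *avail = AVAIL_AVAILABLE;
  return n;
}

/* Null-tolerant wrapper, following the shape of the patched function:
   only call the strict callee when there is a node to look at.  */
struct node *
variable_node (struct node *n, enum availability *avail)
{
  struct node *res = n ? ultimate_target (n, avail) : NULL;

  if (!res && avail)
    *avail = AVAIL_NOT_AVAILABLE;
  return res;
}
```

With this wrapper a caller like the `VAR_DECL` case above can pass the result of a lookup that may fail and simply test the returned pointer.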
[PATCH] Fix cfgcleanup regression (PR rtl-optimization/59724)
On Wed, Jan 08, 2014 at 05:54:55PM +0100, Uros Bizjak wrote:

This caused PR59724 on alpha:

20021116-1.c: In function ‘foo’:
20021116-1.c:31:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 9
 }
 ^
20021116-1.c:31:1: error: insn outside basic block
(jump_insn 94 52 93 9 (return) 20021116-1.c:31 -1
     (nil)
 -> return)

Ugh, indeed. The problem is that try_head_merge_bb really wants flow_find_head_matching_sequence to count all (non-note) insns, not just active insns, because otherwise as in the above testcase we can have e.g. 2 active insns followed by one non-active, all matching (flow_find_head_matching_sequence returns 2) and on another edge just 2 active insns and nothing else matching. 2 == 2, so the caller thinks it doesn't matter which one is shorter, but we have the insn range of 3 insns together.

So, this patch just reverts the try_head_merge_bb changes and makes flow_find_head_matching_sequence behave the old way when called from try_head_merge_bb, i.e. count all non-note insns, and only when called from ifcvt.c count just active insns. Plus the ifcvt.c change ensures we don't mistakenly call it with stop_after == 0 (which wouldn't actually stop).

Bootstrapped/regtested on x86_64-linux and i686-linux, Uros is testing it on Alpha. Ok for trunk?

2014-01-08  Jakub Jelinek  ja...@redhat.com

        PR rtl-optimization/59724
        * ifcvt.c (cond_exec_process_if_block): Don't call
        flow_find_head_matching_sequence with 0 longest_match.
        * cfgcleanup.c (flow_find_head_matching_sequence): Count even
        non-active insns if !stop_after.
        (try_head_merge_bb): Revert 2014-01-07 changes.
--- gcc/ifcvt.c.jj 2014-01-08 10:23:20.0 +0100
+++ gcc/ifcvt.c 2014-01-08 18:46:17.017715169 +0100
@@ -522,7 +522,10 @@ cond_exec_process_if_block (ce_if_block
       n_insns -= 2 * n_matching;
     }
 
-  if (then_start && else_start)
+  if (then_start
+      && else_start
+      && then_n_insns > n_matching
+      && else_n_insns > n_matching)
     {
       int longest_match = MIN (then_n_insns - n_matching,
                                else_n_insns - n_matching);
--- gcc/cfgcleanup.c.jj 2014-01-07 08:54:05.772736321 +0100
+++ gcc/cfgcleanup.c 2014-01-08 18:41:14.433307914 +0100
@@ -1421,7 +1421,8 @@ flow_find_cross_jump (basic_block bb1, b
 /* Like flow_find_cross_jump, except start looking for a matching sequence from
    the head of the two blocks.  Do not include jumps at the end.
    If STOP_AFTER is nonzero, stop after finding that many matching
-   instructions.  */
+   instructions.  If STOP_AFTER is zero, count all INSN_P insns, if it is
+   non-zero, only count active insns.  */
 
 int
 flow_find_head_matching_sequence (basic_block bb1, basic_block bb2, rtx *f1,
@@ -1493,7 +1494,7 @@ flow_find_head_matching_sequence (basic_
           beforelast1 = last1, beforelast2 = last2;
           last1 = i1, last2 = i2;
-          if (active_insn_p (i1))
+          if (!stop_after || active_insn_p (i1))
             ninsns++;
         }
 
@@ -2408,7 +2409,9 @@ try_head_merge_bb (basic_block bb)
           max_match--;
           if (max_match == 0)
             return false;
-          e0_last_head = prev_active_insn (e0_last_head);
+          do
+            e0_last_head = prev_real_insn (e0_last_head);
+          while (DEBUG_INSN_P (e0_last_head));
         }
 
       if (max_match == 0)
@@ -2428,14 +2431,16 @@ try_head_merge_bb (basic_block bb)
       basic_block merge_bb = EDGE_SUCC (bb, ix)->dest;
       rtx head = BB_HEAD (merge_bb);
 
-      if (!active_insn_p (head))
-        head = next_active_insn (head);
+      while (!NONDEBUG_INSN_P (head))
+        head = NEXT_INSN (head);
       headptr[ix] = head;
       currptr[ix] = head;
 
       /* Compute the end point and live information  */
       for (j = 1; j < max_match; j++)
-        head = next_active_insn (head);
+        do
+          head = NEXT_INSN (head);
+        while (!NONDEBUG_INSN_P (head));
       simulate_backwards_to_point (merge_bb, live, head);
       IOR_REG_SET (live_union, live);
     }

        Jakub
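The counting difference behind the regression can be reproduced with a small model. This is a simplified sketch, not the cfgcleanup.c code: `struct insn`, its `id` field (two insns "match" when their ids are equal) and `count_head_matches` are assumptions made for illustration, and notes, debug insns and jumps are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy instruction: equal ids stand for "these insns match"; ACTIVE is
   false for things like a USE or CLOBBER that emit no code.  */
struct insn
{
  int id;
  bool active;
};

/* Count matching insns from the heads of A and B.  Mirrors the rule in
   the patched comment: if STOP_AFTER is zero, count every insn; if it is
   non-zero, count only active insns and stop once that many are found.  */
int
count_head_matches (const struct insn *a, int na,
                    const struct insn *b, int nb, int stop_after)
{
  int ninsns = 0;

  for (int i = 0; i < na && i < nb; i++)
    {
      if (a[i].id != b[i].id)
        break;
      if (!stop_after || a[i].active)
        ninsns++;
      if (stop_after && ninsns == stop_after)
        break;
    }
  return ninsns;
}
```

Take a reference block of two active insns followed by one non-active insn. Edge A matches all three; edge B matches only the first two. Counting active insns only gives 2 for both edges, so the caller cannot tell that A's matching range is one insn longer. Counting all insns gives 3 and 2, which is the distinction try_head_merge_bb needs.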
Re: [Patch] Regex bracket matcher cache optimization
On Wed, Jan 8, 2014 at 5:38 PM, Paolo Carlini paolo.carl...@oracle.com wrote: I agree, it's probably fine for now, but please actually attach the patch ;) Oops sorry . So my plan is to instantiate _Compiler and _Executor instead of user interfaces like basic_regex or regex_match, because the implementation may change (say add a new executor) later. Is that Ok? -- Regards, Tim Shen commit d9f47e783680a1cab86bd704e67236025cbdff18 Author: tim timshe...@gmail.com Date: Mon Jan 6 00:03:41 2014 -0500 2014-01-08 Tim Shen timshe...@gmail.com * bits/regex_automaton.tcc: Indentation fix. * bits/regex_compiler.h (__compile_nfa(), _Compiler, _RegexTranslator _AnyMatcher, _CharMatcher, _BracketMatcher): Add bool option template parameters and specializations to make matching more efficient and space saving. * bits/regex_compiler.tcc: Likewise. diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc index 7edc67f..e222803 100644 --- a/libstdc++-v3/include/bits/regex_automaton.tcc +++ b/libstdc++-v3/include/bits/regex_automaton.tcc @@ -134,9 +134,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _NFA_TraitsT::_M_dot(std::ostream __ostr) const { __ostr digraph _Nfa {\n - rankdir=LR;\n; + rankdir=LR;\n; for (size_t __i = 0; __i this-size(); ++__i) -(*this)[__i]._M_dot(__ostr, __i); + (*this)[__i]._M_dot(__ostr, __i); __ostr }\n; return __ostr; } diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h index 4ac67df..b73fe30 100644 --- a/libstdc++-v3/include/bits/regex_compiler.h +++ b/libstdc++-v3/include/bits/regex_compiler.h @@ -39,7 +39,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @{ */ - templatetypename _TraitsT + templatetypename, bool, bool struct _BracketMatcher; /// Builds an NFA from an input iterator interval. 
@@ -63,7 +63,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef typename _ScannerT::_TokenT _TokenT; typedef _StateSeq_TraitsT_StateSeqT; typedef std::stack_StateSeqT, std::vector_StateSeqT _StackT; - typedef _BracketMatcher_TraitsT _BMatcherT; typedef std::ctypetypename _TraitsT::char_type_CtypeT; // accepts a specific token or returns false. @@ -91,20 +90,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool _M_bracket_expression(); - void - _M_expression_term(_BMatcherT __matcher); + templatebool __icase, bool __collate + void + _M_insert_any_matcher_ecma(); - bool - _M_range_expression(_BMatcherT __matcher); + templatebool __icase, bool __collate + void + _M_insert_any_matcher_posix(); - bool - _M_collating_symbol(_BMatcherT __matcher); + templatebool __icase, bool __collate + void + _M_insert_char_matcher(); - bool - _M_equivalence_class(_BMatcherT __matcher); + templatebool __icase, bool __collate + void + _M_insert_character_class_matcher(); - bool - _M_character_class(_BMatcherT __matcher); + templatebool __icase, bool __collate + void + _M_insert_bracket_matcher(bool __neg); + + templatebool __icase, bool __collate + void + _M_expression_term(_BracketMatcher_TraitsT, __icase, __collate + __matcher); int _M_cur_int_value(int __radix); @@ -148,16 +157,110 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __compile_nfa(__cfirst, __cfirst + __len, __traits, __flags); } - templatetypename _TraitsT, bool __is_ecma -struct _AnyMatcher + // [28.13.14] + templatetypename _TraitsT, bool __icase, bool __collate +class _RegexTranslator { - typedef typename _TraitsT::char_type _CharT; +public: + typedef typename _TraitsT::char_type _CharT; + typedef typename _TraitsT::string_type _StringT; + typedef typename std::conditional__collate, + _StringT, + _CharT::type _StrTransT; explicit - _AnyMatcher(const _TraitsT __traits) + _RegexTranslator(const _TraitsT __traits) : _M_traits(__traits) { } + _CharT + _M_translate(_CharT __ch) const + { + if (__icase) + return 
_M_traits.translate_nocase(__ch); + else if (__collate) + return _M_traits.translate(__ch); + else + return __ch; + } + + _StrTransT + _M_transform(_CharT __ch) const + { + return _M_transform_impl(__ch, typename integral_constantbool, +__collate::type()); + } + +private: + _StrTransT + _M_transform_impl(_CharT __ch, false_type) const + { return __ch; } + +
Re: [RFA][PATCH][middle-end/53623] Improve extension elimination
On 01/08/14 01:14, Eric Botcazou wrote:

Committed after private email approval from Jakub. I made one additional trivial change (missing whitespace in a comment).

This breaks bootstrap with RTL checking enabled:

/home/eric/svn/gcc/libgcc/config/libbid/bid64_noncomp.c:119:1: internal compiler error: RTL check: expected code 'set' or 'clobber', have 'parallel' in combine_reaching_defs, at ree.c:711

There were two issues in that code. The first assumed the form of DEF_INSN was (set (dest) (src)), the second assumed that the destination must be a reg before checking its REGNO.

ree.c already had some code which effectively defined the form that the defining insn could take. It's not quite single_set, though I'd really prefer that be the form in the future. Anyway, I pulled that code out of merge_def_and_ext so that it could also be used by combine_reaching_defs.

With that I was able to bootstrap and regression test with --enable-checking=rtl as well as a normal bootstrap and regression test on x86_64-unknown-linux-gnu.

OK for the trunk?

        * ree.c (get_sub_rtx): New function, extracted from...
        (merge_def_and_ext): Here.
        (combine_reaching_defs): Use get_sub_rtx.

diff --git a/gcc/ree.c b/gcc/ree.c
index ec09c7a..b41e891 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -580,27 +580,17 @@ make_defs_and_copies_lists (rtx extend_insn, const_rtx set_pat,
   return ret;
 }
 
-/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
-   on the SET pattern.  */
-
-static bool
-merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
+static rtx *
+get_sub_rtx (rtx def_insn)
 {
-  enum machine_mode ext_src_mode;
-  enum rtx_code code;
-  rtx *sub_rtx;
-  rtx s_expr;
-  int i;
-
-  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
-  code = GET_CODE (PATTERN (def_insn));
-  sub_rtx = NULL;
+  enum rtx_code code = GET_CODE (PATTERN (def_insn));
+  rtx *sub_rtx = NULL;
 
   if (code == PARALLEL)
     {
-      for (i = 0; i < XVECLEN (PATTERN (def_insn), 0); i++)
+      for (int i = 0; i < XVECLEN (PATTERN (def_insn), 0); i++)
         {
-          s_expr = XVECEXP (PATTERN (def_insn), 0, i);
+          rtx s_expr = XVECEXP (PATTERN (def_insn), 0, i);
           if (GET_CODE (s_expr) != SET)
             continue;
 
@@ -609,7 +599,7 @@ merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
           else
             {
               /* PARALLEL with multiple SETs.  */
-              return false;
+              return NULL;
             }
         }
     }
@@ -618,10 +608,27 @@ merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
   else
     {
       /* It is not a PARALLEL or a SET, what could it be ? */
-      return false;
+      return NULL;
     }
 
   gcc_assert (sub_rtx != NULL);
+  return sub_rtx;
+}
+
+/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
+   on the SET pattern.  */
+
+static bool
+merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
+{
+  enum machine_mode ext_src_mode;
+  rtx *sub_rtx;
+
+  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
+  sub_rtx = get_sub_rtx (def_insn);
+
+  if (sub_rtx == NULL)
+    return false;
 
   if (REG_P (SET_DEST (*sub_rtx))
       && (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
@@ -707,8 +714,13 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
   /* If there is an overlap between the destination of DEF_INSN and
      CAND->insn, then this transformation is not safe.  Note we have
      to test in the widened mode.  */
+  rtx *dest_sub_rtx = get_sub_rtx (def_insn);
+  if (dest_sub_rtx == NULL
+      || !REG_P (SET_DEST (*dest_sub_rtx)))
+    return false;
+
   rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
-                             REGNO (SET_DEST (PATTERN (def_insn))));
+                             REGNO (SET_DEST (*dest_sub_rtx)));
   if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn))))
     return false;
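The contract of the extracted helper (a lone SET, or a PARALLEL containing exactly one SET, otherwise nothing) is easy to model outside of RTL. The following is a toy sketch under stated assumptions: `struct pat`, `enum code` and `get_single_set` are invented for this example and only imitate the shape of an insn pattern. Unlike the patched function, which asserts that a SET was found, this model returns NULL for a PARALLEL that contains no SET at all.

```c
#include <assert.h>
#include <stddef.h>

enum code { SET, CLOBBER, USE, PARALLEL };

/* Toy pattern: either a leaf (SET, CLOBBER, USE) or a PARALLEL that
   owns an array of leaf patterns.  */
struct pat
{
  enum code code;
  int dest_regno;      /* for SET: register number, negative if not a reg */
  struct pat *elts;    /* for PARALLEL: the contained patterns */
  int n_elts;          /* for PARALLEL: how many of them */
};

/* Return the one SET that defines the pattern, or NULL when the pattern
   is neither a SET nor a PARALLEL with exactly one SET.  */
struct pat *
get_single_set (struct pat *p)
{
  struct pat *found = NULL;

  if (p->code == SET)
    return p;
  if (p->code != PARALLEL)
    return NULL;

  for (int i = 0; i < p->n_elts; i++)
    {
      if (p->elts[i].code != SET)
        continue;
      if (found)
        return NULL;          /* PARALLEL with multiple SETs.  */
      found = &p->elts[i];
    }
  return found;
}
```

A caller that needs a register number can then follow the pattern of the second hunk: bail out if the helper returns NULL or if the destination is not a register, and only then read the register number from the returned SET rather than from the whole pattern.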
Re: [RFA][PATCH][middle-end/53623] Improve extension elimination
On Wed, Jan 08, 2014 at 04:02:17PM -0700, Jeff Law wrote:

        * ree.c (get_sub_rtx): New function, extracted from...
        (merge_def_and_ext): Here.
        (combine_reaching_defs): Use get_sub_rtx.

--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -580,27 +580,17 @@ make_defs_and_copies_lists (rtx extend_insn, const_rtx set_pat,
   return ret;
 }
 
-/* Merge the DEF_INSN with an extension.  Calls combine_set_extension
-   on the SET pattern.  */
-
-static bool
-merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state)
+static rtx *
+get_sub_rtx (rtx def_insn)

Please add a function comment for it (perhaps saying that it is like single_set but never allows more than one SET).

Ok with that change.

        Jakub
RE: [PATCH] Fix PR58115
Hi,

On Tue, 7 Jan 2014 15:10:20, Richard Biener wrote:
On Tue, Jan 7, 2014 at 1:12 PM, Richard Sandiford rdsandif...@googlemail.com wrote:
Bernd Edlinger bernd.edlin...@hotmail.de writes:

How about this patch for the big comment? The comment should say that target_set_current_function() cannot call target_reinit() because:

target_reinit()=>lang_dependent_init_target()
=>init_optabs()=>init_all_optabs(this_fn_optabs);

uses this_fn_optabs which is undefined here. However many targets (nios2, rx, i386, rs6000) do exactly that.

Is there currently any target that sets this_target_optab in the target_set_current_function?

MIPS :-) (via save_target_globals_default_opts=>save_target_globals)

I think other targets need to do the same thing in order for tests like that extended intrinsics_4.c to work.

How does this patch look? Tested on x86_64-linux-gnu. I didn't remove save_target_globals_default_opts because there the temporary optimization_current_node also protects a call to init_reg_sets.

Well, if it works the patch is ok. You're way more familiar with the details of this machinery.

Richard.

I found another test case that still fails with today's trunk:

#include <immintrin.h>

__m256 a[10], b[10], c[10];

void __attribute__((target ("sse2"), optimize (3)))
foo (void)
{
}

void __attribute__((target ("avx"), optimize (3)))
bar (void)
{
  a[0] = _mm256_and_ps (b[0], c[0]);
}

compile with i686-pc-linux-gnu-gcc -O2 -msse2 -mno-avx -S

The attached patch seems to fix this test case for targets that do not have SWITCHABLE_TARGET. What do you think about it?

I think Jakub's patch will fix this case, but I did not try. However even if the i386 is now clean, there are still many targets that use target_reinit() in target_set_current_function.

Bernd.

Thanks, Richard

gcc/
        PR target/58115
        * target-globals.c (save_target_globals): Remove this_fn_optab
        handling.
        * toplev.c: Include optabs.h.
        (target_reinit): Temporarily restore the global options if another
        set of options are in force.
gcc/testsuite/
	* gcc.target/i386/intrinsics_4.c (bar): New function.

Index: gcc/target-globals.c
===================================================================
--- gcc/target-globals.c 2014-01-02 22:16:03.042278971 +0000
+++ gcc/target-globals.c 2014-01-07 12:08:33.569900970 +0000
@@ -68,7 +68,6 @@ struct target_globals *
 save_target_globals (void)
 {
   struct target_globals *g;
-  struct target_optabs *saved_this_fn_optabs = this_fn_optabs;

   g = ggc_alloc_target_globals ();
   g->flag_state = XCNEW (struct target_flag_state);
@@ -88,10 +87,8 @@ save_target_globals (void)
   g->bb_reorder = XCNEW (struct target_bb_reorder);
   g->lower_subreg = XCNEW (struct target_lower_subreg);
   restore_target_globals (g);
-  this_fn_optabs = this_target_optabs;
   init_reg_sets ();
   target_reinit ();
-  this_fn_optabs = saved_this_fn_optabs;

   return g;
 }
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c 2014-01-07 08:11:43.888058805 +0000
+++ gcc/toplev.c 2014-01-07 12:10:19.448096479 +0000
@@ -78,6 +78,7 @@ Software Foundation; either version 3, o
 #include "diagnostic-color.h"
 #include "context.h"
 #include "pass_manager.h"
+#include "optabs.h"

 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -1752,6 +1753,23 @@ target_reinit (void)
 {
   struct rtl_data saved_x_rtl;
   rtx *saved_regno_reg_rtx;
+  tree saved_optimization_current_node;
+  struct target_optabs *saved_this_fn_optabs;
+
+  /* Temporarily switch to the default optimization node, so that
+     *this_target_optabs is set to the default, not reflecting
+     whatever a previous function used for the optimize
+     attribute.  */
+  saved_optimization_current_node = optimization_current_node;
+  saved_this_fn_optabs = this_fn_optabs;
+  if (saved_optimization_current_node != optimization_default_node)
+    {
+      optimization_current_node = optimization_default_node;
+      cl_optimization_restore
+        (&global_options,
+         TREE_OPTIMIZATION (optimization_default_node));
+    }
+  this_fn_optabs = this_target_optabs;

   /* Save *crtl and regno_reg_rtx around the reinitialization
      to allow target_reinit being called even after prepare_function_start.  */
@@ -1769,7 +1787,16 @@ target_reinit (void)
   /* Reinitialize lang-dependent parts.  */
   lang_dependent_init_target ();

-  /* And restore it at the end, as free_after_compilation from
+  /* Restore the original optimization node.  */
+  if (saved_optimization_current_node != optimization_default_node)
+    {
+      optimization_current_node = saved_optimization_current_node;
+      cl_optimization_restore (&global_options,
+                               TREE_OPTIMIZATION (optimization_current_node));
+    }
+  this_fn_optabs = saved_this_fn_optabs;
+
+  /* Restore regno_reg_rtx at the end, as free_after_compilation from
      expand_dummy_function_end clears it.  */
   if (saved_regno_reg_rtx)
     {
Index: gcc/testsuite/gcc.target/i386/intrinsics_4.c
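
For readers skimming the toplev.c hunk, the idea in target_reinit can be reduced to a save / switch-to-default / reinitialize / restore sequence. The following is only a minimal sketch of that pattern, not GCC code: opt_node, optab_table, init_tables and reinit_with_defaults are hypothetical stand-ins for the optimization nodes, the optab tables, lang_dependent_init_target and target_reinit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's optimization nodes and optab tables.  */
struct opt_node { int level; };
struct optab_table { int initialized_at_level; };

static struct opt_node default_node = { 2 };
static struct opt_node *current_node = &default_node;

/* target_optabs plays the role of *this_target_optabs, fn_optabs the role
   of this_fn_optabs (both names are assumptions for this sketch).  */
static struct optab_table target_optabs;
static struct optab_table *fn_optabs = &target_optabs;

/* Stand-in for the reinitialization work: it fills in whatever table
   fn_optabs points at, using the options currently in force.  */
static void
init_tables (void)
{
  fn_optabs->initialized_at_level = current_node->level;
}

/* Save the per-function state, switch to the defaults, reinitialize,
   then put the per-function state back.  */
static void
reinit_with_defaults (void)
{
  struct opt_node *saved_node = current_node;
  struct optab_table *saved_fn_optabs = fn_optabs;

  current_node = &default_node;
  fn_optabs = &target_optabs;

  init_tables ();

  /* The target-wide table must reflect the default options only.  */
  assert (target_optabs.initialized_at_level == default_node.level);

  current_node = saved_node;
  fn_optabs = saved_fn_optabs;
}
```

If a caller has an optimize(3)-style node and a per-function table in force when reinit_with_defaults runs, the target-wide table ends up initialized from the default node and the caller's node and table are left untouched, which is the property the patch is after.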
Re: [GOOGLE] Remove mod_id_to_name map
Ok.

David

On Wed, Jan 8, 2014 at 10:58 AM, Dehao Chen de...@google.com wrote:

This patch removes the mod_id_to_name map because the info is already there in module_infos. Also, AutoFDO doesn't have access to update this map because it's a file-static structure.

Bootstrapped and passed regression test.

OK for google branch?

Thanks,
Dehao

Index: gcc/coverage.c
===================================================================
--- gcc/coverage.c (revision 206366)
+++ gcc/coverage.c (working copy)
@@ -615,37 +615,17 @@ reorder_module_groups (const char *imports_file, u
   module_name_tab.dispose ();
 }

-typedef struct {
-  unsigned int mod_id;
-  const char *mod_name;
-} mod_id_to_name_t;
-
-static vec<mod_id_to_name_t> *mod_names;
-
-static void
-record_module_name (unsigned int mod_id, const char *name)
-{
-  mod_id_to_name_t t;
-
-  t.mod_id = mod_id;
-  t.mod_name = xstrdup (name);
-  if (!mod_names)
-    vec_alloc (mod_names, 10);
-  mod_names->safe_push (t);
-}
-
 /* Return the module name for module with MOD_ID.  */

 const char *
 get_module_name (unsigned int mod_id)
 {
   size_t i;
-  mod_id_to_name_t *elt;

-  for (i = 0; mod_names->iterate (i, &elt); i++)
+  for (i = 0; i < num_in_fnames; i++)
     {
-      if (elt->mod_id == mod_id)
-        return elt->mod_name;
+      if (module_infos[i]->ident == mod_id)
+        return lbasename (module_infos[i]->source_filename);
     }

   gcc_assert (0);
@@ -927,9 +907,6 @@ read_counts_file (const char *da_file_name, unsign
         }
     }

-  record_module_name (mod_info->ident,
-                      lbasename (mod_info->source_filename));
-
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
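
The replacement lookup in get_module_name is a plain linear scan over the table that already exists, computing the basename on demand instead of caching a copy. A minimal standalone sketch of that shape follows; struct module_info here is a trimmed-down hypothetical stand-in, base_name is a simplified substitute for libiberty's lbasename that only handles '/', and lookup_module_name returns NULL on a miss where the patch asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the entries of module_infos.  */
struct module_info
{
  unsigned int ident;
  const char *source_filename;
};

/* Simplified stand-in for lbasename: only '/' is treated as a separator.  */
static const char *
base_name (const char *path)
{
  const char *slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Scan the N entries of INFOS for MOD_ID and return the basename of the
   matching source file, or NULL if no entry matches.  */
static const char *
lookup_module_name (struct module_info *const *infos, size_t n,
                    unsigned int mod_id)
{
  size_t i;

  assert (infos != NULL || n == 0);

  for (i = 0; i < n; i++)
    if (infos[i]->ident == mod_id)
      return base_name (infos[i]->source_filename);

  return NULL;
}
```

The trade-off is the same as in the patch: no second data structure to keep in sync, at the cost of recomputing the basename on every call.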
Re: [PATCH] Fix for PR 59524
On 01/08/14 14:16, Iyer, Balaji V wrote:

Hello Everyone,

Attached, please find a patch that will fix the bug mentioned in PR 59524. The main issue was that the Cilk keywords tests were run even when the user configured the compiler with --disable-libcilkrts. This patch should fix this issue for C and C++. This is tested on x86 and x86_64.

Here are the ChangeLog entries:

gcc/testsuite/ChangeLog
+2014-01-08  Balaji V. Iyer  balaji.v.i...@intel.com
+
+	PR testsuite/59524
+	* gcc.dg/cilk-plus/cilk-plus.exp: Make sure the cilk keywords tests
+	are run only if the Cilk library is available/enabled.
+	* g++.dg/cilk-plus/cilk-plus.exp: Likewise.
+	* lib/target-supports.exp (check_libcilkrts_available): New function.
+

Is this Ok for trunk?

Yes.

jeff