[Fortran, Patch, committed] Update {gfortran,intrinsic}.texi refs to OpenMP 4
Committed as obvious as Rev. 211806. Tobias Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 211805) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,3 +1,8 @@ +2014-06-18 Tobias Burnus bur...@net-b.de + + * gfortran.texi (OpenMP): Update refs to OpenMP 4.0. + * intrinsic.texi (OpenMP Modules): Ditto. + 2014-06-18 Jakub Jelinek ja...@redhat.com * cpp.c (cpp_define_builtins): Change _OPENMP macro to Index: gcc/fortran/gfortran.texi === --- gcc/fortran/gfortran.texi (Revision 211805) +++ gcc/fortran/gfortran.texi (Arbeitskopie) @@ -531,7 +531,7 @@ The current status of the support is can be found @ref{TS 29113 status} sections of the documentation. Additionally, the GNU Fortran compilers supports the OpenMP specification -(version 3.1, @url{http://openmp.org/@/wp/@/openmp-specifications/}). +(version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}). @node Varying Length Character Strings @subsection Varying Length Character Strings @@ -1884,8 +1884,8 @@ It consists of a set of compiler directives, libra and environment variables that influence run-time behavior. GNU Fortran strives to be compatible to the -@uref{http://www.openmp.org/mp-documents/spec31.pdf, -OpenMP Application Program Interface v3.1}. +@uref{http://openmp.org/wp/openmp-specifications/, +OpenMP Application Program Interface v4.0}. To enable the processing of the OpenMP directive @code{!$omp} in free-form source code; the @code{c$omp}, @code{*$omp} and @code{!$omp} Index: gcc/fortran/intrinsic.texi === --- gcc/fortran/intrinsic.texi (Revision 211805) +++ gcc/fortran/intrinsic.texi (Arbeitskopie) @@ -13399,8 +13399,7 @@ named constants: @code{OMP_LIB} provides the scalar default-integer named constant @code{openmp_version} with a value of the form @var{yyyymm}, where @var{yyyy} is the year and @var{mm} the month -of the OpenMP version; for OpenMP v3.1 the value is @code{201107} -and for OpenMP v4.0 the value is @code{201307}. 
+of the OpenMP version; for OpenMP v4.0 the value is @code{201307}. The following scalar integer named constants of the kind @code{omp_sched_kind}:
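The same release-date encoding documented here for the Fortran openmp_version constant also drives the C/C++ _OPENMP macro that Jakub's cpp.c change updates. A small self-contained sketch of the encoding (the helper name is mine, not GCC code):

```cpp
#include <cassert>

// OpenMP version macros/constants encode the release date of the
// specification as yyyy * 100 + mm.  This helper makes that explicit.
constexpr int openmp_yyyymm(int year, int month) {
  return year * 100 + month;
}

// OpenMP v3.1 was published in July 2011, v4.0 in July 2013.
static_assert(openmp_yyyymm(2011, 7) == 201107, "OpenMP 3.1 date code");
static_assert(openmp_yyyymm(2013, 7) == 201307, "OpenMP 4.0 date code");
```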
Re: [PATCH 5/5] add libcc1
Joseph == Joseph S Myers jos...@codesourcery.com writes: Tom This patch adds the plugin to the gcc tree and updates the Tom top-level configury. Following up on your review. Joseph I don't see anything obvious that would disable the plugin if Joseph plugins are unsupported (e.g. on Windows host) or disabled Joseph (--disable-plugin). Probably the relevant support from Joseph gcc/configure.ac needs to go somewhere it can be used at Joseph toplevel. I've moved some relevant code to a new .m4 file in config and used it from the plugin itself. This seemed simpler than dealing with it at the top level. The plugin also self-disables if its configury needs are not met. Tom + self-args.push_back (gcc); Joseph seems wrong - at least you should use the appropriate compiler Joseph name after transformation for cross compilers / Joseph --program-transform-name. Though really the *versioned* driver Joseph $(target_noncanonical)-gcc-$(version) is the right one to use, This turned out to be a pain :-) There are two basic problems. First, gdb gets the names of its architectures from BFD, which doesn't always use the same naming scheme as the GNU configury triplets. It does generally use the same names, but for x86 targets it differs quite a bit. Second, the configury triplets can vary in annoying ways that don't really affect correct operation. For example, i586- versus i686- (there is a difference, but I think ignorable given the compiler flags in the debuginfo, and anyway I suspect not discoverable by gdb); or -unknown- versus -pc- (completely irrelevant AFAIK); or even x86_64-redhat-linux versus x86_64-unknown-linux-gnu (seemingly gratuitous). In the end I added some code to gdb and to libcc1.so to construct a regexp matching plausible results and then search $PATH for matches. Which seems rather gross, but workable in reasonable scenarios. I didn't try to apply the program transform name. 
I suppose I could apply it to the final gcc component of the name, though, without much trouble. I'll fix this up tomorrow. Let me know if you have any issue with the above. Barring that, I will be resubmitting the series soon, most likely this week. thanks, Tom
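The "regexp matching plausible results" approach described above can be sketched roughly like this (the pattern and helper name are hypothetical illustrations of the idea, not the actual libcc1/gdb code):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch: instead of demanding one exact configure triplet, accept the
// plausible spellings — vendor omitted, -pc- vs -unknown-, optional -gnu
// suffix, optional trailing version — and match candidate compiler names
// found on $PATH against the pattern.
bool plausible_x86_64_gcc(const std::string &name) {
  static const std::regex re("x86_64(-[^-]+)?-linux(-gnu)?-gcc(-[0-9.]+)?");
  return std::regex_match(name, re);
}
```

This accepts e.g. x86_64-redhat-linux-gcc and x86_64-unknown-linux-gnu-gcc-4.9 while rejecting compilers for other targets.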
C++ PATCH for c++/61507 (variadics and explicit template args)
In this bug we were throwing away the explicit args when doing nested unification for a function, with the result that we didn't remember them when proceeding to do unification for the trailing arguments. Fixed by preserving ARGUMENT_PACK_EXPLICIT_ARGS. Tested x86_64-pc-linux-gnu, applying to trunk. commit 00e76793666566b604903930c77d4a644ec74a12 Author: Jason Merrill ja...@redhat.com Date: Tue Jun 17 16:35:57 2014 +0200 PR c++/61507 * pt.c (resolve_overloaded_unification): Preserve ARGUMENT_PACK_EXPLICIT_ARGS. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index d5cc257..f0a598b 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -16838,7 +16838,16 @@ resolve_overloaded_unification (tree tparms, int i = TREE_VEC_LENGTH (targs); for (; i--; ) if (TREE_VEC_ELT (tempargs, i)) - TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (tempargs, i); + { + tree old = TREE_VEC_ELT (targs, i); + tree new_ = TREE_VEC_ELT (tempargs, i); + if (new_ && old && ARGUMENT_PACK_P (old) + && ARGUMENT_PACK_EXPLICIT_ARGS (old)) + /* Don't forget explicit template arguments in a pack. */ + ARGUMENT_PACK_EXPLICIT_ARGS (new_) + = ARGUMENT_PACK_EXPLICIT_ARGS (old); + TREE_VEC_ELT (targs, i) = new_; + } } if (good) return true; diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic159.C b/gcc/testsuite/g++.dg/cpp0x/variadic159.C new file mode 100644 index 000..2b14d30 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic159.C @@ -0,0 +1,14 @@ +// PR c++/61507 +// { dg-do compile { target c++11 } } + +struct A { + void foo(const int &); + void foo(float); +}; + +template <typename... Args> +void bar(void (A::*memfun)(Args...), Args... args); + +void go(const int &i) { + bar<const int &>(&A::foo, i); +}
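A hedged sketch of the scenario the fix addresses: the explicit template arguments seed the pack Args, and deduction for the trailing function arguments must remember them when resolving the overloaded &A::foo. Unlike the dg testcase, the helper here returns a value so the behavior can be checked; the names are mine:

```cpp
#include <cassert>

struct A {
  int foo(const int &i) { return i; }  // selected by <const int &>
  int foo(float) { return -1; }
};

// Args is partially given explicitly at the call site; the rest of the
// pack is deduced from the trailing arguments.
template <typename... Args>
int call(int (A::*memfun)(Args...), Args... args) {
  A a;
  return (a.*memfun)(args...);
}

int demo(const int &i) {
  // <const int &> fixes the first pack element, disambiguating &A::foo.
  return call<const int &>(&A::foo, i);
}
```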
C++ PATCH for c++/59296 (rvalue object and lvalue ref-qualifier)
We were treating a const & member function like a normal const reference, and binding an rvalue object argument to it. But it doesn't work that way. Tested x86_64-pc-linux-gnu, applying to trunk. commit 20a165532a9b0b0dada391716a1fb781af3ec005 Author: Jason Merrill ja...@redhat.com Date: Wed Jun 18 22:56:25 2014 +0200 PR c++/59296 * call.c (add_function_candidate): Set LOOKUP_NO_RVAL_BIND for ref-qualifier handling. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 1d4c4f9..b4adf36 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -2025,6 +2025,8 @@ add_function_candidate (struct z_candidate **candidates, object parameter has reference type. */ bool rv = FUNCTION_RVALUE_QUALIFIED (TREE_TYPE (fn)); parmtype = cp_build_reference_type (parmtype, rv); + /* Don't bind an rvalue to a const lvalue ref-qualifier. */ + lflags |= LOOKUP_NO_RVAL_BIND; } else { diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C new file mode 100644 index 000..ca333c2 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C @@ -0,0 +1,13 @@ +// PR c++/59296 +// { dg-do compile { target c++11 } } + +struct Type +{ + void get() const & { } + void get() const && { } +}; + +int main() +{ + Type{}.get(); +}
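A sketch of the rule the patch enforces: once a ref-qualifier is present, an rvalue object must select the '&&'-qualified overload rather than binding to the 'const &' one. The return values here are mine, added so the selection is observable:

```cpp
#include <cassert>

struct Type {
  int get() const & { return 1; }   // chosen for lvalue objects
  int get() const && { return 2; }  // chosen for rvalue objects
};

int on_lvalue() { Type t; return t.get(); }   // binds the & overload
int on_rvalue() { return Type{}.get(); }      // binds the && overload
```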
[patch committed] [SH] Fix build failure in libgomp
Hi, Trunk fails to build on sh4-unknown-linux-gnu with an ICE during compiling libgomp. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61550 for details. sh.c:prepare_move_operands has code for TLS addresses which shouldn't be run if reload is in progress or done. The attached patch fixes it. Committed on trunk. Regards, kaz -- 2014-06-18 Kaz Kojima kkoj...@gcc.gnu.org PR target/61550 * config/sh/sh.c (prepare_move_operands): Don't process TLS addresses here if reload in progress or completed. --- ORIG/trunk/gcc/config/sh/sh.c 2014-06-17 21:21:32.043445314 +0900 +++ trunk/gcc/config/sh/sh.c 2014-06-18 08:26:27.846157153 +0900 @@ -1758,7 +1758,8 @@ prepare_move_operands (rtx operands[], e else opc = NULL_RTX; - if ((tls_kind = tls_symbolic_operand (op1, Pmode)) != TLS_MODEL_NONE) + if (! reload_in_progress && ! reload_completed + && (tls_kind = tls_symbolic_operand (op1, Pmode)) != TLS_MODEL_NONE) { rtx tga_op1, tga_ret, tmp, tmp2;
Re: [PATCH, aarch64] Fix 61545
On Tue, Jun 17, 2014 at 10:19:06PM -0700, Richard Henderson wrote: Trivial fix for missing clobber of the flags over the tlsdesc call. Ok for all branches? r~ * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM. pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. regards, Kyle diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..1ee2cae 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3855,6 +3855,7 @@ (unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC)) (clobber (reg:DI LR_REGNUM)) + (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1
Re: [PATCH, aarch64] Fix 61545
On 06/18/2014 03:57 PM, Kyle McMartin wrote: pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. As I replied in IRC, no, because tlsgd_small is modeled with an actual CALL_INSN, and thus call-clobbered registers work as normal. r~
Re: [PATCH, aarch64] Fix 61545
On Wed, Jun 18, 2014 at 04:04:53PM -0700, Richard Henderson wrote: On 06/18/2014 03:57 PM, Kyle McMartin wrote: pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. As I replied in IRC, no, because tlsgd_small is modeled with an actual CALL_INSN, and thus call-clobbered registers work as normal. Ah, sorry I missed your reply. Makes sense. regards, Kyle
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 06/01/2014 03:00 AM, Tom de Vries wrote: +/* Emit call insn with PAT and do aarch64-specific handling. */ + +bool +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM)); +} + Which can't have been bootstrapped, since this has no return stmt. Why the bool return type anyway? Nothing appears to use it. r~
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 06/01/2014 03:00 AM, Tom de Vries wrote: +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM)); Actually, I'd like to know more about how this is supposed to work. Why are you only marking the two registers that would be used by a PLT entry, but not those clobbered by the ld.so trampoline, or indeed the unknown function that would be called from the PLT. Oh, I see, looking at the code we do actually follow the cgraph and make sure it is a direct call with a known destination. So, in fact, it's only the registers that could be clobbered by ld branch islands (so these two are still correct for aarch64). This means the documentation is actually wrong when it mentions PLTs at all. Do we in fact make sure this isn't an ifunc resolver? I don't immediately see how those get wired up in the cgraph... r~
Re: [PATCH, ARM] Enable fuse-caller-save for ARM
On 06/01/2014 04:27 AM, Tom de Vries wrote: + if (TARGET_AAPCS_BASED) +{ + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow + arm_call_fusage_contains_non_callee_clobbers to return true. */ + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, CC_REGNUM)); Why are you adding CC_REGNUM if fixed registers are automatically included? r~
Re: Update gcc.gnu.org/projects/gomp/
On Wed, 18 Jun 2014, Jakub Jelinek wrote: I've committed following change: Cool. <h2>Status</h2> <dl> +<dt><b>Jun 18, 2014</b></dt> +<dd><p>The last major part of Fortran OpenMP v4.0 support has been +committed into SVN mainline.</p></dd> + <dt><b>Oct 11, 2013</b></dt> <dd><p>The <code>gomp-4_0-branch</code> has been merged into SVN -mainline, so GCC 4.9 and later will feature OpenMP v4.0 support.</p></dd> +mainline, so GCC 4.9 and later will feature OpenMP v4.0 support for +C and C++.</p></dd> Isn't that worth a note on our homepage as well? Gerald
Re: -fuse-caller-save - Collect register usage information
On 05/19/2014 07:30 AM, Tom de Vries wrote: + for (insn = get_insns (); insn != NULL_RTX; insn = next_insn (insn)) +{ + HARD_REG_SET insn_used_regs; + + if (!NONDEBUG_INSN_P (insn)) + continue; + + find_all_hard_reg_sets (insn, &insn_used_regs, false); + + if (CALL_P (insn) + && !get_call_reg_set_usage (insn, &insn_used_regs, call_used_reg_set)) + { + CLEAR_HARD_REG_SET (node->function_used_regs); + return; + } + + IOR_HARD_REG_SET (node->function_used_regs, insn_used_regs); +} As an aside, wouldn't it work out better if we collect into a local variable instead of writing to memory here in node->function_used_regs each time? But not the main point... Let's suppose that we've got a rather large function, with only local calls for which we can acquire usage. Let's suppose that even one of those callees further calls something else, such that insn_used_regs == call_used_reg_set. We fill node->function_used_regs immediately, but keep scanning the rest of the large function. + + /* Be conservative - mark fixed and global registers as used. */ + IOR_HARD_REG_SET (node->function_used_regs, fixed_reg_set); + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) +if (global_regs[i]) + SET_HARD_REG_BIT (node->function_used_regs, i); + +#ifdef STACK_REGS + /* Handle STACK_REGS conservatively, since the df-framework does not + provide accurate information for them. */ + + for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++) +SET_HARD_REG_BIT (node->function_used_regs, i); +#endif + + node->function_used_regs_valid = 1; Wouldn't it be better to compare the collected function_used_regs; if it contains all of call_used_reg_set, decline to set function_used_regs_valid. That way, we'll early exit from the above loop whenever we see that we can't improve over the default call-clobber set. Although perhaps function_used_regs_valid is no longer the best name in that case... r~
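Richard's two suggestions — accumulate into a local rather than writing node->function_used_regs each iteration, and stop early once the set can no longer beat the default call-clobber set — can be sketched in a simplified model (registers modeled as bits in a plain mask, not real HARD_REG_SETs; all names are mine):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical default call-clobber set: low 16 "registers".
const std::uint32_t call_used_reg_set = 0x0000ffffu;

// Accumulate per-insn register usage locally; bail out as soon as the
// accumulated set already covers the conservative call-clobber set,
// since further scanning cannot improve the answer.
std::uint32_t collect_used_regs(const std::vector<std::uint32_t> &insn_sets) {
  std::uint32_t used = 0;  // local accumulator, not memory traffic
  for (std::uint32_t s : insn_sets) {
    used |= s;
    if ((used & call_used_reg_set) == call_used_reg_set)
      return call_used_reg_set;  // early exit: no improvement possible
  }
  return used;
}
```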
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, June 19, 2014 1:54 AM Seems there are actually two spots with this, not just one. Completely untested fix: 2014-06-18 Jakub Jelinek ja...@redhat.com * tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1): Cast 0xff to uint64_t before shifting it up. --- gcc/tree-ssa-math-opts.c 2014-06-13 08:08:42.354136356 +0200 +++ gcc/tree-ssa-math-opts.c 2014-06-18 19:50:59.486916201 +0200 @@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code, break; case RSHIFT_EXPR: /* Arithmetic shift of signed type: result is dependent on the value. */ - if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8)))) + if (!TYPE_UNSIGNED (n->type) + && (n->n & ((uint64_t) 0xff << (bitsize - 8)))) return false; n->n >>= count; break; @@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct old_type_size = TYPE_PRECISION (n->type); if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size - && n->n & (0xff << (old_type_size - 8))) + && n->n & ((uint64_t) 0xff << (old_type_size - 8))) return NULL_TREE; if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t))) Yep, that's the right fix. I tested it on both a bootstrapped gcc on x86_64-linux-gnu and an arm-none-eabi cross-compiler with no regression on the testsuite. Jakub, since you made the patch, the honor of committing it should be yours. Richard, given this issue, I think we should wait a few more days before I commit a backported (and fixed of course) version to 4.8 and 4.9. Best regards, Thomas
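Why the cast matters: 0xff has type int, so shifting it by bitsize - 8 for a 64-bit value shifts past the width of int (undefined behavior, and at best a truncated mask). Promoting to uint64_t first keeps all eight mask bits. A minimal illustration (helper name is mine):

```cpp
#include <cassert>
#include <cstdint>

// Build a mask covering the top byte of a value that is bitsize bits wide.
// Without the cast, 0xff << 56 would shift an int past its width; with it,
// the shift happens in 64-bit arithmetic and the mask is correct.
std::uint64_t high_byte_mask(int bitsize) {
  return (std::uint64_t) 0xff << (bitsize - 8);
}
```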
[Patch, Fortran, committed] PR61126 – fix wextra_1.f regression
Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch and the background. Thanks to Manuel and Dominique for the patch! Tobias 2014-06-18 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/61126 * options.c (gfc_handle_option): Remove call to handle_generated_option. 2014-06-18 Dominique d'Humieres domi...@lps.ens.fr PR fortran/61126 * gfortran.dg/wextra_1.f: Add -Wall to dg-options. diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c index a2b91ca..e4931f0 100644 --- a/gcc/fortran/options.c +++ b/gcc/fortran/options.c @@ -674,12 +674,7 @@ gfc_handle_option (size_t scode, const char *arg, int value, break; case OPT_Wextra: - handle_generated_option (&global_options, &global_options_set, - OPT_Wunused_parameter, NULL, value, - gfc_option_lang_mask (), kind, loc, - handlers, global_dc); set_Wextra (value); - break; case OPT_Wfunction_elimination: diff --git a/gcc/testsuite/gfortran.dg/wextra_1.f b/gcc/testsuite/gfortran.dg/wextra_1.f index 94c8edd..0eb28e1 100644 --- a/gcc/testsuite/gfortran.dg/wextra_1.f +++ b/gcc/testsuite/gfortran.dg/wextra_1.f @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-Wextra" } +! { dg-options "-Wall -Wextra" } program main integer, parameter :: x=3 ! { dg-warning "Unused parameter" } real :: a
Re: [Patch, Fortran, committed] PR61126 – fix wextra_1.f regression
Tobias Burnus wrote: Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch and the background. Thanks to Manuel and Dominque for the patch! And as follow up, I have committed the attached documentation patch. I think it is sufficient, even though it does not explicitly state that -Wall only works because -Wall implies -Wunused. Committed as Rev. 211767. Tobias Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 211766) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,3 +1,9 @@ +2014-06-18 Tobias Burnus bur...@net-b.de + + PR fortran/61126 + * invoke.texi (-Wunused-parameter): Make clearer when + -Wextra implies this option. + 2014-06-18 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/61126 Index: gcc/fortran/invoke.texi === --- gcc/fortran/invoke.texi (Revision 211766) +++ gcc/fortran/invoke.texi (Arbeitskopie) @@ -911,7 +911,8 @@ Contrary to @command{gcc}'s meaning of @option{-Wu @command{gfortran}'s implementation of this option does not warn about unused dummy arguments (see @option{-Wunused-dummy-argument}), but about unused @code{PARAMETER} values. @option{-Wunused-parameter} -is not included in @option{-Wall} but is implied by @option{-Wall -Wextra}. +is implied by @option{-Wextra} if also @option{-Wunused} or +@option{-Wall} is used. @item -Walign-commons @opindex @code{Walign-commons}
Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On 10 June 2014 19:16, Steven Bosscher stevenb@gmail.com wrote: On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote: * loop-invariant.c (struct invariant): Add a new member: eqno; (find_identical_invariants): Update eqno; (create_new_invariant): Init eqno; (get_inv_cost): Compute comp_cost with eqno; (gain_for_invariant): Take spill cost into account. Look OK except ... @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + IRA_LOOP_RESERVED_REGS - ira_class_hard_regs_num[cl]; if (size_cost > 0) - return -1; + { + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; + if (comp_cost <= spill_cost) + return -1; + + return 2; + } else size_cost = 0; } ... why return 2, instead of just falling through to return comp_cost - size_cost;? Thanks for the comments. Updated. As per your comments on the previous patch, I should also check the overlap between reg classes. So I changed the logic to check spill cost. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 6e43b49..af0c95b 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. 
*/ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -725,6 +734,10 @@ create_new_invariant (struct def *def, rtx insn, bitmap depends_on, inv->invno = invariants.length (); inv->eqto = ~0u; + + /* Itself. */ + inv->eqno = 1; + if (def) def->invno = inv->invno; invariants.safe_push (inv); @@ -1141,7 +1154,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS { @@ -1249,7 +1262,7 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, unsigned *new_regs, unsigned regs_used, bool speed, bool call_p) { - int comp_cost, size_cost; + int comp_cost, size_cost = 0; enum reg_class cl; int ret; @@ -1273,6 +1286,8 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, { int i; enum reg_class pressure_class; + int spill_cost = 0; + int base_cost = target_spill_cost [speed]; for (i = 0; i < ira_pressure_classes_num; i++) { @@ -1286,30 +1301,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class] + IRA_LOOP_RESERVED_REGS > ira_class_hard_regs_num[pressure_class]) - break; + { + spill_cost += base_cost * (int) regs_needed[pressure_class]; + size_cost = -1; + } } - if (i < ira_pressure_classes_num) - /* There will be register 
pressure excess and we want not to - make this loop invariant motion. All loop invariants with - non-positive gains will be rejected in function - find_invariants_to_move. Therefore we return the negative - number here. - - One could think that this rejects also expensive loop - invariant motions and this will hurt code performance. - However numerous experiments with different heuristics - taking invariant cost into account did not confirm this - assumption. There are possible explanations for this - result: - o probably all expensive invariants were already moved out - of the loop by PRE and gimple invariant motion pass. - o expensive invariant execution will be hidden by insn - scheduling or OOO processor hardware because usually such - invariants have a lot of freedom to be executed - out-of-order. - Another reason for ignoring invariant cost vs spilling cost - heuristics is also in difficulties to evaluate accurately -
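The heuristic change can be summarized in a toy model: the computation cost of an invariant is scaled by how many identical invariants (eqno) it stands for, and when moving it would cause register-pressure excess the cost is weighed against an estimated spill cost instead of rejecting the motion outright. All numbers and names below are hypothetical units, not real GCC costs:

```cpp
#include <cassert>

// Toy model of the revised gain computation for loop-invariant motion.
int gain_for_invariant(int inv_cost, int eqno, int regs_needed,
                       int spill_cost_per_reg, bool pressure_excess) {
  int comp_cost = inv_cost * eqno;  // one move serves eqno identical uses
  if (pressure_excess) {
    int spill_cost = spill_cost_per_reg * regs_needed;
    if (comp_cost <= spill_cost)
      return -1;  // moving it would cost more in spills than it saves
    return comp_cost - spill_cost;
  }
  return comp_cost;  // no pressure excess: size_cost treated as 0 here
}
```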
Re: [patch] improve sloc assignment on bind_expr entry/exit code
Hi Jeff, On Jun 17, 2014, at 22:42 , Jeff Law l...@redhat.com wrote: * tree-core.h (tree_block): Add an end_locus field, allowing memorization of the end of block source location. * tree.h (BLOCK_SOURCE_END_LOCATION): New accessor. * gimplify.c (gimplify_bind_expr): Propagate the block start and end source location info we have on the block entry/exit code we generate. OK. Great, thanks! :-) I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? Yes, I will. At the patch submission time, I was unclear on what dejagnu device was available to setup a reliable testing protocol for this kind of issue and I was interested in getting feedback on the patch contents first. ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Thanks again for your feedback, Olivier
[PATCH] Make sure cfg-cleanup runs
This makes sure we run cfg-cleanup when we propagate into PHI nodes or on the FRE/PRE side remove any stmt. Otherwise we can end up with not removed forwarder blocks. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-propagate.c (replace_phi_args_in): Return whether we propagated anything. (substitute_and_fold_dom_walker::before_dom_children): Something changed if we propagated into PHI arguments. * tree-ssa-pre.c (eliminate): Always schedule cfg-cleanup if we removed a stmt. Index: gcc/tree-ssa-propagate.c === --- gcc/tree-ssa-propagate.c(revision 211738) +++ gcc/tree-ssa-propagate.c(working copy) @@ -964,7 +964,7 @@ replace_uses_in (gimple stmt, ssa_prop_g /* Replace propagated values into all the arguments for PHI using the values from PROP_VALUE. */ -static void +static bool replace_phi_args_in (gimple phi, ssa_prop_get_value_fn get_value) { size_t i; @@ -1015,6 +1015,8 @@ replace_phi_args_in (gimple phi, ssa_pro fprintf (dump_file, \n); } } + + return replaced; } @@ -1066,7 +1068,7 @@ substitute_and_fold_dom_walker::before_d continue; } } - replace_phi_args_in (phi, get_value_fn); + something_changed |= replace_phi_args_in (phi, get_value_fn); } /* Propagate known values into stmts. In some case it exposes Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 211738) +++ gcc/tree-ssa-pre.c (working copy) @@ -4521,11 +4521,7 @@ eliminate (bool do_pre) gsi = gsi_for_stmt (stmt); if (gimple_code (stmt) == GIMPLE_PHI) - { - remove_phi_node (gsi, true); - /* Removing a PHI node in a block may expose a forwarder block. */ - el_todo |= TODO_cleanup_cfg; - } + remove_phi_node (gsi, true); else { basic_block bb = gimple_bb (stmt); @@ -4534,6 +4530,9 @@ eliminate (bool do_pre) bitmap_set_bit (need_eh_cleanup, bb-index); release_defs (stmt); } + + /* Removing a stmt may expose a forwarder block. */ + el_todo |= TODO_cleanup_cfg; } el_to_remove.release ();
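The shape of the fix — the replacement routine now reports whether it changed anything so the caller can record something_changed and schedule cfg-cleanup — can be modeled in miniature (a toy stand-in for replace_phi_args_in, with invented names):

```cpp
#include <cassert>
#include <map>
#include <vector>

// Substitute known lattice values into "PHI arguments" and report whether
// anything was actually replaced, mirroring the void -> bool change.
bool replace_args(std::vector<int> &args, const std::map<int, int> &known) {
  bool replaced = false;
  for (int &a : args) {
    auto it = known.find(a);
    if (it != known.end() && it->second != a) {
      a = it->second;
      replaced = true;  // caller ORs this into something_changed
    }
  }
  return replaced;
}

// Hypothetical drivers for checking the model.
int demo_replaced_value() {
  std::vector<int> args{1, 2, 3};
  bool changed = replace_args(args, {{2, 5}});
  return changed ? args[1] : -1;
}

int demo_no_change() {
  std::vector<int> args{1, 2, 3};
  return replace_args(args, {}) ? 1 : 0;
}
```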
Re: [PATCH, aarch64] Fix 61545
On 18/06/14 06:19, Richard Henderson wrote: Trivial fix for missing clobber of the flags over the tlsdesc call. Ok for all branches? OK. R. r~ * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM. z diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..1ee2cae 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3855,6 +3855,7 @@ (unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC)) (clobber (reg:DI LR_REGNUM)) + (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/17/2014 10:14 PM, David Malcolm wrote: On Fri, 2014-06-13 at 12:24 +0200, mliska wrote: [...snip...] Statistics about the pass: Inkscape: 11.95 MB - 11.44 MB (-4.27%) Firefox: 70.12 MB - 70.12 MB (-3.07%) FWIW, you wrote 70.12 MB here for both before and after for Firefox, but give a -3.07% change, which seems like a typo. A 3.07% reduction from 70.12 MB would be 67.97 MB; was this what the pass achieved? Hi, it's typo, original size of FF is 72.34 MB. I hope -3.07% is the correctly evaluated achievement. Thanks, Martin [...snip...] Thanks (nice patch, btw) Dave
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On Jun 18, 2014, at 09:42 , Olivier Hainque hain...@adacore.com wrote: I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Ah, we already have one test doing exactly that (return3.adb). I'll just add one. With Kind Regards, Olivier
[PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching at link time. Refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch adds fshort-enums and fshort-wchar to the LTO option group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrap and no make check regression on X86-64. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 * c.opt (fshort-enums): Add to LTO. * c.opt (fshort-wchar): Likewise. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable. 
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for "wchar_t" to be "unsigned short" fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include <stdlib.h> + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x->es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { + strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures. +load_lib lto.exp + +# If LTO has not been enabled, bail. +if { ![check_effective_target_lto] } { + return +} + +gcc_init +lto_init no-mathlib + +# Define an identifier for use with this suite to avoid name conflicts +# with other lto tests running at the same time. +set sid c_lto + +# Main loop. +foreach src [lsort [find $srcdir/$subdir *_0.c]] { + # If we're only testing specific files and this isn't one of them, skip it. + if ![runtest_file_p $runtests $src] then { + continue + } + + lto-execute $src $sid +} + +lto_finish Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c (revision 0) @@ -0,0 +1,5 @@ +int +foo2 (int
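The ABI property that Tag_ABI_enum_size tracks is simply how wide a plain enumeration is: with -fno-short-enums (the default on most hosts) it occupies a full int, while -fshort-enums shrinks it to the narrowest integer type that fits, and mixing the two across objects is exactly the mismatch the linker warns about. A small illustration, which assumes the default (no -fshort-enums) compilation:

```cpp
#include <cassert>

// Mirrors the enum from the testcase above.
enum enum_size_attribute { small_size, int_size };

// True under the default -fno-short-enums ABI, where a plain enum is
// as wide as int; -fshort-enums would make this false on many targets.
bool enum_is_int_sized() {
  return sizeof(enum_size_attribute) == sizeof(int);
}
```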
[PATCH, PR 61540] Do not ICE on impossible devirtualization
Hi, I was quite surprised we still had an assert checking that the target of a virtual call derived by ipa-cp is among possible ones derived by ipa-devirt. This is not true for various down-casts and I managed to trigger it in PR 61540 (where the testcase purposefully invokes undefined behavior but we should not ICE). Fixed thusly. Bootstrapped and tested on x86_64-linux. OK for trunk and the 4.9 branch? Thanks, Martin 2014-06-17 Martin Jambor mjam...@suse.cz PR ipa/61540 * ipa-prop.c (impossible_devirt_target): New function. (try_make_edge_direct_virtual_call): Use it, also instead of asserting. testsuite/ * g++.dg/ipa/pr61540.C: New test. diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index b67deed..f5ec67a 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -2912,6 +2912,29 @@ try_make_edge_direct_simple_call (struct cgraph_edge *ie, return cs; } +/* Return the target to be used in cases of impossible devirtualization. IE + and target (the latter can be NULL) are dumped when dumping is enabled. */ + +static tree +impossible_devirt_target (struct cgraph_edge *ie, tree target) +{ + if (dump_file) +{ + if (target) + fprintf (dump_file, +"Type inconsident devirtualization: %s/%i->%s\n", +ie->caller->name (), ie->caller->order, +IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (target))); + else + fprintf (dump_file, +"No devirtualization target in %s/%i\n", +ie->caller->name (), ie->caller->order); +} + tree new_target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); + cgraph_get_create_node (new_target); + return new_target; +} + /* Try to find a destination for indirect edge IE that corresponds to a virtual call based on a formal parameter which is described by jump function JFUNC and if it can be determined, make it direct and return the direct edge. 
@@ -2946,15 +2969,7 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, DECL_FUNCTION_CODE (target) == BUILT_IN_UNREACHABLE) || !possible_polymorphic_call_target_p (ie, cgraph_get_node (target))) - { - if (dump_file) - fprintf (dump_file, -"Type inconsident devirtualization: %s/%i->%s\n", -ie->caller->name (), ie->caller->order, -IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (target))); - target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); - cgraph_get_create_node (target); - } + target = impossible_devirt_target (ie, target); return ipa_make_edge_direct_to_target (ie, target); } } @@ -2984,10 +2999,7 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, if (targets.length () == 1) target = targets[0]->decl; else - { - target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); - cgraph_get_create_node (target); - } + target = impossible_devirt_target (ie, NULL_TREE); } else { @@ -3002,10 +3014,8 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, if (target) { -#ifdef ENABLE_CHECKING - gcc_assert (possible_polymorphic_call_target_p -(ie, cgraph_get_node (target))); -#endif + if (!possible_polymorphic_call_target_p (ie, cgraph_get_node (target))) + target = impossible_devirt_target (ie, target); return ipa_make_edge_direct_to_target (ie, target); } else diff --git a/gcc/testsuite/g++.dg/ipa/pr61540.C b/gcc/testsuite/g++.dg/ipa/pr61540.C new file mode 100644 index 000..d298964 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr61540.C @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fno-early-inlining -fdump-ipa-cp" } */ + +struct data { + data(int) {} +}; + +struct top { + virtual int topf() {} +}; + +struct intermediate: top { +int topf() /* override */ { return 0; } +}; + +struct child1: top { +void childf() +{ +data d(topf()); +} +}; + +struct child2: intermediate {}; + +void test(top &t) +{ +child1 &c = static_cast<child1 &>(t); +c.childf(); +child2 d; +test(d); +} + +int main (int argc, char **argv) +{ + child1 c; + test (c); + return 0; +} 
+ +/* { dg-final { scan-ipa-dump "Type inconsident devirtualization" "cp" } } */ +/* { dg-final { cleanup-ipa-dump "cp" } } */
[PATCH] pass cleanups
This removes the special dce_loop pass in favor of dealing with scev and niter estimates in dce generally. Likewise it makes copyprop always cleanup after itself, dealing with scev and niter estimates. It also makes copyprop not unconditionally schedule a cfg-cleanup but only do so if copyprop did any transform. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-pass.h (make_pass_dce_loop): Remove. * passes.def: Replace pass_dce_loop with pass_dce. * tree-ssa-dce.c (perform_tree_ssa_dce): If something changed free niter estimates and reset the scev cache. (tree_ssa_dce_loop, pass_data_dce_loop, pass_dce_loop, make_pass_dce_loop): Remove. * tree-ssa-copy.c: Include tree-ssa-loop-niter.h. (fini_copy_prop): Return whether something changed. Always let substitute_and_fold perform DCE and free niter estimates and reset the scev cache if so. (execute_copy_prop): If sth changed schedule cleanup-cfg. (pass_data_copy_prop): Do not unconditionally schedule cleanup-cfg or update-ssa. Index: gcc/tree-pass.h === *** gcc/tree-pass.h (revision 211738) --- gcc/tree-pass.h (working copy) *** extern gimple_opt_pass *make_pass_build_ *** 382,388 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt); - extern gimple_opt_pass *make_pass_dce_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); --- 382,387 Index: gcc/passes.def === *** gcc/passes.def (revision 211738) --- gcc/passes.def (working copy) *** along with GCC; see the file COPYING3. *** 203,209 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! 
NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); --- 206,212 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); *** along with GCC; see the file COPYING3. *** 215,221 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); --- 218,224 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); *** along with GCC; see the file COPYING3. *** 224,230 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); --- 227,233 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); Index: gcc/tree-ssa-dce.c === *** gcc/tree-ssa-dce.c (revision 211738) --- gcc/tree-ssa-dce.c (working copy) *** perform_tree_ssa_dce (bool aggressive) *** 1479,1485 tree_dce_done (aggressive); if (something_changed) ! return TODO_update_ssa | TODO_cleanup_cfg; return 0; } --- 1479,1490 tree_dce_done (aggressive); if (something_changed) ! { ! free_numbers_of_iterations_estimates (); ! if (scev_initialized_p ()) ! scev_reset (); ! return TODO_update_ssa | TODO_cleanup_cfg; ! 
} return 0; } *** tree_ssa_dce (void) *** 1491,1509 } static unsigned int - tree_ssa_dce_loop (void) - { - unsigned int todo; - todo = perform_tree_ssa_dce (/*aggressive=*/false); - if (todo) - { - free_numbers_of_iterations_estimates (); -
Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. thumb1-reorg-v2.txt diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index ccad548..3ebe424 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16939,7 +16939,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (!NONDEBUG_INSN_P (insn) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
On Tue, 17 Jun 2014, Jeff Law wrote: On 06/17/14 07:07, Richard Biener wrote: I felt that -ftree-XXX is bad naming so I went for -fssa-XXX even if that is now inconsistent. Any opinion here? For RTL we simply have unsuffixed names so shall we instead go for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is an implementation detail that the user should not be interested in (applies to tree- as well, of course). Now, 'phiopt' is a bad name when thinking of users (but they shouldn't play with those options anyway). Our flags are a mess. If I put my user hat on, then I'd have to ask the question, why would I care about tree, ssa, or even phis. The pass converts branchy code into straightline code. So, arguably, the right name would reflect that it changes branchy code to straight line code. Yeah, but we have so many of those ... well, ideally the user wouldn't be able to disable random passes with a non-debug option (and we have -fdisable-tree-XXX to disable individual pass instances). But I believe most of our flag names are poor in this regard (and I'm as much to blame as anyone). So go with your best judgement IMHO. Indeed. It'd be nice to have some testcases here to show why we want this moved earlier so that a few years from now when someone else wants to move it back, we can say umm, see test frobit.c, make that work and you can move it back :-) Hmm, yeah. But it's really doing this earlier so it would probably invoke inliner heuristics and -Os ... I'll try to come up with sth. For now I have committed the new flag-related changes. Richard.
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/17/2014 10:09 PM, Paolo Carlini wrote: Hi, On 13/06/14 12:24, mliska wrote: The optimization is inspired by the Microsoft /OPT:ICF optimization (http://msdn.microsoft.com/en-us/library/bxwfs976.aspx) that merges COMDAT sections, with each function residing in a separate section. In terms of C++ testcases, I'm wondering if you already double checked that the new pass already does well on the typical examples on which, I was told, the Microsoft optimization is known to do well, eg, code instantiating std::vector for different pointer types, or even long and long long on x86_64-linux, things like that. I've just added another C++ test case: #include <vector> using namespace std; static vector<vector<int> *> a; static vector<void *> b; int main() { return b.size() + a.size (); } where the pass identifies following equality: Semantic equality hit:std::vector<_Tp, _Alloc>::size_type std::vector<_Tp, _Alloc>::size() const [with _Tp = std::vector<int>*; _Alloc = std::allocator<std::vector<int>*>; std::vector<_Tp, _Alloc>::size_type = long unsigned int]->std::vector<_Tp, _Alloc>::size_type std::vector<_Tp, _Alloc>::size() const [with _Tp = void*; _Alloc = std::allocator<void*>; std::vector<_Tp, _Alloc>::size_type = long unsigned int] Semantic equality hit:static void std::_Destroy_aux<true>::__destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = void**]->static void std::_Destroy_aux<true>::__destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = std::vector<int>**] Semantic equality hit:void std::_Destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = void**]->void std::_Destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = std::vector<int>**] Semantic equality hit:void std::_Destroy(_ForwardIterator, _ForwardIterator, std::allocator<_T2>&) [with _ForwardIterator = void**; _Tp = void*]->void std::_Destroy(_ForwardIterator, _ForwardIterator, std::allocator<_T2>&) [with _ForwardIterator = std::vector<int>**; _Tp = std::vector<int>*] Semantic equality hit:void 
__gnu_cxx::new_allocator<_Tp>::deallocate(__gnu_cxx::new_allocator<_Tp>::pointer, __gnu_cxx::new_allocator<_Tp>::size_type) [with _Tp = void*; __gnu_cxx::new_allocator<_Tp>::pointer = void**; __gnu_cxx::new_allocator<_Tp>::size_type = long unsigned int]->void __gnu_cxx::new_allocator<_Tp>::deallocate(__gnu_cxx::new_allocator<_Tp>::pointer, __gnu_cxx::new_allocator<_Tp>::size_type) [with _Tp = std::vector<int>*; __gnu_cxx::new_allocator<_Tp>::pointer = std::vector<int>**; __gnu_cxx::new_allocator<_Tp>::size_type = long unsigned int] Semantic equality hit:static void __gnu_cxx::__alloc_traits<_Alloc>::deallocate(_Alloc&, __gnu_cxx::__alloc_traits<_Alloc>::pointer, __gnu_cxx::__alloc_traits<_Alloc>::size_type) [with _Alloc = std::allocator<void*>; __gnu_cxx::__alloc_traits<_Alloc>::pointer = void**; __gnu_cxx::__alloc_traits<_Alloc>::size_type = long unsigned int]->static void __gnu_cxx::__alloc_traits<_Alloc>::deallocate(_Alloc&, __gnu_cxx::__alloc_traits<_Alloc>::pointer, __gnu_cxx::__alloc_traits<_Alloc>::size_type) [with _Alloc = std::allocator<std::vector<int>*>; __gnu_cxx::__alloc_traits<_Alloc>::pointer = std::vector<int>**; __gnu_cxx::__alloc_traits<_Alloc>::size_type = long unsigned int] As one would expect, there is a function 'size'. Martin Thanks, Paolo.
Re: [PATCH 1/5] New Identical Code Folding IPA pass
Hi, On 18/06/14 10:46, Martin Liška wrote: As one would expect, there is a function 'size'. Cool, thanks! Paolo.
Re: [PATCH 4/5] Existing tests fix
On 06/17/2014 10:50 PM, Rainer Orth wrote: Jeff Law l...@redhat.com writes: On 06/13/14 04:48, mliska wrote: Hi, many tests rely on a precise number of scanned functions in a dump file. If IPA ICF decides to merge some function and(or) read-only variables, counts do not match. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * c-c++-common/rotate-1.c: Text ^ Huh? You are right, batch replacement mistake. There should be: * c-c++-common/rotate-1.c: Update dg-options. * c-c++-common/rotate-2.c: Likewise. ... Martin * c-c++-common/rotate-2.c: New test. * c-c++-common/rotate-3.c: Likewise. Rainer
Re: [patch] fix tests for AVX512
On Mon, Jun 09, 2014 at 01:43:48PM +0200, Uros Bizjak wrote: On Mon, Jun 9, 2014 at 1:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 08 Jun 11:26, Uros Bizjak wrote: On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote: Hi, I've fixed tests for AVX512, so they could be compiled with -Werror -Wall. Please have a look. From a quick look, this looks OK. Thanks, checked into trunk. Could we apply that to 4.9 branch? OK, but please wait a couple of days to check if everything is OK in mainline and also for Release Manager to reject the patch. LGTM. Jakub
RE: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
-Original Message- From: Richard Earnshaw Sent: Wednesday, June 18, 2014 4:31 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan Subject: Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. Thanks for comments. The patch is updated and tested. No more ICE. Is this one OK? BR, Terry diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 85d2114..463707e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16946,7 +16946,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (insn == BB_HEAD (bb) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
On 18/06/14 10:16, Terry Guo wrote: -Original Message- From: Richard Earnshaw Sent: Wednesday, June 18, 2014 4:31 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan Subject: Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. Thanks for comments. The patch is updated and tested. No more ICE. Is this one OK? BR, Terry Yes, this is fine. R. thumb1-reorg-v3.txt diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 85d2114..463707e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16946,7 +16946,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (insn == BB_HEAD (bb) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
[Patch libstdc++] PR61536 Export out of line comparison operations.
PR61536 is a case where linking fails on arm-linux-gnueabi* and arm-eabi* systems as the C++ ABI for ARM specifies out of line comparison operators for typeinfo. Rev r211355 tightened the symbols exported by libstdc++ a bit too much which caused some carnage in the test results for arm-linux-gnueabihf. Paolo proposed this on the bugzilla and asked if I could commit it. I've tweaked the comment slightly. Tested on arm-none-linux-gnueabihf and verified the link time failures now disappear. Applied to trunk. Ramana 2014-06-18 Paolo Carlini paolo.carl...@oracle.com Ramana Radhakrishnan ramana.radhakrish...@arm.com PR libstdc++/61536 * config/abi/pre/gnu.ver: Adjust for out of line comparisons. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index e7de756..63c9130 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -16,6 +16,18 @@ ## You should have received a copy of the GNU General Public License along ## with this library; see the file COPYING3. If not see ## http://www.gnu.org/licenses/. +// By default follow the old inline rules to avoid ABI changes. +// Logic similar to libsupc++/typeinfo (libstdc++/61536). See +// commentary on out of line comparisons. + +#ifndef __GXX_TYPEINFO_EQUALITY_INLINE + #if !__GXX_WEAK__ + #define __GXX_TYPEINFO_EQUALITY_INLINE 0 + #else +#define __GXX_TYPEINFO_EQUALITY_INLINE 1 + #endif +#endif + GLIBCXX_3.4 { @@ -760,6 +772,11 @@ GLIBCXX_3.4 { _ZNKSt9type_info1*; _ZNSt9type_infoD*; +#if !__GXX_TYPEINFO_EQUALITY_INLINE +_ZNKSt9type_info6before*; +_ZNKSt9type_infoeq*; +#endif + # std::exception _ZNKSt9exception4whatEv; _ZNSt9exceptionD*;
Update gcc.gnu.org/projects/gomp/
Hi! I've committed following change: --- projects/gomp/index.html 25 Oct 2013 07:16:35 - 1.13 +++ projects/gomp/index.html 18 Jun 2014 09:48:34 - @@ -63,9 +63,19 @@ available.</p> <h2>Status</h2> <dl> +<dt><b>Jun 18, 2014</b></dt> +<dd><p>The last major part of Fortran OpenMP v4.0 support has been +committed into SVN mainline.</p></dd> + <dt><b>Oct 11, 2013</b></dt> <dd><p>The <code>gomp-4_0-branch</code> has been merged into SVN -mainline, so GCC 4.9 and later will feature OpenMP v4.0 support.</p></dd> +mainline, so GCC 4.9 and later will feature OpenMP v4.0 support for +C and C++.</p></dd> + +<dt><b>July 23, 2013</b></dt> +<dd><p>The final <a +href="http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf">OpenMP v4.0</a> +specification has been released.</p></dd> <dt><b>Aug 2, 2011</b></dt> <dd><p>The <code>gomp-3_1-branch</code> has been merged into SVN Jakub
[PATCH] Create less TARGET_MEM_REFs
I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) && (!index2 || integer_zerop (index2)) && (!addr->index || integer_zerop (addr->index))) return fold_build2 (MEM_REF, type, base, addr->offset);
Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Maintainers, This patch fixes the PR 60617 that occurs when we turn on the reload pass in thumb2 mode. It occurs for the pattern *ior_scc_scc that gets generated for the 3rd argument of the below function call. JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst))); (snip---) (insn 634 633 635 27 (parallel [ (set (reg:SI 3 r3) (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand gets register r5 assigned (reg/v:SI 112 [ op2 ])) (eq:SI (reg/v:SI 110 [ dst ]) <== This operand (reg/v:SI 111 [ op1 ])))) (clobber (reg:CC 100 cc)) ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc} (snip---) The issue here is that the above pattern demands 5 registers (LO_REGS). But when we are in reload, register r0 is used for the pointer to the class, r1 and r2 for the first and second argument. r7 is used for the stack pointer. So we are left with r3, r4, r5 and r6. But the above pattern needs five LO_REGS. Hence we get a spill failure when processing the last register operand in that pattern. In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1, and for Thumb-2 mode there is mention of using LO_REGS in the comment as below: "Care should be taken to avoid adding thumb-2 patterns that require many low registers". So the conservative fix is not to allow this pattern for Thumb-2 mode. I don't have an additional solution off the top of my head and probably need to go do some digging. It sounds like the conservative fix but what's the impact of doing so? Have you measured that in terms of performance or code size on a range of benchmarks? I allowed these patterns for Thumb-2 when we have constant operands for comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass. That sounds fine and fair - no trouble there. 
My concern is with removing the register alternatives and losing the ability to trigger conditional compares on 4.9 and trunk for Thumb1 till the time the new conditional compare work makes it in. Ramana Regression tested with gcc 4.9 branch since in trunk this bug is masked by revision 209897. Please provide your suggestions on this patch. regards, Venkat.
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This is OK. Ramana gcc/Changelog: 2014-06-02 Charles Baylis charles.bay...@linaro.org * config/arm/arm.c (neon_vector_mem_operand): Allow register POST_MODIFY for neon loads and stores. (arm_print_operand): Output post-index register for neon loads and stores.
Re: [PATCH] Create less TARGET_MEM_REFs
On Wed, Jun 18, 2014 at 11:56:01AM +0200, Richard Biener wrote: I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Isn't that against the comment above it? ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. Perhaps it is fine only if addr->offset is integer_zerop? 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) && (!index2 || integer_zerop (index2)) && (!addr->index || integer_zerop (addr->index))) return fold_build2 (MEM_REF, type, base, addr->offset); Jakub
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis charles.bay...@linaro.org wrote: On 5 June 2014 07:27, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This looks like a reasonable start but this work doesn't look complete to me yet. Can you also look at the impact on performance of a range of benchmarks especially a popular embedded one to see how this behaves unless you have already done so ? I ran a popular suite of embedded benchmarks, and there is no impact at all on Chromebook (including with the additional attached patch) Thanks for the due diligence The patch was developed to address a performance issue with a new version of libvpx which uses intrinsics instead of NEON assembler. The patch results in a 3% improvement for VP8 decode. Good - 3% not to be sneezed at. POST_INC, POST_MODIFY usually have a funny way of biting you with either ivopts or the way in which address costs work. I think there maybe further tweaks needed but for a first step I'd like to know what the performance impact is. I would also suggest running this through clyon's neon intrinsics testsuite to see if that catches any issues especially with the large vector modes. Thanks. No issues found in clyon's tests. Please keep an eye out for any regressions. Your mention of larger vector modes prompted me to check that the patch has the desired result with them. In fact, the costs are estimated incorrectly which means the post_modify pattern is not used. The attached patch fixes that. (used in combination with my original patch) 2014-06-15 Charles Baylis charles.ba...@linaro.org * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects. 
I'm not too thrilled with putting in more special cases that are not table driven in there. Can you file a PR with some testcases that show this so that we don't forget and CC me on it please ? Ramana
Re: [PATCH] PR61517: fix stmt replacement in bswap pass
On Wed, Jun 18, 2014 at 3:30 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi everybody, Thanks to a comment from Richard Biener, the bswap pass takes care not to perform its optimization if memory is modified between the loads of the original expression. However, when it replaces these statements by a single load, it does so at the gimple statement that computes the final bitwise OR of the original expression, and memory could also be modified between the last load statement and this bitwise OR statement. The result would then be to read memory *after* it was changed instead of before. This patch takes care to move the statement to be replaced close to one of the original loads, thus avoiding this problem. Ok. Thanks, Richard. ChangeLog entries for this fix are: *** gcc/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt whose rhs's first tree is the source expression instead of the expression itself. (find_bswap_or_nop): Likewise. (bswap_replace): Rename stmt in cur_stmt. Pass gsi by value and src as a gimple stmt whose rhs's first tree is the source. In the memory source case, move the stmt to be replaced close to one of the original loads to avoid the problem of a store between the load and the stmt's original location. (pass_optimize_bswap::execute): Adapt to change in bswap_replace's signature. *** gcc/testsuite/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New. (incorrect_read_be32): Likewise. (main): Call incorrect_read_* to test stmt replacement is made by bswap at the right place. * gcc.c-torture/execute/pr61517.c: New test. Patch also attached for convenience. Is it ok for trunk? 
diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index a47e01a..88132fe 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y) return c3 | c2 << 8 | c1 << 16 | c0 << 24; } +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_le32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c0 | c1 << 8 | c2 << 16 | c3 << 24; +} + +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_be32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c3 | c2 << 8 | c1 << 16 | c0 << 24; +} + int main () { @@ -92,8 +118,17 @@ main () out = fake_read_le32 (cin, &cin[2]); if (out != 0x89018583) __builtin_abort (); + cin[2] = 0x87; out = fake_read_be32 (cin, &cin[2]); if (out != 0x83850189) __builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_le32 (cin, &cin[2]); + if (out != 0x89878583) +__builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_be32 (cin, &cin[2]); + if (out != 0x83858789) +__builtin_abort (); return 0; } diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c b/gcc/testsuite/gcc.c-torture/execute/pr61517.c new file mode 100644 index 000..fc9bbe8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c @@ -0,0 +1,19 @@ +int a, b, *c = &a; +unsigned short d; + +int +main () +{ + unsigned int e = a; + *c = 1; + if (!b) +{ + d = e; + *c = d | e; +} + + if (a != 0) +__builtin_abort (); + + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index c868e92..1ee2ba8 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform the operation given by the rhs 
of STMT on the result. If the operation - could successfully be executed the function returns the tree expression of - the source operand and NULL otherwise. */ + could successfully be executed the function returns a gimple stmt whose + rhs's first tree is the expression of the source operand and NULL + otherwise. */ -static tree +static gimple find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) { enum tree_code code; tree rhs1, rhs2 = NULL; - gimple rhs1_stmt, rhs2_stmt; - tree source_expr1; + gimple rhs1_stmt, rhs2_stmt, source_stmt1; enum gimple_rhs_class rhs_class; if (!limit || !is_gimple_assign (stmt)) -return NULL_TREE; +return NULL; rhs1 = gimple_assign_rhs1 (stmt); if (find_bswap_or_nop_load (stmt, rhs1, n)) -
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 6:55 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, June 11, 2014 4:32 PM Is this OK for trunk? Does this bug qualify for a backport patch to 4.8 and 4.9 branches? This is ok for trunk and also for backporting (after a short while to see if there is any fallout). Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and 4.9? If yes, how much more should I wait before committing? Tested on both 4.8 and 4.9 without regression in the testsuite after a bootstrap. This is ok to commit now. Thanks, Richard. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1e35bbe..0559b7f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2014-06-12 Thomas Preud'homme thomas.preudho...@arm.com + + PR tree-optimization/61306 + * tree-ssa-math-opts.c (struct symbolic_number): Store type of + expression instead of its size. + (do_shift_rotate): Adapt to change in struct symbolic_number. Return + false to prevent optimization when the result is unpredictable due to + arithmetic right shift of signed type with highest byte is set. + (verify_symbolic_number_p): Adapt to change in struct symbolic_number. + (find_bswap_1): Likewise. Return NULL to prevent optimization when the + result is unpredictable due to sign extension. + (find_bswap): Adapt to change in struct symbolic_number. + 2014-06-12 Alan Modra amo...@gmail.com PR target/61300 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 757cb74..139f23c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2014-06-12 Thomas Preud'homme thomas.preudho...@arm.com + + * gcc.c-torture/execute/pr61306-1.c: New test. + * gcc.c-torture/execute/pr61306-2.c: Likewise. + * gcc.c-torture/execute/pr61306-3.c: Likewise. 
+ 2014-06-11 Richard Biener rguent...@suse.de PR tree-optimization/61452 diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c new file mode 100644 index 000..ebc90a3 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c @@ -0,0 +1,39 @@ +#ifdef __INT32_TYPE__ +typedef __INT32_TYPE__ int32_t; +#else +typedef int int32_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t)(x) (uint32_t)0x00ffUL) 24) |\ + (((uint32_t)(x) (uint32_t)0xff00UL) 8) |\ + (((uint32_t)(x) (uint32_t)0x00ffUL) 8) |\ + (( (int32_t)(x) (int32_t)0xff00UL) 24))) + +/* Previous version of bswap optimization failed to consider sign extension + and as a result would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (int32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (fake_bswap32 (0x87654321) != 0xff87) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c new file mode 100644 index 000..886ecfd --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c @@ -0,0 +1,40 @@ +#ifdef __INT16_TYPE__ +typedef __INT16_TYPE__ int16_t; +#else +typedef short int16_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t) (x) (uint32_t)0x00ffUL) 24) | \ + (((uint32_t)(int16_t)(x) (uint32_t)0x0000UL) 8) | \ + (((uint32_t) (x) (uint32_t)0x00ffUL) 8) | \ + (((uint32_t) (x) (uint32_t)0xff00UL) 24))) + + +/* Previous version of bswap optimization failed to consider sign extension + and as a result 
would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (int16_t) * __CHAR_BIT__ != 16) +return 0; + if (fake_bswap32 (0x81828384) != 0xff838281) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c new file mode 100644 index 000..6086e27 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c @@ -0,0 +1,13 @@ +short a = -1;
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Wed, Jun 18, 2014 at 10:14 AM, Hale Wang hale.w...@arm.com wrote: Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching at link time. Refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch adds fshort-enums and fshort-wchar to the LTO option group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrapped with no make check regressions on X86-64. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 *c.opt (fshort-enums): Add to LTO. *c.opt (fshort-wchar): Likewise. Space after the *. I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes. Thanks, Richard. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable.
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for \wchar_t\ to be \unsigned short\ fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c(revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include stdlib.h + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x-es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp(revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { +strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures. +load_lib lto.exp + +# If LTO has not been enabled, bail. +if { ![check_effective_target_lto] } { +return +} + +gcc_init +lto_init no-mathlib + +# Define an identifier for use with this
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Wed, Jun 18, 2014 at 12:21 PM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Jun 18, 2014 at 10:14 AM, Hale Wang hale.w...@arm.com wrote: Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching in linking. Refer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch add fshort-enums and fshout-wchar to LTO group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrap and no make check regression on X86-64. Patch also attached for convenience. Is It ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 *c.opt (fshort-enums): Add to LTO. *c.opt (fshort-wchar): Likewise. Space after the *. I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes. Oh, I see you need a new object-readelf ... I defer to a testsuite maintainer for this part. Richard. Thanks, Richard. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable. 
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for \wchar_t\ to be \unsigned short\ fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c(revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include stdlib.h + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x-es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp(revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { +strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures.
[PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
Hello! Attached patch fixes recently added sibcall insns and their corresponding peephole2 patterns: - There is no need for new memory_nox32_operand. A generic memory_operand can be used, since new insns and peephole2 patterns should be disabled for TARGET_X32 entirely. - Adds missing m constraint in insn patterns. - Macroizes peephole2 patterns - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) - Fixes call RTXes in sibcall_pop related patterns (and fixes two newly introduced warnings in i386.md) 2014-06-18 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern. Do not use unspec as call operand. Use memory_operand instead of memory_nox32_operand and add m operand constraint. Disable pattern for TARGET_X32. (*sibcall_pop_memory): Ditto. (*sibcall_value_memory): Ditto. (*sibcall_value_pop_memory): Ditto. (sibcall peepholes): Merge SImode and DImode patterns using W mode iterator. Use memory_operand instead of memory_nox32_operand. Disable pattern for TARGET_X32. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value peepholes): Ditto. (sibcall_pop peepholes): Fix call insn RTXes. Use memory_operand instead of memory_nox32_operand. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value_pop peepholes): Ditto. * config/i386/predicates.md (memory_nox32_operand): Remove predicate. The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. Uros. 
Index: i386.md === --- i386.md (revision 211725) +++ i386.md (working copy) @@ -11354,53 +11354,38 @@ * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) -(define_insn *sibcall_intern - [(call (unspec [(mem:QI (match_operand:W 0 memory_nox32_operand))] - UNSPEC_PEEPSIB) -(match_operand 1))] - +(define_insn *sibcall_memory + [(call (mem:QI (match_operand:W 0 memory_operand m)) +(match_operand 1)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)] + !TARGET_X32 * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) (define_peephole2 - [(set (match_operand:DI 0 register_operand) -(match_operand:DI 1 memory_nox32_operand)) + [(set (match_operand:W 0 register_operand) + (match_operand:W 1 memory_operand)) (call (mem:QI (match_dup 0)) (match_operand 3))] - TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (1)) - [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) + !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (1)) +peep2_reg_dead_p (2, operands[0]) + [(parallel [(call (mem:QI (match_dup 1)) + (match_dup 3)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) (define_peephole2 - [(set (match_operand:DI 0 register_operand) -(match_operand:DI 1 memory_nox32_operand)) + [(set (match_operand:W 0 register_operand) + (match_operand:W 1 memory_operand)) (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) (call (mem:QI (match_dup 0)) (match_operand 3))] - TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (2)) + !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (2)) +peep2_reg_dead_p (3, operands[0]) [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) + (parallel [(call (mem:QI (match_dup 1)) + (match_dup 3)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) -(define_peephole2 - [(set (match_operand:SI 0 register_operand) -(match_operand:SI 1 memory_nox32_operand)) - (call (mem:QI (match_dup 0)) - (match_operand 3))] - !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (1)) - 
[(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) - -(define_peephole2 - [(set (match_operand:SI 0 register_operand) -(match_operand:SI 1 memory_nox32_operand)) - (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (mem:QI (match_dup 0)) - (match_operand 3))] - !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (2)) - [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) (match_dup 3))]) - (define_expand call_pop [(parallel [(call (match_operand:QI 0) (match_operand:SI 1)) @@ -11434,42 +11419,52 @@ * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) -(define_insn *sibcall_pop_intern - [(call (unspec [(mem:QI (match_operand:SI 0 memory_nox32_operand))] - UNSPEC_PEEPSIB) +(define_insn *sibcall_pop_memory +
Re: [PATCH] Create less TARGET_MEM_REFs
On Wed, 18 Jun 2014, Jakub Jelinek wrote: On Wed, Jun 18, 2014 at 11:56:01AM +0200, Richard Biener wrote: I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Isn't that against the comment above it? ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. Perhaps it is fine only if addr-offset is integer_zerop? Oh yeah, I guess so. Damn IVOPTs ... (though I wonder what's the difference with MEM[foo, -4B] then, which we don't catch either). That said, I'm not sure if it really fixes anything not allowing MEM_REFs in all cases. I've found a different workaround for the issue I was facing so I'm dropping this patch instead. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) (!index2 || integer_zerop (index2)) (!addr-index || integer_zerop (addr-index))) return fold_build2 (MEM_REF, type, base, addr-offset);
Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
Hi Ramana, On 18 June 2014 15:29, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Maintainers, This patch fixes PR 60617, which occurs when we turn on the reload pass in thumb2 mode. It occurs for the pattern *ior_scc_scc that gets generated for the 3rd argument of the below function call. JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst))); (snip---) (insn 634 633 635 27 (parallel [ (set (reg:SI 3 r3) (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) == This operand r5 is registers gets assigned (reg/v:SI 112 [ op2 ])) (eq:SI (reg/v:SI 110 [ dst ]) == This operand (reg/v:SI 111 [ op1 ] (clobber (reg:CC 100 cc)) ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc (snip---) The issue here is that the above pattern demands 5 registers (LO_REGS). But when we are in reload, register r0 is used for the pointer to the class, r1 and r2 for the first and second arguments, and r7 is used for the stack pointer. So we are left with r3, r4, r5 and r6 — but the above pattern needs five LO_REGS. Hence we get a spill failure when processing the last register operand in that pattern. In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1, and for thumb2 mode there is a mention of using LO_REGS in the comment as below. Care should be taken to avoid adding thumb-2 patterns that require many low registers So the conservative fix is not to allow this pattern for Thumb-2 mode. I don't have an additional solution off the top of my head and probably need to go do some digging. It sounds like the conservative fix, but what's the impact of doing so? Have you measured that in terms of performance or code size on a range of benchmarks? I haven't done any benchmark testing. I will try and run some benchmarks with my patch. I allowed these patterns for Thumb-2 when we have constant operands for the comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
That sounds fine and fair - no trouble there. My concern is with removing the register alternatives and losing the ability to trigger conditional compares on 4.9 and trunk for Thumb-1 until the new conditional compare work makes it in. Ramana This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the LRA pass is enabled by default now. It may be too conservative, but is there a way to enable this pattern when we have the LRA pass and prevent it when we have the old reload pass? regards, Venkat. Regression tested with the gcc 4.9 branch, since in trunk this bug is masked by revision 209897. Please provide your suggestions on this patch. regards, Venkat.
Re: [PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper
Jeff Law l...@redhat.com writes: Let's do better this time ;-) Add a testcase for the m68k port which verifies we're getting the desired code. Make sense. Installed with the following test case. Andreas. PR rtl-optimization/54555 * gcc.target/m68k/pr54555.c: New test. diff --git a/gcc/testsuite/gcc.target/m68k/pr54555.c b/gcc/testsuite/gcc.target/m68k/pr54555.c new file mode 100644 index 000..4be704b --- /dev/null +++ b/gcc/testsuite/gcc.target/m68k/pr54555.c @@ -0,0 +1,13 @@ +/* PR rtl-optimization/54555 + Test that postreload does not shorten the load of small constants to + use move.b instead of moveq. */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-final { scan-assembler-not move\\.?b } } */ + +void foo (void); +void bar (int a) +{ + if (a == 16 || a == 23) foo (); + if (a == -110 || a == -128) foo (); +} -- 2.0.0 -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[PATCH][RFC] Gate loop passes group on number-of-loops 1, add no-loops group
The following aims at reducing the number of pointless passes we run on functions containing no loops. Those are at least two copyprop and one dce pass (two dce passes when vectorization is enabled, three dce passes and an additional copyprop pass when any graphite optimization is enabled). Simply gating pass_tree_loop on number_of_loops () 1 would disable basic-block vectorization on loopless functions. Moving basic-block vectorization out of pass_tree_loop works to the extent that you'd need to move IVOPTs as well as data-ref analysis cannot cope with TARGET_MEM_REFs. So the following introduces a pass_tree_no_loop pass group which is enabled whenever the pass_tree_loop group is disabled. As followup this would allow to skip cleanup work we do after the loop pipeline just to cleanup after it. Any comments? Does such followup sound realistic or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? Bootstrap and regtest running on x86_64-unknown-linux-gnu. Thanks, Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-loop.c (gate_loop): New function. (pass_tree_loop::gate): Call it. (pass_data_tree_no_loop, pass_tree_no_loop, make_pass_tree_no_loop): New. * tree-vectorizer.c: Include tree-scalar-evolution.c (pass_slp_vectorize::execute): Initialize loops and SCEV if required. (pass_slp_vectorize::clone): New method. * timevar.def (TV_TREE_NOLOOP): New. * tree-pass.h (make_pass_tree_no_loop): Declare. * passes.def (pass_tree_no_loop): New pass group with SLP vectorizer. Index: gcc/tree-ssa-loop.c === *** gcc/tree-ssa-loop.c.orig2014-06-18 12:06:19.226205380 +0200 --- gcc/tree-ssa-loop.c 2014-06-18 12:06:39.103204012 +0200 *** along with GCC; see the file COPYING3. *** 42,47 --- 42,63 #include diagnostic-core.h #include tree-vectorizer.h + + /* Gate for loop pass group. The group is controlled by -ftree-loop-optimize +but we also avoid running it when the IL doesn't contain any loop. 
*/ + + static bool + gate_loop (function *fn) + { + if (!flag_tree_loop_optimize) + return false; + + /* Make sure to drop / re-discover loops when necessary. */ + if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + fix_loop_structure (NULL); + return number_of_loops (fn) 1; + } + /* The loop superpass. */ namespace { *** public: *** 68,74 {} /* opt_pass methods: */ ! virtual bool gate (function *) { return flag_tree_loop_optimize != 0; } }; // class pass_tree_loop --- 84,90 {} /* opt_pass methods: */ ! virtual bool gate (function *fn) { return gate_loop (fn); } }; // class pass_tree_loop *** make_pass_tree_loop (gcc::context *ctxt) *** 80,85 --- 96,140 return new pass_tree_loop (ctxt); } + /* The no-loop superpass. */ + + namespace { + + const pass_data pass_data_tree_no_loop = + { + GIMPLE_PASS, /* type */ + no_loop, /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + false, /* has_execute */ + TV_TREE_NOLOOP, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ + }; + + class pass_tree_no_loop : public gimple_opt_pass + { + public: + pass_tree_no_loop (gcc::context *ctxt) + : gimple_opt_pass (pass_data_tree_no_loop, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *fn) { return !gate_loop (fn); } + + }; // class pass_tree_no_loop + + } // anon namespace + + gimple_opt_pass * + make_pass_tree_no_loop (gcc::context *ctxt) + { + return new pass_tree_no_loop (ctxt); + } + + /* Loop optimizer initialization. */ namespace { Index: gcc/tree-vectorizer.c === *** gcc/tree-vectorizer.c.orig 2014-06-18 12:06:19.226205380 +0200 --- gcc/tree-vectorizer.c 2014-06-18 12:10:55.958186328 +0200 *** along with GCC; see the file COPYING3. *** 82,87 --- 82,89 #include tree-ssa-propagate.h #include dbgcnt.h #include gimple-fold.h + #include tree-scalar-evolution.h + /* Loop or bb location. 
*/ source_location vect_location; *** public: *** 610,615 --- 612,618 {} /* opt_pass methods: */ + opt_pass * clone () { return new pass_slp_vectorize (m_ctxt); } virtual bool gate (function *) { return flag_tree_slp_vectorize != 0; } virtual unsigned int execute (function *); *** pass_slp_vectorize::execute (function *f *** 620,625 --- 623,635 { basic_block bb; + bool in_loop_pipeline = scev_initialized_p (); + if (!in_loop_pipeline) + { + loop_optimizer_init
[PATCH PR61518]
Hi All, Here is a fix for PR 61518: an additional check was added to reject the transformation if the reduction variable is used anywhere other than the reduction statement (or its phi-function), since such a reduction will not be vectorized. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-06-18 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61518 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that reduction var is used in reduction stmt or phi-function only. gcc/testsuite/ChangeLog * gcc.dg/torture/pr61518.c: New test.
Re: [PATCH][RFC] Gate loop passes group on number-of-loops 1, add no-loops group
On Wed, Jun 18, 2014 at 12:42:19PM +0200, Richard Biener wrote: Any comments? Does such followup sound realistic or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? I think it is fine to have pass_tree_no_loop pass pipeline. Jakub
[PATCH][ARM][committed] Fix check for __FAST_MATH in arm_neon.h
Hi all, All other #ifdefs in arm_neon.h that look for fast math use the __FAST_MATH form rather than __FAST_MATH__. They have the same effect AFAICS. This patch fixes the one sticking out. Committed as obvious with r211779. Thanks, Kyrill 2014-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vadd_f32): Change #ifdef to __FAST_MATH.diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3e29f44..47f6c5e 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -474,7 +474,7 @@ vadd_s32 (int32x2_t __a, int32x2_t __b) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vadd_f32 (float32x2_t __a, float32x2_t __b) { -#ifdef __FAST_MATH__ +#ifdef __FAST_MATH return __a + __b; #else return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b, 3);
Re: [PATCH PR61518]
On Wed, Jun 18, 2014 at 12:47 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a fix for PR 61518 - additional test was added to reject transformation if reduction variable is not used in reduction statement only since such reduction will not be vectorized. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Ok. Thanks, Richard. gcc/ChangeLog 2014-06-18 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61518 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that reduction var is used in reduction stmt or phi-function only. gcc/testsuite/ChangeLog * gcc.dg/torture/pr61518.c: New test.
Re: [PATCH] pass cleanups
On Wed, 18 Jun 2014, Richard Biener wrote: This removes the special dce_loop pass in favor of dealing with scev and niter estimates in dce generally. Likewise it makes copyprop always cleanup after itself, dealing with scev and niter estimates. It also makes copyprop not unconditionally schedule a cfg-cleanup but only do so if copyprop did any transform. Bootstrap and regtest running on x86_64-unknown-linux-gnu. I have applied the following with the testsuite adjustments needed for gcc.dg/vect/dump-tree-dceloop-pr26359.c. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-pass.h (make_pass_dce_loop): Remove. * passes.def: Replace pass_dce_loop with pass_dce. * tree-ssa-dce.c (perform_tree_ssa_dce): If something changed free niter estimates and reset the scev cache. (tree_ssa_dce_loop, pass_data_dce_loop, pass_dce_loop, make_pass_dce_loop): Remove. * tree-ssa-copy.c: Include tree-ssa-loop-niter.h. (fini_copy_prop): Return whether something changed. Always let substitute_and_fold perform DCE and free niter estimates and reset the scev cache if so. (execute_copy_prop): If sth changed schedule cleanup-cfg. (pass_data_copy_prop): Do not unconditionally schedule cleanup-cfg or update-ssa. * gcc.dg/vect/vect.exp: Remove dump-tree-dceloop-* processing. * gcc.dg/vect/dump-tree-dceloop-pr26359.c: Rename to ... * gcc.dg/vect/pr26359.c: ... this and adjust appropriately. 
Index: gcc/tree-pass.h === *** gcc/tree-pass.h (revision 211738) --- gcc/tree-pass.h (working copy) *** extern gimple_opt_pass *make_pass_build_ *** 382,388 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt); - extern gimple_opt_pass *make_pass_dce_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); --- 382,387 Index: gcc/passes.def === *** gcc/passes.def (revision 211738) --- gcc/passes.def (working copy) *** along with GCC; see the file COPYING3. *** 203,209 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); --- 206,212 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); *** along with GCC; see the file COPYING3. *** 215,221 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); --- 218,224 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); *** along with GCC; see the file COPYING3. *** 224,230 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! 
NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); --- 227,233 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); Index: gcc/tree-ssa-dce.c === *** gcc/tree-ssa-dce.c (revision 211738) --- gcc/tree-ssa-dce.c (working copy) *** perform_tree_ssa_dce (bool aggressive) *** 1479,1485 tree_dce_done (aggressive); if (something_changed) ! return TODO_update_ssa | TODO_cleanup_cfg; return 0; } --- 1479,1490 tree_dce_done (aggressive); if (something_changed) ! { ! free_numbers_of_iterations_estimates (); ! if (scev_initialized_p) !
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
2014-06-18 12:24 GMT+02:00 Uros Bizjak ubiz...@gmail.com: Hello! Attached patch fixes recently added sibcall insns and their corresponding peephole2 patterns: - There is no need for new memory_nox32_operand. A generic memory_operand can be used, since new insns and peephole2 patterns should be disabled for TARGET_X32 entirely. - Adds missing m constraint in insn patterns. - Macroizes peephole2 patterns - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) - Fixes call RTXes in sibcall_pop related patterns (and fixes two newly introduced warnings in i386.md) 2014-06-18 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern. Do not use unspec as call operand. Use memory_operand instead of memory_nox32_operand and add m operand constraint. Disable pattern for TARGET_X32. (*sibcall_pop_memory): Ditto. (*sibcall_value_memory): Ditto. (*sibcall_value_pop_memory): Ditto. (sibcall peepholes): Merge SImode and DImode patterns using W mode iterator. Use memory_operand instead of memory_nox32_operand. Disable pattern for TARGET_X32. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value peepholes): Ditto. (sibcall_pop peepholes): Fix call insn RTXes. Use memory_operand instead of memory_nox32_operand. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value_pop peepholes): Ditto. * config/i386/predicates.md (memory_nox32_operand): Remove predicate. The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. Uros. The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 return ANY_QI_REG_P (op); }) +;; Return true if OP is a memory operand that can be used in sibcalls. (define_predicate "sibcall_memory_operand" - (match_operand 0 "memory_operand") -{ - return CONSTANT_P (XEXP (op, 0)); -}) + (and (match_operand 0 "memory_operand") + (match_test "CONSTANT_P (XEXP (op, 0))"))) as we might pessimize Darwin's UNSPEC_GOTPCREL handling at that point. In general there is still the question of why this issue happens only for Darwin, and not for Linux. For Linux that gotpcrel code path seems not to be hit at all (at least that is what Iain said). Kai
[GSoC] [match-and-simplify] check for capture index
Add a bounds check for the capture index. * genmatch.c (parse_capture): Add condition to check capture index. (capture_max): New constant. (stdlib.h): Include. Thanks and Regards, Prathamesh Index: genmatch.c === --- genmatch.c (revision 211732) +++ genmatch.c (working copy) @@ -29,7 +29,9 @@ along with GCC; see the file COPYING3. #include "hashtab.h" #include "hash-table.h" #include "vec.h" +#include <stdlib.h> +const unsigned capture_max = 4; /* libccp helpers. */ @@ -816,7 +818,11 @@ static struct operand * parse_capture (cpp_reader *r, operand *op) { eat_token (r, CPP_ATSIGN); - return new capture (get_number (r), op); + const cpp_token *token = peek (r); + const char *num = get_number (r); + if (atoi (num) >= capture_max) + fatal_at (token, "capture cannot be greater than %u", capture_max - 1); + return new capture (num, op); } /* Parse
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
On Wed, Jun 18, 2014 at 2:24 PM, Kai Tietz ktiet...@googlemail.com wrote: The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 return ANY_QI_REG_P (op); }) +;; Return true if OP is a memory operand that can be used in sibcalls. (define_predicate "sibcall_memory_operand" - (match_operand 0 "memory_operand") -{ - return CONSTANT_P (XEXP (op, 0)); -}) + (and (match_operand 0 "memory_operand") + (match_test "CONSTANT_P (XEXP (op, 0))"))) as we might pessimize Darwin's UNSPEC_GOTPCREL handling at that point. In general there is still the question of why this issue happens only for Darwin, and not for Linux. For Linux that gotpcrel code path seems not to be hit at all (at least that is what Iain said). Oh, this part doesn't change any functionality at all. The predicate is just written in a different way. Uros.
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. The change is indeed incompatible with the patch in pr61387 comment 9. And without it the failures are back ;-( Kai, what is wrong with Iain's patch in comment 4? TIA Dominique
Re: [PATCH][RFC] Gate loop passes group on number-of-loops > 1, add no-loops group
On 06/18/14 04:42, Richard Biener wrote: The following aims at reducing the number of pointless passes we run on functions containing no loops. Those are at least two copyprop and one dce pass (two dce passes when vectorization is enabled, three dce passes and an additional copyprop pass when any graphite optimization is enabled). Simply gating pass_tree_loop on number_of_loops () > 1 would disable basic-block vectorization on loopless functions. Moving basic-block vectorization out of pass_tree_loop works to the extent that you'd need to move IVOPTs as well, as data-ref analysis cannot cope with TARGET_MEM_REFs. So the following introduces a pass_tree_no_loop pass group which is enabled whenever the pass_tree_loop group is disabled. As a followup this would allow skipping cleanup work we do after the loop pipeline just to clean up after it. Any comments? Does such a followup sound realistic, or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? Sounds good. I've always believed that each pass should be bubbling back up some kind of status about what it did/found as well. It was more of an RTL issue, but we had a certain commercial testsuite which created large loopless tests (*) that consumed vast quantities of wall clock time. I always wanted the RTL loop passes to signal back to toplev.c that no loops were found, which would in turn be used to say we really don't need cse-after-loop and friends. It's certainly more complex these days, but I'd still like to be able to do such things. Regardless, that's well outside the scope of what you're trying to accomplish. * Those tests consistently found port bugs, so we really didn't want to disable them. jeff
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
2014-06-18 15:11 GMT+02:00 Dominique Dhumieres domi...@lps.ens.fr: The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. The change is indeed incompatible with the patch in pr61387 comment 9. And without it the failures are back ;-( Kai, what is wrong with Iain's patch in comment 4? TIA Dominique Well, I don't see a patch there, just a comment about a possible code change. I think this is being fixed in the wrong place. Reverting fnaddr back to its original form seems to be the wrong solution here. It would be better to do something along the lines of 'if (!flag_pic || targetm.binds_local_p (function) || TARGET_PECOFF)' instead. Why should we make it an unspec here, and later on revert that change? If Darwin isn't able to handle such an UNSPEC_GOTPCREL as the address for a sibcall pattern, we should avoid it in general, and not just paper over it in the mi-thunk code. Kai
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On 06/18/14 01:42, Olivier Hainque wrote: Hi Jeff, On Jun 17, 2014, at 22:42 , Jeff Law l...@redhat.com wrote: * tree-core.h (tree_block): Add an end_locus field, allowing memorization of the end of block source location. * tree.h (BLOCK_SOURCE_END_LOCATION): New accessor. * gimplify.c (gimplify_bind_expr): Propagate the block start and end source location info we have on the block entry/exit code we generate. OK. Great, thanks! :-) I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? Yes, I will. At the patch submission time, I was unclear on what dejagnu device was available to setup a reliable testing protocol for this kind of issue and I was interested in getting feedback on the patch contents first. ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Yea, that should be fine. Most folks test x86-64 linux, so that's going to get you the widest net for coverage. jeff
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:01, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This is OK. Committed as r211783.
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On Jun 18, 2014, at 15:48 , Jeff Law l...@redhat.com wrote: ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Yea, that should be fine. Most folks test x86-64 linux, so that's going to get you the widest net for coverage. OK, patch test checked in. Thanks again for your feedback. Cheers, Olivier
Re: [PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux
On Tue, Jun 17, 2014 at 4:48 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: William J. Schmidt wschm...@linux.vnet.ibm.com writes: Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 === --- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 211741) +++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy) @@ -1,5 +1,5 @@ ! { dg-require-effective-target fortran_large_real } -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } } +! { dg-do run { xfail powerpc*-apple-darwin* } } ! Test XFAILed on these platforms because the system's printf() lacks ! proper support for denormalized long doubles. See PR24685 You should also update the comment: `these platforms' no longer applies. Rainer Yes, okay, with the grammar fix also. Thanks, David
Re: [PATCH 3/9] Optimise __aeabi_uldivmod (stack manipulation)
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer manipulation. OK. R. --- libgcc/config/arm/bpabi.S | 54 +-- 1 file changed, 43 insertions(+), 11 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index ae76cd3..67246b0 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -120,6 +120,46 @@ ARM_FUNC_START aeabi_ulcmp #endif .endm +/* we can use STRD/LDRD on v5TE and later, and any Thumb-2 architecture. */ +#if (defined(__ARM_EABI__) \ + && (defined(__thumb2__) \ + || (__ARM_ARCH >= 5 && defined(__TARGET_FEATURE_DSP)))) +#define CAN_USE_LDRD 1 +#else +#define CAN_USE_LDRD 0 +#endif + +/* set up stack frame for call to __udivmoddi4. At the end of the macro the + stack is arranged as follows: + sp+12 / space for remainder + sp+8 \ (written by __udivmoddi4) + sp+4 lr + sp+0 sp+8 [rp (remainder pointer) argument for __udivmoddi4] + + */ +.macro push_for_divide fname +#if defined(__thumb2__) && CAN_USE_LDRD + sub ip, sp, #8 + strd ip, lr, [sp, #-16]! +#else + sub sp, sp, #8 + do_push {sp, lr} +#endif +98: cfi_push 98b - \fname, 0xe, -0xc, 0x10 +.endm + +/* restore stack */ +.macro pop_for_divide + ldr lr, [sp, #4] +#if CAN_USE_LDRD + ldrd r2, r3, [sp, #8] + add sp, sp, #16 +#else + add sp, sp, #8 + do_pop {r2, r3} +#endif +.endm + #ifdef L_aeabi_ldivmod /* Perform 64 bit signed division. @@ -165,18 +205,10 @@ ARM_FUNC_START aeabi_uldivmod cfi_start __aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod) test_div_by_zero unsigned - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} -#else - do_push {sp, lr} -#endif -98: cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10 + push_for_divide __aeabi_uldivmod + /* arguments in (r0:r1), (r2:r3) and *sp */ bl SYM(__gnu_uldivmod_helper) __PLT__ - ldr lr, [sp, #4] - add sp, sp, #8 - do_pop {r2, r3} + pop_for_divide RET cfi_end LSYM(Lend_aeabi_uldivmod)
Re: [PATCH 4/9] Optimise __aeabi_uldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call to __udivmoddi4. OK. R. --- libgcc/config/arm/bpabi.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 67246b0..927e37f 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -207,7 +207,7 @@ ARM_FUNC_START aeabi_uldivmod push_for_divide __aeabi_uldivmod /* arguments in (r0:r1), (r2:r3) and *sp */ - bl SYM(__gnu_uldivmod_helper) __PLT__ + bl SYM(__udivmoddi4) __PLT__ pop_for_divide RET cfi_end LSYM(Lend_aeabi_uldivmod)
Re: [PATCH 5/9] Optimise __aeabi_ldivmod (stack manipulation)
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation. OK. R. --- libgcc/config/arm/bpabi.S | 14 +++--- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 927e37f..3f9ece5 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -174,18 +174,10 @@ ARM_FUNC_START aeabi_ldivmod cfi_start __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod) test_div_by_zero signed - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} -#else - do_push {sp, lr} -#endif -98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 + push_for_divide __aeabi_ldivmod + /* arguments in (r0:r1), (r2:r3) and *sp */ bl SYM(__gnu_ldivmod_helper) __PLT__ - ldr lr, [sp, #4] - add sp, sp, #8 - do_pop {r2, r3} + pop_for_divide RET cfi_end LSYM(Lend_aeabi_ldivmod)
Re: [PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian
On Tue, Jun 17, 2014 at 6:44 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9. This exposes a bug on PowerPC little endian for extracting an element from a V4SF value that goes back to 4.8. The following patch fixes the problem. Tested on powerpc64le-unknown-linux-gnu with no regressions. Ok to commit to trunk? I would also like to commit to 4.8 and 4.9 as soon as possible to be picked up by the distros. This is okay everywhere. I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to provide regression coverage. You should ask Bernd and the RMs. Was the bug fix that prompted the new testcase backported to all targets? Thanks, David
Re: [PATCH 6/9] Optimise __aeabi_ldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using __udivmoddi4, and fixups for negative operands. OK. --- libgcc/config/arm/bpabi.S | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 3f9ece5..c044167 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -175,10 +175,49 @@ ARM_FUNC_START aeabi_ldivmod test_div_by_zero signed push_for_divide __aeabi_ldivmod + cmp xxh, #0 + blt 1f + cmp yyh, #0 + blt 2f + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + RET + +1: /* xxh:xxl is negative */ + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + cmp yyh, #0 + blt 3f + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + RET + +2: /* only yyh:yyl is negative */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + RET + +3: /* both xxh:xxl and yyh:yyl are negative */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ /* arguments in (r0:r1), (r2:r3) and *sp */ - bl SYM(__gnu_ldivmod_helper) __PLT__ + bl SYM(__udivmoddi4) __PLT__ pop_for_divide + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ RET + cfi_end LSYM(Lend_aeabi_ldivmod) #endif /* L_aeabi_ldivmod */
Re: [PATCH 8/9] Use __udivmoddi4 for v6M aeabi_uldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using __udivmoddi4. OK. R. --- libgcc/config/arm/bpabi-v6m.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 0bf2e55..d549fa6 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -148,7 +148,7 @@ FUNC_START aeabi_uldivmod mov r0, sp push {r0, lr} ldr r0, [sp, #8] - bl SYM(__gnu_uldivmod_helper) + bl SYM(__udivmoddi4) ldr r3, [sp, #4] mov lr, r3 add sp, sp, #8
Re: [PATCH 9/9] Remove __gnu_uldivmod_helper
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. OK. R. --- libgcc/config/arm/bpabi.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c index 7b155cc..e90d044 100644 --- a/libgcc/config/arm/bpabi.c +++ b/libgcc/config/arm/bpabi.c @@ -26,9 +26,6 @@ extern long long __divdi3 (long long, long long); extern unsigned long long __udivdi3 (unsigned long long, unsigned long long); extern long long __gnu_ldivmod_helper (long long, long long, long long *); -extern unsigned long long __gnu_uldivmod_helper (unsigned long long, - unsigned long long, - unsigned long long *); long long @@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a, return quotient; } -unsigned long long -__gnu_uldivmod_helper (unsigned long long a, -unsigned long long b, -unsigned long long *remainder) -{ - unsigned long long quotient; - - quotient = __udivdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -}
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 17 Jun 21:22, Bernd Schmidt wrote: On 06/17/2014 08:20 PM, Ilya Verbin wrote: I don't get this part of the plan. Where a host compiler will look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options: --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux --prefix=/install_gcc/accel_intelmic Next I configure/make/install the host gcc with: --enable-accelerator=intelmic --prefix=/install_gcc/host Try using the same prefix for both. I tried to do: 1. --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-intelmic-linux-gnu --prefix=/install_gcc/both 2. --enable-accelerator=intelmic --prefix=/install_gcc/both In this case only bin/x86_64-intelmic-linux-gnu-accel-intelmic-gcc from accel compiler is saved. All other binaries in bin, lib, lib64, libexec are replaced by host's ones. Is there a way to have 2 working compilers and libs in the same prefix? Thanks, -- Ilya
Re: [PATCH, ARM] Enable fuse-caller-save for ARM
On Sun, Jun 1, 2014 at 12:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, This patch: - adds the clobbers required for TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS in CALL_INSN_FUNCTION_USAGE, - sets TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true, which enables the fuse-caller-save optimisation, and - adds an arm fuse-caller-save test-case. Built and tested on arm-linux-gnueabi. OK for trunk? + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow + arm_call_fusage_contains_non_callee_clobbers to return true. */ Please reindent so that arm_call_fusage is on the 2nd line. Otherwise ok if no regressions. regards Ramana Thanks, - Tom
[PATCH, Testsuite, AArch64] Make Function Return Value Test More Robust
Hi, This improves the robustness of the aapcs64 test framework for testing function return ABI rules. It ensures the test facility functions now able to see the exact content of return registers right at the moment when a function returns. OK for trunk? Thanks, Yufeng gcc/testsuite Make the AAPCS64 function return tests more robust. * gcc.target/aarch64/aapcs64/abitest-2.h (saved_return_address): New global variable. (FUNC_VAL_CHECK): Update to call myfunc via the 'ret' instruction, instead of calling sequentially in the C code. * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Store saved_return_address to the stack frame where LR register was stored. (saved_return_address): Declare weak.diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h index c56e7cc..c87fe9b 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h @@ -5,6 +5,7 @@ #include validate_memory.h void (*testfunc_ptr)(char* stack); +unsigned long long saved_return_address; /* Helper macros to generate function name. Example of the function name: func_return_val_1. */ @@ -71,6 +72,17 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ optimized away. Using i and d prevents \ warnings about unused parameters. \ */ \ +/* We save and set up the LR register in a way that essentially \ + inserts myfunc () between the returning of this function and the \ + continueous execution of its caller. By doing this, myfunc ()\ + can save and check the exact content of the registers that are\ + used forthe function return value. \ + The previous approach of sequentially calling myfunc right after \ + this function does not guarantee myfunc see the exact register\ + content, as compiler mayemit code in between the two calls, \ + especially during the -O0 codegen. 
*/\ +asm volatile (mov %0, x30 : =r (saved_return_address)); \ +asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc)); \ return t;\ } #include TESTFILE @@ -84,7 +96,8 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ {\ testfunc_ptr = TEST_FUNC_NAME(id); \ FUNC_NAME(id) (0, 0.0, var); \ -myfunc (); \ +/* The above function implicitly calls myfunc () on its return,\ + and the execution resumes from here after myfunc () finishes. */\ } int main() diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S index 86ce7be..68845fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S @@ -50,6 +50,10 @@ LABEL_TEST_FUNC_RETURN: add x9, x9, :lo12:testfunc_ptr ldr x9, [x9, #0] blr x9 // function return value test + adrp x9, saved_return_address + add x9, x9, :lo12:saved_return_address + ldr x9, [x9, #0] + str x9, [sp, #8]// Update the copy of LR reg saved on stack LABEL_RET: ldp x0, x30, [sp] mov sp, x0 @@ -57,3 +61,4 @@ LABEL_RET: .weak testfunc .weak testfunc_ptr +.weak saved_return_address
[PATCH] rs6000: Make cr5 allocatable
A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, going as far back as PowerOpen (1994), and found no mention of this. Also document cr6 is used by some vector instructions. Tested on powerpc64-linux, no regressions. Okay to apply? Segher 2014-06-18 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.h (FIXED_REGISTERS): Update comment. Remove cr5. (REG_ALLOC_ORDER): Update comment. Move cr5 earlier. --- gcc/config/rs6000/rs6000.h | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3bd0104..569ae2d 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -978,8 +978,6 @@ enum data_align { align_abi, align_opt, align_both }; On RS/6000, r1 is used for the stack. On Darwin, r2 is available as a local register; for all other OS's r2 is the TOC pointer. - cr5 is not supposed to be used. - On System V implementations, r13 is fixed and not available for use. */ #define FIXED_REGISTERS \ @@ -987,7 +985,7 @@ enum data_align { align_abi, align_opt, align_both }; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ - 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, \ + 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, \ /* AltiVec registers. 
*/ \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ @@ -1048,7 +1046,8 @@ enum data_align { align_abi, align_opt, align_both }; fp13 - fp2 (not saved; incoming fp arg registers) fp1 (not saved; return value) fp31 - fp14 (saved; order given to save least number) - cr7, cr6(not saved or special) + cr7, cr5(not saved or special) + cr6 (not saved, but used for vector operations) cr1 (not saved, but used for FP operations) cr0 (not saved, but used for arithmetic operations) cr4, cr3, cr2 (saved) @@ -1061,7 +1060,7 @@ enum data_align { align_abi, align_opt, align_both }; r12 (not saved; if used for DImode or DFmode would use r13) ctr (not saved; when we have the choice ctr is better) lr (saved) - cr5, r1, r2, ap, ca (fixed) + r1, r2, ap, ca (fixed) v0 - v1 (not saved or used for anything) v13 - v3(not saved; incoming vector arg registers) v2 (not saved; incoming vector arg reg; return value) @@ -1099,14 +1098,14 @@ enum data_align { align_abi, align_opt, align_both }; 33, \ 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, \ 50, 49, 48, 47, 46, \ - 75, 74, 69, 68, 72, 71, 70, \ + 75, 73, 74, 69, 68, 72, 71, 70, \ MAYBE_R2_AVAILABLE \ 9, 10, 8, 7, 6, 5, 4, \ 3, EARLY_R12 11, 0, \ 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, \ 18, 17, 16, 15, 14, 13, LATE_R12\ 66, 65, \ - 73, 1, MAYBE_R2_FIXED 67, 76, \ + 1, MAYBE_R2_FIXED 67, 76, \ /* AltiVec registers. */ \ 77, 78, \ 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, \ -- 1.8.1.4
Re: [PATCH, Testsuite, AArch64] Make Function Return Value Test More Robust
On 18/06/14 15:16, Yufeng Zhang wrote: Hi, This improves the robustness of the aapcs64 test framework for testing function return ABI rules. It ensures the test facility functions now able to see the exact content of return registers right at the moment when a function returns. OK for trunk? OK once the issues with the comment are clarified. R. Thanks, Yufeng gcc/testsuite Make the AAPCS64 function return tests more robust. * gcc.target/aarch64/aapcs64/abitest-2.h (saved_return_address): New global variable. (FUNC_VAL_CHECK): Update to call myfunc via the 'ret' instruction, instead of calling sequentially in the C code. * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Store saved_return_address to the stack frame where LR register was stored. (saved_return_address): Declare weak. patch diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h index c56e7cc..c87fe9b 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h @@ -5,6 +5,7 @@ #include validate_memory.h void (*testfunc_ptr)(char* stack); +unsigned long long saved_return_address; /* Helper macros to generate function name. Example of the function name: func_return_val_1. */ @@ -71,6 +72,17 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ optimized away. Using i and d prevents \ warnings about unused parameters. \ */ \ +/* We save and set up the LR register in a way that essentially\ + inserts myfunc () between the returning of this function and the \ s/returning/return/ + continueous execution of its caller. By doing this, myfunc () \ Typo: continueous. Do you mean continuing? + can save and check the exact content of the registers that are \ + used for the function return value. \ stray tab. 
+ The previous approach of sequentially calling myfunc right after \ + this function does not guarantee myfunc see the exact register \ + content, as compiler may emit code in between the two calls, \ Similarly. + especially during the -O0 codegen. */ \ +asm volatile (mov %0, x30 : =r (saved_return_address)); \ +asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc)); \ return t; \ } #include TESTFILE @@ -84,7 +96,8 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ { \ testfunc_ptr = TEST_FUNC_NAME(id); \ FUNC_NAME(id) (0, 0.0, var); \ -myfunc (); \ +/* The above function implicitly calls myfunc () on its return, \ + and the execution resumes from here after myfunc () finishes. */\ } int main() diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S index 86ce7be..68845fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S @@ -50,6 +50,10 @@ LABEL_TEST_FUNC_RETURN: addx9, x9, :lo12:testfunc_ptr ldrx9, [x9, #0] blrx9 // function return value test + adrp x9, saved_return_address + addx9, x9, :lo12:saved_return_address + ldrx9, [x9, #0] + strx9, [sp, #8]// Update the copy of LR reg saved on stack LABEL_RET: ldpx0, x30, [sp] movsp, x0 @@ -57,3 +61,4 @@ LABEL_RET: .weaktestfunc .weaktestfunc_ptr +.weaksaved_return_address
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:06, Ramana Radhakrishnan ramana@googlemail.com wrote: 2014-06-15 Charles Baylis charles.ba...@linaro.org * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects. I'm not too thrilled with putting in more special cases that are not table driven in there. Can you file a PR with some testcases that show this so that we don't forget and CC me on it please ? I created PR61551 and CC'd.
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 06/18/2014 04:13 PM, Ilya Verbin wrote:

On 17 Jun 21:22, Bernd Schmidt wrote:
On 06/17/2014 08:20 PM, Ilya Verbin wrote:

I don't get this part of the plan. Where will a host compiler look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options:
  --enable-accelerator=intelmic
  --enable-as-accelerator-for=x86_64-unknown-linux
  --prefix=/install_gcc/accel_intelmic
Next I configure/make/install the host gcc with:
  --enable-accelerator=intelmic
  --prefix=/install_gcc/host

Try using the same prefix for both.

I tried to do:
1. --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-intelmic-linux-gnu --prefix=/install_gcc/both
2. --enable-accelerator=intelmic --prefix=/install_gcc/both
In this case only bin/x86_64-intelmic-linux-gnu-accel-intelmic-gcc from the accel compiler is saved. All other binaries in bin, lib, lib64, libexec are replaced by the host's ones. Is there a way to have 2 working compilers and libs in the same prefix?

Sure, as long as the target triplet is different. What I think you need to do is:

For the first compiler:
  --enable-as-accelerator-for=x86_64-pc-linux-gnu
  --target=x86_64-intelmic-linux-gnu
  --prefix=/somewhere
Build and install, then:
For the second:
  configure --enable-offload-targets=x86_64-intelmic-linux-gnu x86_64-pc-linux-gnu --prefix=/somewhere
No --enable-accelerator options at all. This should work; if it doesn't, let me know what you find in /somewhere after installation for both compilers.

Bernd
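Bernd's recipe, restated as two configure invocations (triplets and the prefix are taken from the thread; the source path and make steps are illustrative assumptions, so treat this as a sketch rather than verified build documentation):

```shell
# First: the accelerator compiler, installed into the shared prefix.
/path/to/gcc/configure \
    --target=x86_64-intelmic-linux-gnu \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --prefix=/somewhere
make && make install

# Second: the host compiler, same prefix, no --enable-accelerator option.
/path/to/gcc/configure \
    --enable-offload-targets=x86_64-intelmic-linux-gnu \
    --prefix=/somewhere
make && make install
```

Because the two installations use different target triplets, their binaries and libraries can coexist under the one prefix.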
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 1 June 2014 11:00, Tom de Vries tom_devr...@mentor.com wrote: Richard, This patch: - adds the for TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS required clobbers in CALL_INSN_FUNCTION_USAGE, - sets TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true, which enables the fuse-caller-save optimisation, and - adds an aarch64 fuse-caller-save test-case. Build and tested on aarch64-linux-gnu. OK for trunk? Thanks, - Tom OK /Marcus
Formatting fixes for (gccint) Standard Names
Tested with make info and installed as obvious. Andreas. * doc/md.texi (Standard Names): Use @itemx for grouped items. Remove blank line after @item. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e17ffca..1c3a326 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4835,7 +4835,7 @@ and the scalar result is stored in the least significant bits of operand 0 @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} @cindex @code{udot_prod@var{m}} instruction pattern -@item @samp{udot_prod@var{m}} +@itemx @samp{udot_prod@var{m}} Compute the sum of the products of two signed/unsigned elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or @@ -4845,7 +4845,7 @@ is of the same mode as operand 3. @cindex @code{ssum_widen@var{m3}} instruction pattern @item @samp{ssum_widen@var{m3}} @cindex @code{usum_widen@var{m3}} instruction pattern -@item @samp{usum_widen@var{m3}} +@itemx @samp{usum_widen@var{m3}} Operands 0 and 2 are of the same mode, which is wider than the mode of operand 1. Add operand 1 to operand 2 and place the widened result in operand 0. (This is used express accumulation of elements into an accumulator @@ -6218,7 +6218,6 @@ A typical @code{ctrap} pattern looks like @cindex @code{prefetch} instruction pattern @item @samp{prefetch} - This pattern, if defined, emits code for a non-faulting data prefetch instruction. Operand 0 is the address of the memory to prefetch. Operand 1 is a constant 1 if the prefetch is preparing for a write to the memory @@ -6234,7 +6233,6 @@ the values of operands 1 and 2. @cindex @code{blockage} instruction pattern @item @samp{blockage} - This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. 
@@ -6242,7 +6240,6 @@ This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. @cindex @code{memory_barrier} instruction pattern @item @samp{memory_barrier} - If the target memory model is not fully synchronous, then this pattern should be defined to an instruction that orders both loads and stores before the instruction with respect to loads and stores after the instruction. @@ -6250,7 +6247,6 @@ This pattern has no operands. @cindex @code{sync_compare_and_swap@var{mode}} instruction pattern @item @samp{sync_compare_and_swap@var{mode}} - This pattern, if defined, emits code for an atomic compare-and-swap operation. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the ``old'' value to be compared against the @@ -6299,7 +6295,6 @@ interruptable locking. @item @samp{sync_add@var{mode}}, @samp{sync_sub@var{mode}} @itemx @samp{sync_ior@var{mode}}, @samp{sync_and@var{mode}} @itemx @samp{sync_xor@var{mode}}, @samp{sync_nand@var{mode}} - These patterns emit code for an atomic operation on memory. Operand 0 is the memory on which the atomic operation is performed. Operand 1 is the second operand to the binary operator. @@ -6321,7 +6316,6 @@ from a compare-and-swap operation, if defined. @item @samp{sync_old_add@var{mode}}, @samp{sync_old_sub@var{mode}} @itemx @samp{sync_old_ior@var{mode}}, @samp{sync_old_and@var{mode}} @itemx @samp{sync_old_xor@var{mode}}, @samp{sync_old_nand@var{mode}} - These patterns emit code for an atomic operation on memory, and return the value that the memory contained before the operation. Operand 0 is the result value, operand 1 is the memory on which the @@ -6345,14 +6339,12 @@ from a compare-and-swap operation, if defined. 
@item @samp{sync_new_add@var{mode}}, @samp{sync_new_sub@var{mode}} @itemx @samp{sync_new_ior@var{mode}}, @samp{sync_new_and@var{mode}} @itemx @samp{sync_new_xor@var{mode}}, @samp{sync_new_nand@var{mode}} - These patterns are like their @code{sync_old_@var{op}} counterparts, except that they return the value that exists in the memory location after the operation, rather than before the operation. @cindex @code{sync_lock_test_and_set@var{mode}} instruction pattern @item @samp{sync_lock_test_and_set@var{mode}} - This pattern takes two forms, based on the capabilities of the target. In either case, operand 0 is the result of the operand, operand 1 is the memory on which the atomic operation is performed, and operand 2 @@ -6377,7 +6369,6 @@ a compare-and-swap operation, if defined. @cindex @code{sync_lock_release@var{mode}} instruction pattern @item @samp{sync_lock_release@var{mode}} - This pattern, if defined, releases a lock set by @code{sync_lock_test_and_set@var{mode}}. Operand 0 is the memory that contains the lock; operand 1 is the value to store in the lock. @@ -6467,7 +6458,6 @@ compare and swap loop. @item @samp{atomic_add@var{mode}}, @samp{atomic_sub@var{mode}} @itemx @samp{atomic_or@var{mode}}, @samp{atomic_and@var{mode}}
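For reference, the Texinfo rule the patch applies: within a @table, the first name of a group uses @item and every additional name sharing the same description uses @itemx, so the grouped names render together over one description. A minimal fragment (not taken from md.texi):

```texinfo
@table @samp
@item sdot_prod@var{m}
@itemx udot_prod@var{m}
One description shared by both grouped names.
@end table
```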
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 12:55:01PM +0800, Thomas Preud'homme wrote:

@@ -1646,20 +1648,23 @@ do_shift_rotate (enum tree_code code,
       n->n >>= count;
       break;
     case RSHIFT_EXPR:
+      /* Arithmetic shift of signed type: result is dependent on the value.  */
+      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+	return false;

Looks like here an undefined behavior happens:

tree-ssa-math-opts.c:1672:53: runtime error: shift exponent 56 is too large for 32-bit type 'int'

	Marek
Re: [PATCH 1/9] Whitespace
On 11 June 2014 13:55, Richard Earnshaw rearn...@arm.com wrote: On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace. (__aeabi_ldivmod): Fix whitespace. This is OK, but please wait until the others are ready to go in. The series is now committed as r211789-r211797.
Re: [PATCH, PR 61540] Do not ICE on impossible devirtualization
On 18 June 2014 10:24:16 Martin Jambor mjam...@suse.cz wrote:

@@ -3002,10 +3014,8 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie,

   if (target)
     {
-#ifdef ENABLE_CHECKING
-      gcc_assert (possible_polymorphic_call_target_p
-		  (ie, cgraph_get_node (target)));
-#endif
+      if (!possible_polymorphic_call_target_p (ie, cgraph_get_node (target)))
+	return ipa_make_edge_direct_to_target (ie, target);
       return ipa_make_edge_direct_to_target (ie, target);
     }

The above looks odd. You return the same thing both conditionally and unconditionally?

Thanks,

   else

Sent with AquaMail for Android
http://www.aqua-mail.com
[PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing
Hi, This patch improves the code-gen of -marm in the case of two-dimensional array access. Given the following code: typedef struct { int x,y,a,b; } X; int f7a(X p[][4], int x, int y) { return p[x][y].a; } The code-gen on -O2 -marm -mcpu=cortex-a15 is currently mov r2, r2, asl #4 add r1, r2, r1, asl #6 add r0, r0, r1 ldr r0, [r0, #8] bx lr With the patch, we'll get: add r1, r0, r1, lsl #6 add r2, r1, r2, lsl #4 ldr r0, [r2, #8] bx lr The -mthumb code-gen had been OK. The patch has passed the bootstrapping on cortex-a15 and the arm-none-eabi regtest, with no code-gen difference in spec2k (unfortunately). OK for the trunk? Thanks, Yufeng gcc/ * config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration and new function. (arm_legitimize_address): Call the new functions. (thumb_legitimize_address): Prefix the declaration with static. gcc/testsuite/ * gcc.target/arm/shifted-add-1.c: New test. * gcc.target/arm/shifted-add-2.c: Ditto.
Re: [PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing
This time with patch... Apologize. Yufeng On 06/18/14 17:31, Yufeng Zhang wrote: Hi, This patch improves the code-gen of -marm in the case of two-dimensional array access. Given the following code: typedef struct { int x,y,a,b; } X; int f7a(X p[][4], int x, int y) { return p[x][y].a; } The code-gen on -O2 -marm -mcpu=cortex-a15 is currently mov r2, r2, asl #4 add r1, r2, r1, asl #6 add r0, r0, r1 ldr r0, [r0, #8] bx lr With the patch, we'll get: add r1, r0, r1, lsl #6 add r2, r1, r2, lsl #4 ldr r0, [r2, #8] bx lr The -mthumb code-gen had been OK. The patch has passed the bootstrapping on cortex-a15 and the arm-none-eabi regtest, with no code-gen difference in spec2k (unfortunately). OK for the trunk? Thanks, Yufeng gcc/ * config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration and new function. (arm_legitimize_address): Call the new functions. (thumb_legitimize_address): Prefix the declaration with static. gcc/testsuite/ * gcc.target/arm/shifted-add-1.c: New test. * gcc.target/arm/shifted-add-2.c: Ditto. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 16fc7ed..281c96a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -88,6 +88,7 @@ static int thumb1_base_register_rtx_p (rtx, enum machine_mode, int); static rtx arm_legitimize_address (rtx, rtx, enum machine_mode); static reg_class_t arm_preferred_reload_class (rtx, reg_class_t); static rtx thumb_legitimize_address (rtx, rtx, enum machine_mode); +static void arm_reassoc_shifts_in_address (rtx); inline static int thumb1_index_register_rtx_p (rtx, int); static bool arm_legitimate_address_p (enum machine_mode, rtx, bool); static int thumb_far_jump_used_p (void); @@ -7501,7 +7502,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode) { /* TODO: legitimize_address for Thumb2. 
*/
   if (TARGET_THUMB2)
-    return x;
+    return x;
+
   return thumb_legitimize_address (x, orig_x, mode);
 }

@@ -7551,6 +7553,9 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
	}
       else if (xop0 != XEXP (x, 0) || xop1 != XEXP (x, 1))
	x = gen_rtx_PLUS (SImode, xop0, xop1);
+
+      if (GET_CODE (xop0) == PLUS)
+	arm_reassoc_shifts_in_address (xop0);
     }

   /* XXX We don't allow MINUS any more -- see comment in
@@ -7614,7 +7619,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)

 /* Try machine-dependent ways of modifying an illegitimate Thumb address
    to be legitimate.  If we find one, return the new, valid address.  */
-rtx
+
+static rtx
 thumb_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
 {
   if (GET_CODE (x) == PLUS
@@ -7679,6 +7685,47 @@ thumb_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
   return x;
 }

+/* Transform
+     PLUS (PLUS (MULT1, MULT2), REG)
+   to
+     PLUS (PLUS (MULT1, REG), MULT2)
+   so that we can use two add (shifted register) instructions
+   to compute the expression.  Note that SHIFTs has already
+   been replaced with MULTs as a result of canonicalization.
+
+   This routine is to help undo the undesired canonicalization
+   that is done by simplify_gen_binary on addresses with
+   multiple shifts.  For example, it will help transform
+     (x << 6) + (y << 4) + p + 8
+   back to:
+     (x << 6) + p + (y << 4) + 8
+   where p is the start address of a two-dimensional array and
+   x and y are the indexes.  */
+
+static void
+arm_reassoc_shifts_in_address (rtx x)
+{
+  if (GET_CODE (x) == PLUS)
+    {
+      rtx op0 = XEXP (x, 0);
+      rtx op1 = XEXP (x, 1);
+
+      if (GET_CODE (op0) == PLUS && REG_P (op1))
+	{
+	  rtx xop0 = XEXP (op0, 0);
+	  rtx xop1 = XEXP (op0, 1);
+
+	  if (GET_CODE (xop0) == MULT && GET_CODE (xop1) == MULT
+	      && power_of_two_operand (XEXP (xop0, 1), GET_MODE (xop0))
+	      && power_of_two_operand (XEXP (xop1, 1), GET_MODE (xop1)))
+	    {
+	      XEXP (op0, 1) = op1;
+	      XEXP (x, 1) = xop1;
+	    }
+	}
+    }
+}
+
 bool
 arm_legitimize_reload_address (rtx *p, enum machine_mode mode,
diff --git a/gcc/testsuite/gcc.target/arm/shifted-add-1.c b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
new file mode 100644
index 000..8777fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
@@ -0,0 +1,47 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2" } */
+
+typedef struct { int x,y,a,b; } x;
+
+int
+f7a(x p[][4], int x, int y)
+{
+  return p[x][y].a;
+}
+
+/* { dg-final { object-size text <= 16 { target { { ! arm_thumb1 } && { ! arm_thumb2 } } } } } */
+/* { dg-final { object-size text <= 12 { target arm_thumb2 } } } */
+
+
+/* For arm code-gen, expect four instructions like:
+
+   0:	e0801301	add	r1, r0,
RE: [PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian
Hi, On Wed, 18 Jun 2014 09:56:15, David Edelsohn wrote: On Tue, Jun 17, 2014 at 6:44 PM, BIll Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9. This exposes a bug on PowerPC little endian for extracting an element from a V4SF value that goes back to 4.8. The following patch fixes the problem. Tested on powerpc64le-unknown-linux-gnu with no regressions. Ok to commit to trunk? I would also like to commit to 4.8 and 4.9 as soon as possible to be picked up by the distros. This is okay everywhere. I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to provide regression coverage. You should ask Bernd and the RMs. Was the bug fix that prompted the new testcase backported to all targets? Thanks, David actually I only added the check_vect to that test case, but that exposed a bug on Solaris-9. See https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=207668. That was in the -fdump-rtl-combine-details handling, where fprintf got a NULL value passed for %s, which ICEs on Solaris9. So if you backport that test case, be sure to check that one too. Originally the test case seems to check something for the aarch64-target. See https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=205712. Obviously the patch in rtlanal.c (set_noop_p) was never backported to the 4.8 branch. Maybe Tejas who originally wrote that test case, can explain, if it makes sense to backport this fix too. Thanks Bernd.
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Jun 18, 2014, at 3:22 AM, Richard Biener richard.guent...@gmail.com wrote:

Space after the *.

I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes.

Oh, I see you need a new object-readelf ... I defer to a testsuite maintainer for this part.

The testsuite bits are Ok. My guidance on the test suite would be this: all lto test cases in .*lto directories; 20 or fewer test cases for a given target, in the main lto directory; more than 50, in the arm/lto directory. When one is tracking down bugs and trying to clean up test suite results if they break, it is nice to be able to skip en masse all lto bugs first, resolve all non-lto issues, and then come back to the lto issues last, in hopes that they are all then resolved. Also, if one is redoing lto bits, and a test case with lto in the name pops up as a regression, and you're not an lto person, you can stop thinking about it and just pass it to the lto person; it is a slightly different mindset. :-)
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 05:23:13PM +0200, Marek Polacek wrote:

On Wed, Jun 18, 2014 at 12:55:01PM +0800, Thomas Preud'homme wrote:

@@ -1646,20 +1648,23 @@ do_shift_rotate (enum tree_code code,
       n->n >>= count;
       break;
     case RSHIFT_EXPR:
+      /* Arithmetic shift of signed type: result is dependent on the value.  */
+      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+	return false;

Looks like here an undefined behavior happens:

tree-ssa-math-opts.c:1672:53: runtime error: shift exponent 56 is too large for 32-bit type 'int'

Seems there are actually two spots with this, not just one. Completely untested fix:

2014-06-18  Jakub Jelinek  ja...@redhat.com

	* tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1):
	Cast 0xff to uint64_t before shifting it up.

--- gcc/tree-ssa-math-opts.c	2014-06-13 08:08:42.354136356 +0200
+++ gcc/tree-ssa-math-opts.c	2014-06-18 19:50:59.486916201 +0200
@@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code,
       break;
     case RSHIFT_EXPR:
       /* Arithmetic shift of signed type: result is dependent on the value.  */
-      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+      if (!TYPE_UNSIGNED (n->type)
+	  && (n->n & ((uint64_t) 0xff << (bitsize - 8))))
	return false;
       n->n >>= count;
       break;
@@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct
	  old_type_size = TYPE_PRECISION (n->type);
	  if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
-	      && n->n & (0xff << (old_type_size - 8)))
+	      && n->n & ((uint64_t) 0xff << (old_type_size - 8)))
	    return NULL_TREE;
	  if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t)))

	Jakub
Re: [PATCH] rs6000: Make cr5 allocatable
On Jun 18, 2014, at 7:14 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, I usually do a blame and find that change that added it… Ah, there it is, r341… let’s see, rms did it! :-) Oh well… never mind. Kinda amazing the bits lost in time.
[GSoC] Addition of ISL AST generation to Graphite
These patches add ISL AST generation to graphite, which can be chosen by the fgraphite-code-generator=[isl|cloog] switch. The first patch makes initial renaming of gloog and gloog_error to graphite_regenerate_ast_cloog and graphite_regenerate_error, respectively. The second one adds new files with generation of ISL AST, new switch, new testcase that checks that the dump is generated. Is it fine for trunk? P.S. My copyright assignment has been already processed. -- Cheers, Roman Gareev ChangeLog_entry1 Description: Binary data ChangeLog_entry2 Description: Binary data patch1 Description: Binary data patch2 Description: Binary data
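A sketch of how the new option would be driven from the command line (the option name and its values are taken from the patch; the source file name is an illustrative assumption, and a graphite-enabled build of GCC is required):

```shell
# Regenerate code through the new ISL AST generator:
gcc -O2 -fgraphite-identity -fgraphite-code-generator=isl loop.c -o loop

# Or keep the existing CLooG path:
gcc -O2 -fgraphite-identity -fgraphite-code-generator=cloog loop.c -o loop
```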
Re: [GSoC] Addition of ISL AST generation to Graphite
On 18/06/2014 21:00, Roman Gareev wrote: These patches add ISL AST generation to graphite, which can be chosen by the fgraphite-code-generator=[isl|cloog] switch. The first patch makes initial renaming of gloog and gloog_error to graphite_regenerate_ast_cloog and graphite_regenerate_error, respectively. The second one adds new files with generation of ISL AST, new switch, new testcase that checks that the dump is generated. Is it fine for trunk? I went over this from the graphite side and it looks fine. However, as I did not commit for a while to gcc, it would be great if someone else could have a look. Cheers, Tobias
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/13/14 04:24, mliska wrote:

You may ask why GNU GCC needs such a new optimization. The compiler, having simply better knowledge of a compiled source file, is capable of reaching better results, especially if Link-Time Optimization is enabled. Apart from that, the GCC implementation adds support for read-only variables like construction vtables (mentioned in: http://hubicka.blogspot.cz/2014/02/devirtualization-in-c-part-3-building.html).

Can you outline at a high level cases where GCC's knowledge allows it to reach a better result? Is it because you're not requiring bit-for-bit identical code, but that the code merely be semantically equivalent?

The GCC driven ICF seems to pick up 2X more opportunities than the gold driven ICF. But if I'm reading everything correctly, that includes ICF of both functions and variables.

There are important differences between in-GCC ICF and gold's ICF. Basically:

- GCC ICF runs before most of the context-sensitive optimizations, so it does see code that is identical to start with but would become different during optimization. For example, if you have functions a1...a1000 calling functions b1...b1000, where all bX are the same but all aX differ, then before inlining one can easily unify b and let the inliner's heuristic decide whether it is a good idea to duplicate the body of b, while after inlining this is no longer possible. We don't do much in this respect, but we should try to unify accidental code duplication early in the early passes, so as not to let duplicates bubble up to the late optimizations, where they may or may not be caught by e.g. tail merging.

- GCC ICF can (though it doesn't in the current implementation) do value-numbering matching and match identical semantics with different implementations. It is the plan to get smarter here; I just wanted to have something working first and then play with more advanced tricks.

- GCC ICF sees some things as different while they are not in the final assembly: types, alias classes and other details that are important for GCC but lost in codegen. So here gold can do better work.

- Theoretically, if tuned well, GCC ICF could improve compilation speed by avoiding the need to optimize duplicates.

- Gold's ICF depends on function sections, which are not free.

- GCC ICF can be smarter about objects with their address taken: we need an analysis deciding when the address can be compared with a different address. This would be useful in other places, too.

Honza

Do you have any sense of how those improvements break down? I.e., is it mostly more functions you're finding as identical, and if so, what is it about the GCC implementation that allows us to find more ICF opportunities? If it's mostly variables, that's fine too. I'm just trying to understand where the improvements are coming from.

Jeff
Re: [patch i386]: Combine memory and indirect jump
On 06/17/14 14:35, Kai Tietz wrote: I just did retest my testcase with recent source. I can't reproduce this missed optimization before sched2 pass anymore. I moved second peephole2 pass just before split_before_sched2 and everything got caught. Let's go with this if your idea of using a define_split doesn't work out. To remove first peephole2 pass seems to cause weaker code for impossible pushes, etc OK. Nevertheless it might be a point to make this new peephole instead a define_split? I admit that this operation isn't a split, nevertheless we would avoid a second peephole pass. Doesn't hurt to try and as you say, if we can avoid a 2nd peep2 pass, that's good. jeff
Re: [PATCH] rs6000: Make cr5 allocatable
On Wed, Jun 18, 2014 at 10:14 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, going as far back as PowerOpen (1994), and found no mention of this. Also document cr6 is used by some vector instructions. Tested on powerpc64-linux, no regressions. Okay to apply? Segher 2014-06-18 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.h (FIXED_REGISTERS): Update comment. Remove cr5. (REG_ALLOC_ORDER): Update comment. Move cr5 earlier. This is okay. I have no idea why RMS assumed that cr5 is fixed. Thanks, David
RE: RFA: Make LRA temporarily eliminate addresses before testing constraints
On 2014-06-16, 12:12 PM, Robert Suchanek wrote: Pinging for approval. This part of the patch will be needed for MIPS16. The second part to enable LRA in MIPS has been already approved. Sorry, Robert. I thought you are waiting for some Richard's comment (actually he knows the code well and wrote address decoding in rtlanal.c). The patch is ok for me and makes LRA even more portable as it adds a new profitable address transformation and the code can be useful for other targets too. Thanks. Core LRA change committed as: r211802 MIPS LRA committed as: r211805 Matthew
[PATCH] dwarf2out.c: Pass DWARF type modifiers around as flags argument.
modified_type_die and add_type_attribute take two separate arguments for whether the type should be const and/or volatile. To help add more type modifiers (DWARFv3 added restrict_type [PR debug/59051], and DWARFv5 has proposals for atomic_type and aligned_type), pass the requested modifiers as one flags value to these functions, and introduce helper functions dw_mod_type_flags and dw_mod_decl_flags to easily extract the modifiers from type and declaration trees. The new modifiers will hopefully be easier to implement based on this change.

gcc/ChangeLog

	* dwarf2out.h (enum dw_mod_flag): New enum.
	* dwarf2out.c (dw_mod_decl_flags): New function.
	(dw_mod_type_flags): Likewise.
	(modified_type_die): Take one modifiers flag argument instead of
	one for const and one for volatile.
	(add_type_attribute): Likewise.
	(generic_parameter_die): Call add_type_attribute with one modifier
	argument.
	(base_type_for_mode): Likewise.
	(add_bounds_info): Likewise.
	(add_subscript_info): Likewise.
	(gen_array_type_die): Likewise.
	(gen_descr_array_type_die): Likewise.
	(gen_entry_point_die): Likewise.
	(gen_enumeration_type_die): Likewise.
	(gen_formal_parameter_die): Likewise.
	(gen_subprogram_die): Likewise.
	(gen_variable_die): Likewise.
	(gen_const_die): Likewise.
	(gen_field_die): Likewise.
	(gen_pointer_type_die): Likewise.
	(gen_reference_type_die): Likewise.
	(gen_ptr_to_mbr_type_die): Likewise.
	(gen_inheritance_die): Likewise.
	(gen_subroutine_type_die): Likewise.
	(gen_typedef_die): Likewise.
	(force_type_die): Likewise.
---
 gcc/ChangeLog   |  30
 gcc/dwarf2out.c | 133 ++-
 gcc/dwarf2out.h |   8 +++
 3 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2d0a07c..d7ee868 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2014-06-18  Mark Wielaard  m...@redhat.com
+
+	* dwarf2out.h (enum dw_mod_flag): New enum.
+	* dwarf2out.c (dw_mod_decl_flags): New function.
+	(dw_mod_type_flags): Likewise.
+ (modified_type_die): Take one modifiers flag argument instead of + one for const and one for volatile. + (add_type_attribute): Likewise. + (generic_parameter_die): Call add_type_attribute with one modifier + argument. + (base_type_for_mode): Likewise. + (add_bounds_info): Likewise. + (add_subscript_info): Likewise. + (gen_array_type_die): Likewise. + (gen_descr_array_type_die): Likewise. + (gen_entry_point_die): Likewise. + (gen_enumeration_type_die): Likewise. + (gen_formal_parameter_die): Likewise. + (gen_subprogram_die): Likewise. + (gen_variable_die): Likewise. + (gen_const_die): Likewise. + (gen_field_die): Likewise. + (gen_pointer_type_die): Likewise. + (gen_reference_type_die): Likewise. + (gen_ptr_to_mbr_type_die): Likewise. + (gen_inheritance_die): Likewise. + (gen_subroutine_type_die): Likewise. + (gen_typedef_die): Likewise. + (force_type_die): Likewise. + 2014-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vadd_f32): Change #ifdef to __FAST_MATH. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 933ec62..0216801 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3140,7 +3140,9 @@ static void output_file_names (void); static dw_die_ref base_type_die (tree); static int is_base_type (tree); static dw_die_ref subrange_type_die (tree, tree, tree, dw_die_ref); -static dw_die_ref modified_type_die (tree, int, int, dw_die_ref); +static int dw_mod_decl_flags (const_tree); +static int dw_mod_type_flags (const_tree); +static dw_die_ref modified_type_die (tree, int, dw_die_ref); static dw_die_ref generic_parameter_die (tree, tree, bool, dw_die_ref); static dw_die_ref template_parameter_pack_die (tree, tree, dw_die_ref); static int type_is_enum (const_tree); @@ -3198,7 +3200,7 @@ static dw_die_ref scope_die_for (tree, dw_die_ref); static inline int local_scope_p (dw_die_ref); static inline int class_scope_p (dw_die_ref); static inline int class_or_namespace_scope_p (dw_die_ref); -static void add_type_attribute (dw_die_ref, tree, int, 
int, dw_die_ref); +static void add_type_attribute (dw_die_ref, tree, int, dw_die_ref); static void add_calling_convention_attribute (dw_die_ref, tree); static const char *type_tag (const_tree); static tree member_declared_type (const_tree); @@ -10498,12 +10500,25 @@ subrange_type_die (tree type, tree low, tree high, dw_die_ref context_die) return subrange_die; } +static int +dw_mod_decl_flags (const_tree decl) +{ + return ((TREE_READONLY (decl) ? dw_mod_const : dw_mod_none) + | (TREE_THIS_VOLATILE (decl) ? dw_mod_volatile :