Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8
On Mon, Mar 30, 2015 at 1:42 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:

Hi,

This is a follow-up to https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00103.html, which adds support for faster unaligned vector memory accesses on POWER8. As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456, there was a piece missing here. The target macro SLOW_UNALIGNED_ACCESS was still evaluating to 1 for unaligned vector accesses on POWER8, which causes some scalarization to occur during expand. This version of the patch fixes that as well. The only changes from before are the update to config/rs6000/rs6000.h and the new test case gcc.target/powerpc/pr65456.c.

Is this OK for trunk after GCC 5 branches, and for backports to 4.8, 4.9, and 5 thereafter?

Thanks, Bill

[gcc]

2015-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (rs6000_option_override_internal): For
	VSX + POWER8, enable TARGET_ALLOW_MOVMISALIGN and
	TARGET_EFFICIENT_UNALIGNED_VSX if not selected by command line
	option.  However, for -mno-allow-movmisalign, be sure to disable
	TARGET_EFFICIENT_UNALIGNED_VSX to avoid an ICE.
	(rs6000_builtin_mask_for_load): Return 0 for targets with
	efficient unaligned VSX accesses so that the vectorizer will use
	direct unaligned loads.
	(rs6000_builtin_support_vector_misalignment): Always return true
	for targets with efficient unaligned VSX accesses.
	(rs6000_builtin_vectorization_cost): Cost of unaligned loads and
	stores on targets with efficient unaligned VSX accesses is almost
	always the same as the cost of an aligned load or store, so model
	it that way.
	* config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Evaluate to
	zero for unaligned vector accesses on POWER8.
	* config/rs6000/rs6000.opt (mefficient-unaligned-vector): New
	undocumented option.

[gcc/testsuite]

2015-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* gcc.dg/vect/bb-slp-24.c: Exclude test for POWER8.
	* gcc.dg/vect/bb-slp-25.c: Likewise.
	* gcc.dg/vect/bb-slp-29.c: Likewise.
	* gcc.dg/vect/bb-slp-32.c: Replace vect_no_align with
	vect_no_align && { ! vect_hw_misalign }.
	* gcc.dg/vect/bb-slp-9.c: Likewise.
	* gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Exclude test for
	vect_hw_misalign.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Likewise.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust tests to
	account for POWER8, where peeling for alignment is not needed.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Replace
	vect_no_align with vect_no_align && { ! vect_hw_misalign }.
	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-6-global.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-6.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-43.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-57.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-61.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
	* gcc.dg/vect/pr16105.c: Likewise.
	* gcc.dg/vect/pr20122.c: Likewise.
	* gcc.dg/vect/pr33804.c: Likewise.
	* gcc.dg/vect/pr33953.c: Likewise.
	* gcc.dg/vect/pr56787.c: Likewise.
	* gcc.dg/vect/pr58508.c: Likewise.
	* gcc.dg/vect/slp-25.c: Likewise.
	* gcc.dg/vect/vect-105-big-array.c: Likewise.
	* gcc.dg/vect/vect-105.c: Likewise.
	* gcc.dg/vect/vect-27.c: Likewise.
	* gcc.dg/vect/vect-29.c: Likewise.
	* gcc.dg/vect/vect-33.c: Exclude unaligned access test for
	POWER8.
	* gcc.dg/vect/vect-42.c: Replace vect_no_align with
	vect_no_align && { ! vect_hw_misalign }.
	* gcc.dg/vect/vect-44.c: Likewise.
	* gcc.dg/vect/vect-48.c: Likewise.
	* gcc.dg/vect/vect-50.c: Likewise.
	* gcc.dg/vect/vect-52.c: Likewise.
	* gcc.dg/vect/vect-56.c: Likewise.
	* gcc.dg/vect/vect-60.c: Likewise.
	* gcc.dg/vect/vect-72.c: Likewise.
	* gcc.dg/vect/vect-75-big-array.c: Likewise.
	* gcc.dg/vect/vect-75.c: Likewise.
	* gcc.dg/vect/vect-77-alignchecks.c: Likewise.
	* gcc.dg/vect/vect-77-global.c: Likewise.
	* gcc.dg/vect/vect-78-alignchecks.c: Likewise.
	* gcc.dg/vect/vect-78-global.c: Likewise.
	* gcc.dg/vect/vect-93.c: Likewise.
	* gcc.dg/vect/vect-95.c: Likewise.
	* gcc.dg/vect/vect-96.c: Likewise.
	* gcc.dg/vect/vect-cond-1.c: Likewise.
	* gcc.dg/vect/vect-cond-3.c: Likewise.
	* gcc.dg/vect/vect-cond-4.c: Likewise.
	* gcc.dg/vect/vect-cselim-1.c: Likewise.
	* gcc.dg/vect/vect-multitypes-1.c:
Minor tweak to size functions
The size functions are used by the Ada compiler to factor out the large size computations in self-referential types (dynamic types whose size depends on a discriminant in the object). We generate useless leaf size functions for size expressions involving discriminants:

_Global.Sz4_Q (ada__streams__stream_element_count___XDLU_0__9223372036854775807 p0)
{
  return (bitsizetype) p0 * 8;

_Global.Sz5_Q (ada__streams__stream_element_count___XDLU_0__9223372036854775807 p0)
{
  return (sizetype) p0;

_Global.Sz6_Q (ada__streams__stream_element_count___XDLU_0__9223372036854775807 p0)
{
  return (bitsizetype) p0 * 8;

_Global.Sz7_Q (ada__streams__stream_element_count___XDLU_0__9223372036854775807 p0)
{
  return (sizetype) p0;

The wrapped expressions are simple arithmetic operations, so there is no real point in creating a size function for them. That's already ensured for non-leaf size functions (size functions calling other size functions) and it's immediate to extend it down to the leaves. Tested on x86_64-suse-linux, applied on the mainline (this only affects the Ada compiler). The tree-cfg.c hunk adds the missing curly bracket at the end in the above dumps, applied as obvious.

2015-04-27  Eric Botcazou  ebotca...@adacore.com

	* stor-layout.c (self_referential_component_ref_p): New predicate.
	(copy_self_referential_tree_r): Use it.
	(self_referential_size): Punt for simple operations directly
	involving self-referential component references.
	* tree-cfg.c (dump_function_to_file): Add missing final curly
	bracket.

-- Eric Botcazou

Index: stor-layout.c
===
--- stor-layout.c	(revision 222439)
+++ stor-layout.c	(working copy)
@@ -127,6 +127,20 @@ variable_size (tree size)
 /* An array of functions used for self-referential size computation.  */
 static GTY(()) vec<tree, va_gc> *size_functions;
 
+/* Return true if T is a self-referential component reference.  */
+
+static bool
+self_referential_component_ref_p (tree t)
+{
+  if (TREE_CODE (t) != COMPONENT_REF)
+    return false;
+
+  while (REFERENCE_CLASS_P (t))
+    t = TREE_OPERAND (t, 0);
+
+  return (TREE_CODE (t) == PLACEHOLDER_EXPR);
+}
+
 /* Similar to copy_tree_r but do not copy component references involving
    PLACEHOLDER_EXPRs.  These nodes are spotted in find_placeholder_in_expr
    and substituted in substitute_in_expr.  */
@@ -154,19 +168,10 @@ copy_self_referential_tree_r (tree *tp,
     }
 
   /* Default case: the component reference.  */
-  else if (code == COMPONENT_REF)
+  else if (self_referential_component_ref_p (*tp))
     {
-      tree inner;
-      for (inner = TREE_OPERAND (*tp, 0);
-	   REFERENCE_CLASS_P (inner);
-	   inner = TREE_OPERAND (inner, 0))
-	;
-
-      if (TREE_CODE (inner) == PLACEHOLDER_EXPR)
-	{
-	  *walk_subtrees = 0;
-	  return NULL_TREE;
-	}
+      *walk_subtrees = 0;
+      return NULL_TREE;
     }
 
   /* We're not supposed to have them in self-referential size trees
@@ -199,7 +204,7 @@ self_referential_size (tree size)
 
   /* Do not factor out simple operations.  */
   t = skip_simple_constant_arithmetic (size);
-  if (TREE_CODE (t) == CALL_EXPR)
+  if (TREE_CODE (t) == CALL_EXPR || self_referential_component_ref_p (t))
     return size;
 
   /* Collect the list of self-references in the expression.  */
Index: tree-cfg.c
===
--- tree-cfg.c	(revision 222439)
+++ tree-cfg.c	(working copy)
@@ -7441,7 +7441,11 @@ dump_function_to_file (tree fndecl, FILE
   else
     {
       if (!ignore_topmost_bind)
-	fprintf (file, "{\n");
+	{
+	  fprintf (file, "{\n");
+	  /* No topmost bind, pretend it's ignored for later.  */
+	  ignore_topmost_bind = true;
+	}
       indent = 2;
     }
Re: [ARM] Fix RTX cost for vector SET
Hi Kugan,

On 25/04/15 00:30, Kugan wrote:

Thanks for the review. I have updated the patch based on the comments with some other minor changes. Bootstrapped and regression tested on aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

Thanks, Kugan

gcc/ChangeLog:

2015-04-24  Kugan Vivekanandarajah  kug...@linaro.org
	    Jim Wilson  jim.wil...@linaro.org

	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
	new fields loadv and storev.
	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
	Initialize loadv and storev.
	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
	(cortexa53_extra_costs): Likewise.
	(cortexa57_extra_costs): Likewise.
	(xgene1_extra_costs): Likewise.
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
	rtx_costs.

Due to the struct mem_cost_table update for aarch64, the arm cost tables also need to be updated. Please find the patch that does this. Regression tested on arm-none-linux-gnu with no new regressions. Is this OK for trunk?

Thanks, Kugan

gcc/ChangeLog:

2015-04-25  Kugan Vivekanandarajah  kug...@linaro.org

	* config/arm/arm.c (cortexa9_extra_costs): Initialize loadv and
	storev.
	(cortexa8_extra_costs): Likewise.
	(cortexa5_extra_costs): Likewise.
	(cortexa7_extra_costs): Likewise.
	(cortexa12_extra_costs): Likewise.
	(cortexa15_extra_costs): Likewise.
	(v7m_extra_costs): Likewise.

This arm part looks OK to me FWIW (if the approach in the aarch64 patch is deemed OK by the aarch64 maintainers); it just syncs the fields of the common cost struct between arm and aarch64. Please only commit this if the aarch64 patch gets approved and committed at the same time, otherwise we'll get a build failure. Having a look at the arm rtx costs and seeing if we can improve them in the same way as the aarch64 ones would be good as a follow-up at some point too.

Thanks, Kyrill
Re: [PATCH] Properly valueize values we value-number to
On Fri, 24 Apr 2015, Jeff Law wrote:

On 02/17/2015 07:58 AM, Richard Biener wrote: [ Restarting an old thread... ]

On a closer look the record_const_or_copy_1 hunk is redundant (record_equality is really a bit obfuscated...).

Agreed. I'm not entirely sure how it got to this point.

And record_equality is where the SSA_NAME_VALUEs might end up forming chains? At least because we might record a SSA_NAME_VALUE for sth that might be the target of a SSA_NAME_VALUE, as in

  a_1 = b_2;  SSA_NAME_VALUE (a_1) == b_2;
  if (b_2 == i_3(D))
    ... temporarily SSA_NAME_VALUE (b_2) == i_3(D)

thus under if (b_2 == i_3(D)) SSA_NAME_VALUE (a_1) == b_2 and SSA_NAME_VALUE (SSA_NAME_VALUE (a_1)) == i_3(D)? I guess we can't easily avoid that because we don't keep track of a reverse mapping (nor would we want to update that).

Right. In general, the fact that we're in SSA form means that we ought not get loops in the SSA_NAME_VALUE chain since there's a single static assignment and we'll visit it once. That nice model breaks down in two ways. First, we derive equivalences from equality conditions like you've shown above. Second, when threading we can thread through a loop latch and start reprocessing a block. The interaction between those two can result in loops in the value chain.

What I'm hoping to do in GCC 6 is eliminate all this stuff with a threader that is independent of the dominator walk (and optimizer). Instead of walking forward through the dominator tree recording key nuggets, then removing those nuggets as we pop back up the tree, we instead start with the conditional and walk up the use-def chains, simplifying as we go, with the goal of simplifying to a constant, of course. You can see the beginnings of that with the FSM bits from Sebastian. Obviously until those bits are in place, we should try to clean up any warts in the current implementation.
Btw, in record_equality, the == part of the <= check for the loop depth will just randomly swap x and y while we should prefer IL canonical order. If we can't rely on the input being in IL canonical order, we should prepend the whole function with

  if (tree_swap_operands_p (x, y, false))
    std::swap (x, y);

and change <= to < for the loop-depth check.

Agreed. Makes perfect sense.

I'm now re-bootstrapping and testing the following.

Richard.

2015-04-27  Richard Biener  rguent...@suse.de

	* tree-ssa-dom.c (record_equivalences_from_phis): Valueize PHI arg.
	(record_equivalences_from_stmt): Valueize rhs.
	(record_equality): Canonicalize x and y order via
	tree_swap_operands_p.  Do not swap operands for same loop depth.

Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c	(revision 222360)
--- gcc/tree-ssa-dom.c	(working copy)
*** record_equivalences_from_phis (basic_blo
*** 1519,1524 ****
--- 1519,1531 ----
        if (lhs == t)
  	continue;
  
+       /* Valueize t.  */
+       if (TREE_CODE (t) == SSA_NAME)
+ 	{
+ 	  tree tmp = SSA_NAME_VALUE (t);
+ 	  t = tmp ? tmp : t;
+ 	}
+ 
        /* If we have not processed an alternative yet, then set
  	 RHS to this alternative.  */
        if (rhs == NULL)
*** record_equality (tree x, tree y)
*** 1752,1757 ****
--- 1759,1767 ----
  {
    tree prev_x = NULL, prev_y = NULL;
  
+   if (tree_swap_operands_p (x, y, false))
+     std::swap (x, y);
+ 
    if (TREE_CODE (x) == SSA_NAME)
      prev_x = SSA_NAME_VALUE (x);
    if (TREE_CODE (y) == SSA_NAME)
*** record_equality (tree x, tree y)
*** 1766,1772 ****
    else if (is_gimple_min_invariant (x)
  	   /* ??? When threading over backedges the following is important
  	      for correctness.  See PR61757.  */
! 	   || (loop_depth_of_name (x) <= loop_depth_of_name (y)))
      prev_x = x, x = y, y = prev_x, prev_x = prev_y;
    else if (prev_x && is_gimple_min_invariant (prev_x))
      x = y, y = prev_x, prev_x = prev_y;
--- 1776,1782 ----
    else if (is_gimple_min_invariant (x)
  	   /* ??? When threading over backedges the following is important
  	      for correctness.  See PR61757.  */
! 	   || (loop_depth_of_name (x) < loop_depth_of_name (y)))
      prev_x = x, x = y, y = prev_x, prev_x = prev_y;
    else if (prev_x && is_gimple_min_invariant (prev_x))
      x = y, y = prev_x, prev_x = prev_y;
*** record_equivalences_from_stmt (gimple st
*** 2128,2145 ****
        if (may_optimize_p
  	  && (TREE_CODE (rhs) == SSA_NAME
  	      || is_gimple_min_invariant (rhs)))
!       {
! 	if (dump_file && (dump_flags & TDF_DETAILS))
! 	  {
! 	    fprintf (dump_file, "ASGN ");
! 	    print_generic_expr (dump_file, lhs, 0);
! 	    fprintf (dump_file, " = ");
! 	    print_generic_expr (dump_file, rhs, 0);
! 	    fprintf
Re: Improve LTO type checking during symtab merging
On Mon, 27 Apr 2015, Jan Hubicka wrote:

Hi, this patch strengthens ODR checking by requiring a match on declarations (if both of them are defined in a C++ translation unit). Currently ipa-symtab.c will warn only if declarations are incompatible in the useless_type_conversion sense. Now we warn more often, i.e. in the following example:

t.C:
char a;

t2.C:
#include <stdint.h>
extern int8_t a;
void *b=&a;

will lead to the warning:

t2.C:2:15: warning: type of 'a' does not match original declaration
 extern int8_t a;
               ^
built-in: note: type name 'char' should match type name 'signed char'
/usr/include/stdint.h:37:22: note: the incompatible type is defined here
 typedef signed char int8_t;
                     ^
/aux/hubicka/t.C:1:6: note: previously declared here
 char a;
      ^

Funnily enough we do not get a warning when t2.C uses just signed char or unsigned char, because that is a builtin type and thus non-ODR. Something I need to deal with incrementally.

I also added a call to warn_types_mismatch that outputs an extra note about what is wrong with the type: in C++ the mismatch is often carried over several levels of declarations and it is hard to work out the reason without extra info.

Richard, according to the comments, it seems that the code was tailored to not output a warning when we can avoid wrong-code issues on mismatches, based on the fact that the linker would also be silent. I am particularly entertained by the following hunk, which I killed:

-      if (!types_compatible_p (TREE_TYPE (prevailing_decl),
-			       TREE_TYPE (decl)))
-	/* If we don't have a merged type yet...sigh.  The linker
-	   wouldn't complain if the types were mismatched, so we
-	   probably shouldn't either.  Just use the type from
-	   whichever decl appears to be associated with the
-	   definition.  If for some odd reason neither decl is, the
-	   older one wins.  */
-	(void) 0;

It is fun we go through the trouble of comparing types and then do nothing ;) Type merging is also completed at this time, so at least the comment looks out of date.

I think as a QOI issue we do want to warn in cases where the code is inconsistent (it is pretty much surely a bug), but perhaps we may want to have the warnings with -Wodr only, and use a slightly different warning and a different -W switch for warnings that unavoidably lead to wrong-code surprises? This should be easy to implement by adding two levels into the new warn_type_compatibility_p predicate. I am personally also OK, however, with warning always or making all the warnings -Wodr. What do you think?

Only emitting the warnings with -Wodr looks good to me. I can't see how we can decide what cases lead to wrong-code surprises and what not (other than using types_compatible_p ...). Wrong code can only(?) happen if we happen to inline in a way that makes the inconsistency visible in a single function.

Incrementally I am heading towards a proper definition of decl compatibility that I can plug into symtab merging and avoid merging incompatible decls (so FORTIFY_SOURCE works). lto-symtab and ipa-icf both have some knowledge of the problem; I want to get both right and factor out the common logic.

Sounds good.

Another improvement is to preserve the ODR type info when the non-ODR variant prevails, so one can diagnose a clash between C++ units even in mixed-language link times.

Hmm, but then merging an ODR with a non-ODR variant is already pointing to an ODR violation? So why do you need to retain that info? Btw, there is that old PR41227 which has both wrong-code and diagnostic impact...

Richard.

Bootstrapped/regtested x86_64-linux with no new ODR warnings hit by -Werror.

Honza

	* ipa-devirt.c (register_odr_type): Be ready for non-odr main
	variant of odr type.
	* ipa-utils.h (warn_types_mismatch): New.
	* lto/lto-symtab.c (warn_type_compatibility_p): Break out from ...
	(lto_symtab_merge): ... here.
	(lto_symtab_merge_decls_2): Improve warnings.
Index: ipa-devirt.c
===
--- ipa-devirt.c	(revision 222391)
+++ ipa-devirt.c	(working copy)
@@ -2029,10 +2030,14 @@ register_odr_type (tree type)
       if (in_lto_p)
 	odr_vtable_hash = new odr_vtable_hash_type (23);
     }
-  /* Arrange things to be nicer and insert main variants first.  */
-  if (odr_type_p (TYPE_MAIN_VARIANT (type)))
+  /* Arrange things to be nicer and insert main variants first.
+     ??? fundamental prerecorded types do not have mangled names; this
+     makes it possible that non-ODR type is main_odr_variant of ODR type.
+     Things may get smoother if LTO FE set mangled name of those types same
+     way as C++ FE does.  */
+  if (odr_type_p (main_odr_variant (TYPE_MAIN_VARIANT (type))))
     get_odr_type (TYPE_MAIN_VARIANT (type), true);
-  if (TYPE_MAIN_VARIANT (type) !=
[committed] Make arguments to vec::splice const
vec::splice doesn't modify its source vector, so mark the argument as const. Tested on x86_64-linux-gnu. Applied as obvious.

Richard

gcc/
	* vec.h (vec): Make splice arguments const.  Update definitions
	accordingly.

Index: gcc/vec.h
===
--- gcc/vec.h	2015-04-27 10:38:48.0 +0100
+++ gcc/vec.h	2015-04-27 10:41:57.859920289 +0100
@@ -483,8 +483,8 @@ struct GTY((user)) vec<T, A, vl_embed>
   bool iterate (unsigned, T *) const;
   bool iterate (unsigned, T **) const;
   vec *copy (ALONE_CXX_MEM_STAT_INFO) const;
-  void splice (vec &);
-  void splice (vec *src);
+  void splice (const vec &);
+  void splice (const vec *src);
   T *quick_push (const T &);
   T &pop (void);
   void truncate (unsigned);
@@ -705,7 +705,7 @@ vec_safe_copy (vec<T, A, vl_embed> *src
    Reallocate DST, if necessary.  */
 template<typename T, typename A>
 inline void
-vec_safe_splice (vec<T, A, vl_embed> *&dst, vec<T, A, vl_embed> *src
+vec_safe_splice (vec<T, A, vl_embed> *&dst, const vec<T, A, vl_embed> *src
 		 CXX_MEM_STAT_INFO)
 {
   unsigned src_len = vec_safe_length (src);
@@ -836,7 +836,7 @@ vec<T, A, vl_embed>::copy (ALONE_MEM_STA
 template<typename T, typename A>
 inline void
-vec<T, A, vl_embed>::splice (vec<T, A, vl_embed> &src)
+vec<T, A, vl_embed>::splice (const vec<T, A, vl_embed> &src)
 {
   unsigned len = src.length ();
   if (len)
@@ -849,7 +849,7 @@ vec<T, A, vl_embed>::splice (vec<T, A, v
 template<typename T, typename A>
 inline void
-vec<T, A, vl_embed>::splice (vec<T, A, vl_embed> *src)
+vec<T, A, vl_embed>::splice (const vec<T, A, vl_embed> *src)
 {
   if (src)
     splice (*src);
@@ -1212,8 +1212,8 @@ struct vec<T, va_heap, vl_ptr>
   vec copy (ALONE_CXX_MEM_STAT_INFO) const;
   bool reserve (unsigned, bool = false CXX_MEM_STAT_INFO);
   bool reserve_exact (unsigned CXX_MEM_STAT_INFO);
-  void splice (vec &);
-  void safe_splice (vec & CXX_MEM_STAT_INFO);
+  void splice (const vec &);
+  void safe_splice (const vec & CXX_MEM_STAT_INFO);
   T *quick_push (const T &);
   T *safe_push (const T & CXX_MEM_STAT_INFO);
   T &pop (void);
@@ -1489,7 +1489,7 @@ vec<T, va_heap, vl_ptr>::release (void)
 template<typename T>
 inline void
-vec<T, va_heap, vl_ptr>::splice (vec<T, va_heap, vl_ptr> &src)
+vec<T, va_heap, vl_ptr>::splice (const vec<T, va_heap, vl_ptr> &src)
 {
   if (src.m_vec)
     m_vec->splice (*(src.m_vec));
@@ -1503,7 +1503,7 @@ vec<T, va_heap, vl_ptr>::splice (vec<T,
 template<typename T>
 inline void
-vec<T, va_heap, vl_ptr>::safe_splice (vec<T, va_heap, vl_ptr> &src
+vec<T, va_heap, vl_ptr>::safe_splice (const vec<T, va_heap, vl_ptr> &src
 				      MEM_STAT_DECL)
 {
   if (src.length ())
Fix xstormy16 handling of HImode predicates
xstormy16 has:

(define_predicate "xs_hi_general_operand"
  (match_code "const_int,reg,subreg,mem,symbol_ref,label_ref,const")
{
  if ((GET_CODE (op) == CONST_INT)
      && ((INTVAL (op) >= 32768) || (INTVAL (op) < -32768)))
    {
      error ("constant halfword load operand out of range");
      return false;
    }

  return general_operand (op, mode);
})

I think this is wrong for two reasons:

- Predicates are supposed to accept things in different modes and weed out the invalid values. That's why the predicates start with a check that GET_MODE (op) == VOIDmode || GET_MODE (op) == mode. IMO it doesn't make sense to handle (e.g.) SImode REGs correctly but raise an error for SImode CONST_INTs. The generic predicates later reject CONST_INTs that aren't suitable for the mode, so invalid HImode CONST_INTs will fail to match rather than be silently accepted.

- If we did have a check, it should be an assert rather than a user error. The port is effectively relying on the matching algorithm in genrecog to ensure that the predicate is only called on SET_SRC rtxes once the mode of the SET_DEST has been checked.

This patch removes the predicate and uses general_operand instead. xs_hi_nonmemory_operand is similar except that it rejects symbol_ref and label_ref constants. The patch therefore removes the error check but keeps the restricted code list.

Note that we have an assert in rtl.h that CONST_INTs fit into the range appropriate for the wi:: precision. This should catch more cases than the predicate checks would have. (It has flagged up problems that were latent before.)

Tested by building xstormy16-elf and making sure that there were no differences in assembly output for the GCC testsuite. OK to install?

Thanks, Richard

gcc/
	* config/stormy16/predicates.md (xs_hi_general_operand): Delete.
	(xs_hi_nonmemory_operand): Remove error.
	* config/stormy16/stormy16.md (movhi, movhi_internal): Use
	general_operand rather than xs_hi_general_operand.
Index: gcc/config/stormy16/predicates.md
===
--- gcc/config/stormy16/predicates.md	2015-04-27 10:38:48.0 +0100
+++ gcc/config/stormy16/predicates.md	2015-04-27 10:40:00.709344494 +0100
@@ -151,28 +151,8 @@ (define_predicate "xstormy16_carry_plus_
 	  && (INTVAL (XEXP (op, 1)) < -4 || INTVAL (XEXP (op, 1)) > 4));
 })
 
-(define_predicate "xs_hi_general_operand"
-  (match_code "const_int,reg,subreg,mem,symbol_ref,label_ref,const")
-{
-  if ((GET_CODE (op) == CONST_INT)
-      && ((INTVAL (op) >= 32768) || (INTVAL (op) < -32768)))
-    {
-      error ("constant halfword load operand out of range");
-      return false;
-    }
-
-  return general_operand (op, mode);
-})
-
 (define_predicate "xs_hi_nonmemory_operand"
   (match_code "const_int,reg,subreg,const")
 {
-  if ((GET_CODE (op) == CONST_INT)
-      && ((INTVAL (op) >= 32768) || (INTVAL (op) < -32768)))
-    {
-      error ("constant arithmetic operand out of range");
-      return false;
-    }
-
   return nonmemory_operand (op, mode);
 })
Index: gcc/config/stormy16/stormy16.md
===
--- gcc/config/stormy16/stormy16.md	2015-04-27 10:38:48.0 +0100
+++ gcc/config/stormy16/stormy16.md	2015-04-27 10:40:00.709344494 +0100
@@ -185,7 +185,7 @@ (define_insn "pophi1"
 (define_expand "movhi"
   [(set (match_operand:HI 0 "nonimmediate_nonstack_operand" "")
-	(match_operand:HI 1 "xs_hi_general_operand" ""))]
+	(match_operand:HI 1 "general_operand" ""))]
 {
   xstormy16_expand_move (HImode, operands[0], operands[1]);
   DONE;
@@ -193,7 +193,7 @@ (define_expand "movhi"
 (define_insn "movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_nonstack_operand" "=r,m,e,e,T,r,S,W,e")
-	(match_operand:HI 1 "xs_hi_general_operand" "r,e,m,L,L,i,i,ie,W"))]
+	(match_operand:HI 1 "general_operand" "r,e,m,L,L,i,i,ie,W"))]
  "@
   mov %0,%1
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 22/04/15 12:51, Kyrill Tkachov wrote:

On 21/04/15 18:33, Kyrill Tkachov wrote:

On 21/04/15 15:09, Jeff Law wrote:

On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:

From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it.

No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead, and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures.

So what should be the action plan here? I can't add an assert on a positive result as a negative result is valid. We want to catch the case where this would cause trouble on pa, or change the patch until we're confident that it's fine for pa. That being said, reading the documentation of STACK_GROWS_UPWARD and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case where this would cause trouble on pa.

Is the problem that in the function:

+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, so the function should look something like the following?

static int
memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
{
#ifdef ARGS_GROW_DOWNWARD
  rtx tmp = plus_constant (Pmode, x, -size);
#else
  rtx tmp = plus_constant (Pmode, x, size);
#endif
  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);

  if (!CONST_INT_P (sub))
    return -1;

#ifdef ARGS_GROW_DOWNWARD
  return INTVAL (-sub);
#else
  return INTVAL (sub);
#endif
}

now, say for x == sp + 4, y == sp + 8, size == 16: This would be a problematic case for arm, so this code on arm (where ARGS_GROW_DOWNWARD is *not* defined) would return 12, which is the number of bytes that overlap. On a target where ARGS_GROW_DOWNWARD is defined this would return -20, meaning that no overlap occurs (because we read in the descending direction from x, IIUC).

Hi Jeff,

Here's an attempt to make this more concrete. Only the memory_load_overlap function has changed. This time I tried to take into account the case when ARGS_GROW_DOWNWARD. Take the case where x == sp, y == sp + 8, size == 16. For arm, this would return 8 as that is the number of bytes that overlap. On pa, since ARGS_GROW_DOWNWARD is defined, it would return -1 as we're reading down from x rather than up towards y. In the case when x == sp + 8, y == sp, size == 16, this would return -1 on arm since we're reading upwards from x and therefore no overlap would happen. On pa, this would return 8, which I think is the right thing. But again, I don't have access to any pa means of testing. What do you think of this approach?

Hi Dave,

Would it be possible for you to test this patch on a 64-bit hppa or at least bootstrap it? https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html

There is a concern that it may potentially affect the passing of complex arguments partially on the stack and partially in regs on pa because of the way the args and stack grow on that target. Unfortunately I don't have access to any hardware or simulators. It would help a lot with getting this patch in.

Thanks, Kyrill

P.S. I've included the testcase from Honggyu in the patch.

2015-04-22  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/65358
	* expr.c (memory_load_overlap): New function.
	(emit_push_insn): When pushing partial args to the stack would
	clobber the register part load the overlapping part into a pseudo
	and put it into the hard reg after pushing.

2015-04-22  Honggyu Kim  hong.gyu@lge.com

	PR target/65358
	* gcc.dg/pr65358.c: New test.

Thanks, Kyrill

Jeff
RE: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.
-Original Message-
From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
Sent: Wednesday, April 22, 2015 5:00 PM
To: Hale Wang
Cc: Ramana Radhakrishnan; Joseph Myers; GCC Patches
Subject: Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.

On Wed, Apr 22, 2015 at 9:32 AM, Hale Wang hale.w...@arm.com wrote:

-Original Message-
From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
Sent: Wednesday, April 22, 2015 3:50 PM
To: Joseph Myers
Cc: Hale Wang; GCC Patches
Subject: Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.

On Mon, Apr 13, 2015 at 12:25 PM, Joseph Myers jos...@codesourcery.com wrote:

On Mon, 13 Apr 2015, Hale Wang wrote:

Yes, you are right. It's my fault to add the only here. Thank you for pointing this out. Besides this, is this patch OK for you?

I don't think it's a good idea for libgcc to include large pieces of assembly code generated by a compiler. Just compile the code with whatever options are needed at the time libgcc is built - possibly with #if conditionals to allow compiling different versions of the code.

Indeed, just compiling the code with the option '-mfloat-abi=soft' at the time libgcc is built solves this problem.

Or why not conditionally use the ``pcs'' attribute on the ARM port? That then means you don't need options magic on top?

OK. I think your suggestion solves this problem more cleanly. I will resubmit a patch later. I think we can discard this patch this time.

Thanks a lot.

Hale

Ramana
RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits
From: Jeff Law [mailto:l...@redhat.com]
Sent: Saturday, April 25, 2015 2:57 AM

+static rtx
+sign_extend_short_imm (rtx src, machine_mode mode, unsigned int prec)
+{
+  if (GET_MODE_PRECISION (mode) < prec && CONST_INT_P (src)
+      && INTVAL (src) > 0 && val_signbit_known_set_p (mode, INTVAL (src)))
+    src = GEN_INT (INTVAL (src) | ~GET_MODE_MASK (mode));

Can you go ahead and put each condition of the && on a separate line? It uses more vertical space, but IMHO makes this easier to read. As I said, it was a nit :-)

You're perfectly right. Anything that can improve readability of source code is a good thing. OK with that fix.

Committed.

Best regards,
Thomas
Add a build version of inchash.o
The generator patch that I'm about to post wants to use inchash.c, so this patch adds it to the list of files that are built for the build system as well as the host. Tested on x86_64-linux-gnu. OK to install?

Thanks, Richard

gcc/
	* inchash.h, inchash.c: Include bconfig.h for build objects.
	* Makefile.in (build/inchash.o): New rule.

Index: gcc/inchash.h
===
--- gcc/inchash.h	2015-04-27 10:38:48.0 +0100
+++ gcc/inchash.h	2015-04-27 10:42:57.783191573 +0100
@@ -20,7 +20,11 @@ Software Foundation; either version 3, o
 #ifndef INCHASH_H
 #define INCHASH_H 1
 
+#ifdef GENERATOR_FILE
+#include "bconfig.h"
+#else
 #include "config.h"
+#endif
 #include "system.h"
 #include "coretypes.h"
 #include "hashtab.h"
Index: gcc/inchash.c
===
--- gcc/inchash.c	2015-04-27 10:38:48.0 +0100
+++ gcc/inchash.c	2015-04-27 10:42:57.783191573 +0100
@@ -17,7 +17,11 @@ Software Foundation; either version 3, o
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#ifdef GENERATOR_FILE
+#include "bconfig.h"
+#else
 #include "config.h"
+#endif
 #include "system.h"
 #include "coretypes.h"
 #include "hashtab.h"
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	2015-04-27 10:38:48.0 +0100
+++ gcc/Makefile.in	2015-04-27 10:42:57.783191573 +0100
@@ -2434,6 +2434,8 @@ build/vec.o : vec.c $(BCONFIG_H) $(SYSTE
 	$(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
 build/hash-table.o : hash-table.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h \
 	$(HASH_TABLE_H) $(GGC_H) toplev.h $(DIAGNOSTIC_CORE_H)
+build/inchash.o : inchash.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h \
+	$(HASHTAB_H) inchash.h
 build/gencondmd.o : build/gencondmd.c $(BCONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(GTM_H) insn-constants.h \
 	$(filter-out insn-flags.h, $(RTL_H) $(TM_P_H) $(FUNCTION_H) $(REGS_H) \
Re: Add a build version of inchash.o
On Mon, Apr 27, 2015 at 12:09 PM, Richard Sandiford richard.sandif...@arm.com wrote:

 The generator patch that I'm about to post wants to use inchash.c, so this patch adds it to the list of files that are built for the build system as well as the host. Tested on x86_64-linux-gnu. OK to install?

Ok.

Thanks,
Richard.
[PATCH][combine][obvious] Use std::swap instead of manually swapping
Hi all,

This is another one of those std::swap patches; this time I was in the combine neighbourhood. Bootstrapped on x86_64 and tested on aarch64. Precedents suggest these changes are considered obvious, so I'll commit this in a couple of days unless someone objects.

Thanks,
Kyrill

2015-04-27  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* combine.c (simplify_if_then_else): Use std::swap instead of
	manually swapping.
	(known_cond): Likewise.
	(simplify_comparison): Likewise.

commit e8aa8f2ec22cf548bc05f0f8e56a13a2bdd1c228
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Apr 24 17:47:14 2015 +0100

    [combine][obvious] Use std::swap instead of manually swapping

diff --git a/gcc/combine.c b/gcc/combine.c
index 9f840cb..5f7cbc0 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6185,7 +6185,7 @@ simplify_if_then_else (rtx x)
       if (false_code == EQ)
 	{
 	  swapped = 1, true_code = EQ, false_code = NE;
-	  temp = true_rtx, true_rtx = false_rtx, false_rtx = temp;
+	  std::swap (true_rtx, false_rtx);
 	}
 
   /* If we are comparing against zero and the expression being tested has
@@ -6250,7 +6250,7 @@ simplify_if_then_else (rtx x)
       SUBST (XEXP (x, 1), false_rtx);
       SUBST (XEXP (x, 2), true_rtx);
 
-      temp = true_rtx, true_rtx = false_rtx, false_rtx = temp;
+      std::swap (true_rtx, false_rtx);
       cond = XEXP (x, 0);
 
       /* It is possible that the conditional has been simplified out.  */
@@ -9022,7 +9022,6 @@ static rtx
 known_cond (rtx x, enum rtx_code cond, rtx reg, rtx val)
 {
   enum rtx_code code = GET_CODE (x);
-  rtx temp;
   const char *fmt;
   int i, j;
 
@@ -9062,7 +9061,7 @@ known_cond (rtx x, enum rtx_code cond, rtx reg, rtx val)
   else if (COMPARISON_P (x) || COMMUTATIVE_ARITH_P (x))
     {
       if (rtx_equal_p (XEXP (x, 0), val))
-	cond = swap_condition (cond), temp = val, val = reg, reg = temp;
+	cond = swap_condition (cond), std::swap (val, reg);
 
       if (rtx_equal_p (XEXP (x, 0), reg) && rtx_equal_p (XEXP (x, 1), val))
 	{
@@ -11454,7 +11453,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
      is already a constant integer.  */
   if (swap_commutative_operands_p (op0, op1))
     {
-      tem = op0, op0 = op1, op1 = tem;
+      std::swap (op0, op1);
       code = swap_condition (code);
     }
 
@@ -12341,7 +12340,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
   /* We may have changed the comparison operands.  Re-canonicalize.  */
   if (swap_commutative_operands_p (op0, op1))
     {
-      tem = op0, op0 = op1, op1 = tem;
+      std::swap (op0, op1);
       code = swap_condition (code);
     }
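The comma-chained manual swaps that these patches replace are exactly equivalent to std::swap on the pointers involved. A minimal stand-alone sketch of the before/after idiom (plain int* stands in for rtx here, which is itself a pointer type):

```cpp
#include <utility>

// Before: the idiom being removed — swap through a temporary, chained
// with the comma operator so it fits in one statement.
static void swap_manual (int *&op0, int *&op1)
{
  int *tem;
  tem = op0, op0 = op1, op1 = tem;
}

// After: std::swap performs the same three assignments, without needing
// a scratch variable declared at function scope.
static void swap_std (int *&op0, int *&op1)
{
  std::swap (op0, op1);
}
```

A side benefit visible in the known_cond hunk above: once std::swap is used, a function-scope `rtx temp` whose only purpose was swapping can be deleted outright.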
[PATCH][simplify-rtx][trivial] Use std::swap instead of manually swapping
Hi all,

This patch replaces the manual swapping of values in simplify-rtx.c with std::swap. Precedents suggest these are considered obvious changes. Bootstrapped and tested on aarch64, x86_64. Will commit as obvious in a couple of days if no one objects.

Thanks,
Kyrill

2015-04-27  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* simplify-rtx.c (simplify_gen_binary): Use std::swap instead of
	manually swapping.
	(simplify_associative_operation): Likewise.
	(simplify_binary_operation): Likewise.
	(simplify_plus_minus): Likewise.
	(simplify_relational_operation): Likewise.
	(simplify_ternary_operation): Likewise.

commit dbbb2823461181853a5605adc3cbbcab04dd00b1
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Apr 24 16:54:49 2015 +0100

    [simplify-rtx] Use std::swap instead of manually swapping

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 4a8b56b..2dc2b58 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -216,7 +216,7 @@ simplify_gen_binary (enum rtx_code code, machine_mode mode, rtx op0,
   /* Put complex operands first and constants second if commutative.  */
   if (GET_RTX_CLASS (code) == RTX_COMM_ARITH
       && swap_commutative_operands_p (op0, op1))
-    tem = op0, op0 = op1, op1 = tem;
+    std::swap (op0, op1);
 
   return gen_rtx_fmt_ee (code, mode, op0, op1);
 }
@@ -1926,9 +1926,7 @@ simplify_associative_operation (enum rtx_code code, machine_mode mode,
       if (! swap_commutative_operands_p (op1, op0))
 	return simplify_gen_binary (code, mode, op1, op0);
 
-      tem = op0;
-      op0 = op1;
-      op1 = tem;
+      std::swap (op0, op1);
     }
 
   if (GET_CODE (op0) == code)
@@ -1977,9 +1975,7 @@ simplify_binary_operation (enum rtx_code code, machine_mode mode,
   /* Make sure the constant is second.  */
   if (GET_RTX_CLASS (code) == RTX_COMM_ARITH
       && swap_commutative_operands_p (op0, op1))
-    {
-      tem = op0, op0 = op1, op1 = tem;
-    }
+    std::swap (op0, op1);
 
   trueop0 = avoid_constant_pool_reference (op0);
   trueop1 = avoid_constant_pool_reference (op1);
@@ -4274,10 +4270,10 @@ simplify_plus_minus (enum rtx_code code, machine_mode mode, rtx op0,
 	    {
 	      ncode = MINUS;
 	      if (lneg)
-		tem = lhs, lhs = rhs, rhs = tem;
+		std::swap (lhs, rhs);
 	    }
 	  else if (swap_commutative_operands_p (lhs, rhs))
-	    tem = lhs, lhs = rhs, rhs = tem;
+	    std::swap (lhs, rhs);
 
 	  if ((GET_CODE (lhs) == CONST || CONST_INT_P (lhs))
 	      && (GET_CODE (rhs) == CONST || CONST_INT_P (rhs)))
@@ -4467,7 +4463,7 @@ simplify_relational_operation (enum rtx_code code, machine_mode mode,
   /* For the following tests, ensure const0_rtx is op1.  */
   if (swap_commutative_operands_p (op0, op1)
       || (op0 == const0_rtx && op1 != const0_rtx))
-    tem = op0, op0 = op1, op1 = tem, code = swap_condition (code);
+    std::swap (op0, op1), code = swap_condition (code);
 
   /* If op0 is a compare, extract the comparison arguments from it.  */
   if (GET_CODE (op0) == COMPARE && op1 == const0_rtx)
@@ -4828,7 +4824,7 @@ simplify_const_relational_operation (enum rtx_code code,
   /* Make sure the constant is second.  */
   if (swap_commutative_operands_p (op0, op1))
     {
-      tem = op0, op0 = op1, op1 = tem;
+      std::swap (op0, op1);
       code = swap_condition (code);
     }
 
@@ -5189,7 +5185,7 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
       /* Canonicalize the two multiplication operands.  */
       /* a * -b + c = -b * a + c.  */
       if (swap_commutative_operands_p (op0, op1))
-	tem = op0, op0 = op1, op1 = tem, any_change = true;
+	std::swap (op0, op1), any_change = true;
 
       if (any_change)
	return gen_rtx_FMA (mode, op0, op1, op2);
[PATCH][AArch64] Add alternative 'extr' pattern, calculate rtx cost properly
Hi all,

We currently have a pattern that will recognise a particular combination of shifts and bitwise-or as an extr instruction. However, the order of the shifts inside the IOR doesn't have canonicalisation rules (see the rev16 pattern for similar stuff). This means that for code like:

unsigned long
foo (unsigned long a, unsigned long b)
{
  return (a << 16) | (b >> 48);
}

we will recognise the extr, but for the equivalent:

unsigned long
foo (unsigned long a, unsigned long b)
{
  return (b >> 48) | (a << 16);
}

we won't, and we'll emit three instructions. This patch adds the pattern for the alternative order of shifts and allows us to generate for the above the code:

foo:
	extr	x0, x0, x1, 48
	ret

The zero-extended version is added as well and the rtx costs function is updated to handle all of these cases. I've seen this pattern trigger in the gcc code itself in expmed.c where it eliminated a sequence of orrs and shifts into an extr instruction! Bootstrapped and tested on aarch64-linux. Ok for trunk?

Thanks,
Kyrill

2015-04-27  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/aarch64/aarch64.md (*extr<mode>5_insn_alt): New pattern.
	(*extrsi5_insn_uxtw_alt): Likewise.
	* config/aarch64/aarch64.c (aarch64_extr_rtx_p): New function.
	(aarch64_rtx_costs, IOR case): Use above to properly cost extr
	operations.

commit d45e92b3b8c5837328b7b10682565cacfb566e5b
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Mon Mar 2 17:26:38 2015 +0000

    [AArch64] Add alternative 'extr' pattern, calculate rtx cost properly

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 860a1dd..ef5a1e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5438,6 +5438,51 @@ aarch64_frint_unspec_p (unsigned int u)
     }
 }
 
+/* Return true iff X is an rtx that will match an extr instruction
+   i.e. as described in the *extr<mode>5_insn family of patterns.
+   OP0 and OP1 will be set to the operands of the shifts involved
+   on success and will be NULL_RTX otherwise.  */
+
+static bool
+aarch64_extr_rtx_p (rtx x, rtx *res_op0, rtx *res_op1)
+{
+  rtx op0, op1;
+  machine_mode mode = GET_MODE (x);
+
+  *res_op0 = NULL_RTX;
+  *res_op1 = NULL_RTX;
+
+  if (GET_CODE (x) != IOR)
+    return false;
+
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+
+  if ((GET_CODE (op0) == ASHIFT && GET_CODE (op1) == LSHIFTRT)
+      || (GET_CODE (op1) == ASHIFT && GET_CODE (op0) == LSHIFTRT))
+    {
+      /* Canonicalise locally to ashift in op0, lshiftrt in op1.  */
+      if (GET_CODE (op1) == ASHIFT)
+	std::swap (op0, op1);
+
+      if (!CONST_INT_P (XEXP (op0, 1)) || !CONST_INT_P (XEXP (op1, 1)))
+	return false;
+
+      unsigned HOST_WIDE_INT shft_amnt_0 = UINTVAL (XEXP (op0, 1));
+      unsigned HOST_WIDE_INT shft_amnt_1 = UINTVAL (XEXP (op1, 1));
+
+      if (shft_amnt_0 < GET_MODE_BITSIZE (mode)
+	  && shft_amnt_0 + shft_amnt_1 == GET_MODE_BITSIZE (mode))
+	{
+	  *res_op0 = XEXP (op0, 0);
+	  *res_op1 = XEXP (op1, 0);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
 /* Calculate the cost of calculating (if_then_else (OP0) (OP1) (OP2)),
    storing it in *COST.  Result is true if the total cost of the
    operation has now been calculated.  */
@@ -5968,6 +6013,16 @@ cost_plus:
 	  return true;
 	}
+
+      if (aarch64_extr_rtx_p (x, &op0, &op1))
+	{
+	  *cost += rtx_cost (op0, IOR, 0, speed)
+		   + rtx_cost (op1, IOR, 1, speed);
+	  if (speed)
+	    *cost += extra_cost->alu.shift;
+
+	  return true;
+	}
       /* Fall through.  */
     case XOR:
     case AND:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1a7f888..17a8755 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3597,6 +3597,21 @@ (define_insn "*extr<mode>5_insn"
   [(set_attr "type" "shift_imm")]
 )
 
+;; There are no canonicalisation rules for ashift and lshiftrt inside an ior
+;; so we have to match both orderings.
+(define_insn "*extr<mode>5_insn_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ior:GPI (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
+			       (match_operand 4 "const_int_operand" "n"))
+		 (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+			     (match_operand 3 "const_int_operand" "n"))))]
+  "UINTVAL (operands[3]) < GET_MODE_BITSIZE (<MODE>mode)
+   && (UINTVAL (operands[3]) + UINTVAL (operands[4])
+       == GET_MODE_BITSIZE (<MODE>mode))"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "type" "shift_imm")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*extrsi5_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3611,6 +3626,19 @@ (define_insn "*extrsi5_insn_uxtw"
   [(set_attr "type" "shift_imm")]
 )
 
+(define_insn "*extrsi5_insn_uxtw_alt"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (ior:SI (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
+
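The two IOR orderings the patch matches compute the same value, which is easy to sanity-check in plain C++. A sketch, where 16/48 are just example shift amounts that sum to the 64-bit mode size — the side condition the pattern enforces:

```cpp
#include <cstdint>

// The EXTR-able shape: an IOR of a left shift and a logical right shift
// whose amounts sum to the register width (16 + 48 == 64).
uint64_t extr_order1 (uint64_t a, uint64_t b)
{
  return (a << 16) | (b >> 48);
}

// The same computation with the IOR operands in the other order — the
// case the alternative pattern in the patch exists to catch.
uint64_t extr_order2 (uint64_t a, uint64_t b)
{
  return (b >> 48) | (a << 16);
}
```

Both forms produce the 64-bit field starting at bit 48 of the 128-bit concatenation a:b, which is exactly what `extr x0, x0, x1, 48` extracts in one instruction.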
Remove redundant tests in icf_handled_component_p
handled_component_p comprises both REALPART_EXPR and IMAGPART_EXPR. Tested on x86_64-suse-linux, applied on the mainline as obvious.

2015-04-27  Eric Botcazou  ebotca...@adacore.com

	* ipa-icf.c (icf_handled_component_p): Remove redundant tests.

-- 
Eric Botcazou

Index: ipa-icf.c
===
--- ipa-icf.c	(revision 222439)
+++ ipa-icf.c	(working copy)
@@ -1806,9 +1806,8 @@ sem_function::icf_handled_component_p (t
 {
   tree_code tc = TREE_CODE (t);
 
-  return ((handled_component_p (t))
-	  || tc == ADDR_EXPR || tc == MEM_REF || tc == REALPART_EXPR
-	  || tc == IMAGPART_EXPR || tc == OBJ_TYPE_REF);
+  return (handled_component_p (t)
+	  || tc == ADDR_EXPR || tc == MEM_REF || tc == OBJ_TYPE_REF);
 }
 
 /* Basic blocks dictionary BB_DICT returns true if SOURCE index BB
Mostly rewrite genrecog
I think it's been the case for a while that parallel builds of GCC tend to serialise around the compilation of insn-recog.c, especially with higher --enable-checking settings. This patch tries to speed that up by replacing most of genrecog with a new algorithm. I think the main problems with the current code are: 1. Vector architectures have added lots of new instructions that have a similar shape and differ only in mode, code or unspec number. The current algorithm doesn't have any way of factoring out those similarities. 2. When matching a particular instruction, the current code examines everything about a SET_DEST before moving on to the SET_SRC. This has two subproblems: 2a. The destination of a SET isn't very distinctive. It's usually just a register_operand, a memory_operand, a nonimmediate_operand or a flags register. We therefore tend to backtrack to the SET_DEST a lot, oscillating between groups of instructions with the same kind of destination. 2b. Backtracking through predicate checks is relatively expensive. It would be good to narrow down the shape of the instruction first and only then check the predicates. (The backtracking is expensive in terms of insn-recog.o compile time too, both because we need to copy into argument registers and out of the result register, and because it adds more sites where spills are needed.) 3. The code keeps one local variable per rtx depth, so it ends up loading the same rtx many times over (mostly when backtracking). This is very expensive in rtl-checking builds because each XEXP includes a code check and a line-specific failure call. In principle the idea of having one local variable per depth is good. But it was originally written that way when all optimisations were done at the rtl level and I imagine each local variable mapped to one pseudo register. 
These days the statements that reload the value needed on backtracking lead to many more SSA names and phi statements than you'd get with just a single variable per position (loaded once, so naturally SSA). There is still the potential benefit of avoiding having sibling rtxes live at once, but fixing (2) above reduces that problem. Also, the code is all goto-based, which makes it rather hard to step through. The patch deals with these as follows: 1. Detect subpatterns that differ only by mode, code and/or integer (e.g. unspec number) and split them out into a common routine. 2. Match the shape of the instruction first, in terms of codes, integers and vector lengths, and only then check the modes, predicates and dups. When checking the shape, handle SET_SRCs before SET_DESTs. In practice this seems to greatly reduce the amount of backtracking. 3. Have one local variable per rtx position. I tested the patch with and without the change and it helped a lot with rtl-checking builds without seeming to affect release builds much either way. As far as debuggability goes, the new code avoids gotos and just uses natural control flow. The headline stat is that a stage 3 --enable-checking=yes,rtl,df build of insn-recog.c on my box goes from 7m43s to 2m2s (using the same stage 2 compiler). The corresponding --enable-checking=release change is from 49s to 24s (less impressive, as expected). The patch seems to speed up recog. E.g. the time taken to build fold-const.ii goes from 6.74s before the patch to 6.69s after it; not a big speed-up, but reproducible. 
Here's a comparison of the number of lines of code in insn-recog.c before and after the patch on one target per config/ CPU:

aarch64-linux-gnueabi   115526   38169 : 33.04%
alpha-linux-gnu          24479   10740 : 43.87%
arm-linux-gnueabi       169208   67759 : 40.04%
avr-rtems                55647   22127 : 39.76%
bfin-elf                 13928    6498 : 46.65%
c6x-elf                  29928   13324 : 44.52%
cr16-elf                  2650    1419 : 53.55%
cris-elf                 18669    7257 : 38.87%
epiphany-elf             19308    6131 : 31.75%
fr30-elf                  2204    1112 : 50.45%
frv-linux-gnu            13541    5950 : 43.94%
h8300-elf                19584    9327 : 47.63%
hppa64-hp-hpux11.23      18299    8549 : 46.72%
ia64-linux-gnu           37629   17101 : 45.45%
iq2000-elf                2752    1609 : 58.47%
lm32-elf                  1536     872 : 56.77%
m32c-elf                 10040    4145 : 41.28%
m32r-elf
[Patch, fortran] Simplify lbound for array subcomponents
Hello,

while reviewing Thomas' bound simplification patch, I noticed that the {l,u}bound simplification code wasn't handling array subcomponents. Fixed by the attached patch, regression tested. OK for trunk?

Mikael

2015-04-27  Mikael Morin  mik...@gcc.gnu.org

	* simplify.c (simplify_bound_dim): Tighten the check for array
	fullness by also checking for absence of subreference.
	(simplify_bound): Don't skip simplification if the array has
	subreferences.
	(simplify_cobound): Same.

2015-04-27  Mikael Morin  mik...@gcc.gnu.org

	* gfortran.dg/bound_simplify_4.f90: New.

Index: simplify.c
===
--- simplify.c	(revision 221972)
+++ simplify.c	(working copy)
@@ -3338,7 +3338,7 @@ simplify_bound_dim (gfc_expr *array, gfc_expr *kin
   result = gfc_get_constant_expr (BT_INTEGER, k, &array->where);
 
   /* Then, we need to know the extent of the given dimension.  */
-  if (coarray || ref->u.ar.type == AR_FULL)
+  if (coarray || (ref->u.ar.type == AR_FULL && !ref->next))
     {
       l = as->lower[d-1];
       u = as->upper[d-1];
@@ -3417,11 +3417,8 @@ simplify_bound (gfc_expr *array, gfc_expr *dim, gf
 	case AR_FULL:
 	  /* We're done because 'as' has already been set in the previous
	     iteration.  */
-	  if (!ref->next)
-	    goto done;
+	  goto done;
 
-	  /* Fall through.  */
-
	case AR_UNKNOWN:
	  return NULL;
@@ -3556,11 +3553,8 @@ simplify_cobound (gfc_expr *array, gfc_expr *dim,
	case AR_FULL:
	  /* We're done because 'as' has already been set in the previous
	     iteration.  */
-	  if (!ref->next)
-	    goto done;
+	  goto done;
 
-	  /* Fall through.  */
-
	case AR_UNKNOWN:
	  return NULL;

! { dg-do run }
! { dg-additional-options "-fcoarray=single -fdump-tree-original" }
!
! Check that {L,U}{,CO}BOUND intrinsics are properly simplified.
!
  type :: t
    integer :: c
  end type t

  type(t) :: d(3:8) = t(7)
  type(t) :: e[5:9,-1:*]

  if (lbound(d, 1) /= 3) call abort
  if (lbound(d(3:5), 1) /= 1) call abort
  if (lbound(d%c, 1) /= 1) call abort
  if (ubound(d, 1) /= 8) call abort
  if (ubound(d(3:5), 1) /= 3) call abort
  if (ubound(d%c, 1) /= 6) call abort

  if (lcobound(e, 1) /= 5) call abort
  if (lcobound(e%c, 1) /= 5) call abort
  if (lcobound(e, 2) /= -1) call abort
  if (lcobound(e%c, 2) /= -1) call abort
  if (ucobound(e, 1) /= 9) call abort
  if (ucobound(e%c, 1) /= 9) call abort
  ! no simplification for ucobound(e{,%c}, dim=2)
end

! { dg-final { scan-tree-dump-not "bound" "original" } }
! { dg-final { scan-tree-dump-not "abort" "original" } }
! { dg-final { cleanup-tree-dump "original" } }
Extend list of codes accepted by address_operand
While testing the patch that I'm about to post, I found that the list of address_operand codes in gensupport.c was incomplete. This patch fills in some obvious missing entries. It isn't exactly clear which codes are allowed at the top level of an address. decompose_address is known to be general enough for all LRA targets, but it also technically accepts any unary operator as the outer code, whereas I think only SIGN_EXTEND and ZERO_EXTEND are used at the top level. The routine accepts ZERO_EXTRACT and SIGN_EXTRACT, but I think in practice they're only used for index values and not full addresses. Tested on x86_64-linux-gnu. OK to install?

Thanks,
Richard

gcc/
	* gensupport.c (std_preds): Add missing codes to address_operand
	entry.

Index: gcc/gensupport.c
===
--- gcc/gensupport.c	2015-04-27 10:38:48.0 +0100
+++ gcc/gensupport.c	2015-04-27 10:40:59.044635383 +0100
@@ -2800,7 +2800,8 @@ struct std_pred_table
 static const struct std_pred_table std_preds[] = {
   {"general_operand", false, true, {SUBREG, REG, MEM}},
-  {"address_operand", true, true, {SUBREG, REG, MEM, PLUS, MINUS, MULT}},
+  {"address_operand", true, true, {SUBREG, REG, MEM, PLUS, MINUS, MULT,
+				   ZERO_EXTEND, SIGN_EXTEND, AND}},
   {"register_operand", false, false, {SUBREG, REG}},
   {"pmode_register_operand", true, false, {SUBREG, REG}},
   {"scratch_operand", false, false, {SCRATCH, REG}},
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
From: Jeff Law [mailto:l...@redhat.com] Sent: Saturday, April 25, 2015 3:00 AM Do you have a testcase where this change can result in better generated code. If so please add that testcase. It's OK if it's ARM specific. Hi Jeff, Last time I tried I couldn't reduce the code to a small testcase but if I remember well it was mostly due to the problem of finding a good test for creduce (zero extension is not unique enough). I'll try again with a more manual approach and get back to you. Best regards, Thomas
Re: [libstdc++ PATCH] Add support for std::uncaught_exceptions
On 13/04/15 02:24 +0300, Ville Voutilainen wrote: The patch is a bit large since it does the baseline_symbols regeneration and other new-version api-dance. Hence attached gzipped. Tested on Linux x86-64. 2015-04-13 Ville Voutilainen ville.voutilai...@gmail.com Add support for std::uncaught_exceptions. * acinclude.m4: Bump libtool_VERSION. * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Regenerate. * config/abi/pre/gnu.ver: Export the new symbol. * configure: Bump libtool_VERSION. * libsupc++/eh_catch.cc(uncaught_exceptions): New. * libsupc++/exception(uncaught_exceptions): Ditto. * testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc: New. * testsuite/util/testsuite_abi.cc: Add 3.4.22 as the latest version. Tested x86_64-linux and powerpc64le-linux. Committed to trunk. The baseline_symbols changes aren't needed now and I tweaked the gnu.ver file slightly. I also added the feature-test macro recommended by N4440 and changed the new function to only be declared for -std=gnu++?? and -std=c++1z. It's useful and easy to provide it for pre-C++17 as an extension. commit c37d380f6eea816886147001e0e85932f02b756f Author: Jonathan Wakely jwak...@redhat.com Date: Mon Apr 27 20:46:03 2015 +0100 2015-04-27 Ville Voutilainen ville.voutilai...@gmail.com Add support for std::uncaught_exceptions. * acinclude.m4: Bump libtool_VERSION. * config/abi/pre/gnu.ver: Export the new symbol. * configure: Regenerate. * libsupc++/eh_catch.cc (uncaught_exceptions): New. * libsupc++/exception (uncaught_exceptions): New. * testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc: New. * testsuite/util/testsuite_abi.cc: Add 3.4.22 as the latest version. 
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index a1e301f..196b4ff 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3383,7 +3383,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:21:0
+libtool_VERSION=6:22:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 7b82ce8..4ed683c 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -178,7 +178,6 @@ GLIBCXX_3.4 {
 #     std::[A-Zu-z]*;
 #     std::underflow_error::u*;
 #     std::underflow_error::~u*;
-      std::uncaught_exception*;
       std::unexpected*;
       std::[A-Zv-z]*;
       std::_List_node_base::hook*;
@@ -1024,6 +1023,9 @@ GLIBCXX_3.4 {
     _ZNSt6locale5_Impl19_M_replace_categoryEPKS0_PKPKNS_2idE;
     _ZNSt6locale5_Impl21_M_replace_categoriesEPKS0_i;
 
+    # std::uncaught_exception()
+    _ZSt18uncaught_exceptionv;
+
     # DO NOT DELETE THIS LINE.  Port-specific symbols, if any, will be here.
 
   local:
@@ -1860,6 +1862,12 @@ GLIBCXX_3.4.21 {
 
 } GLIBCXX_3.4.20;
 
+GLIBCXX_3.4.22 {
+
+    # std::uncaught_exception()
+    _ZSt19uncaught_exceptionsv;
+
+} GLIBCXX_3.4.21;
 
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
diff --git a/libstdc++-v3/libsupc++/eh_catch.cc b/libstdc++-v3/libsupc++/eh_catch.cc
index 6302465..723ae56 100644
--- a/libstdc++-v3/libsupc++/eh_catch.cc
+++ b/libstdc++-v3/libsupc++/eh_catch.cc
@@ -139,3 +139,10 @@ std::uncaught_exception() throw()
   __cxa_eh_globals *globals = __cxa_get_globals ();
   return globals->uncaughtExceptions != 0;
 }
+
+int
+std::uncaught_exceptions() throw()
+{
+  __cxa_eh_globals *globals = __cxa_get_globals ();
+  return globals->uncaughtExceptions;
+}
diff --git a/libstdc++-v3/libsupc++/exception b/libstdc++-v3/libsupc++/exception
index d6c1855..069b33c 100644
--- a/libstdc++-v3/libsupc++/exception
+++ b/libstdc++-v3/libsupc++/exception
@@ -126,6 +126,11 @@ namespace std
    */
   bool uncaught_exception() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__pure__));
 
+#if !defined(__STRICT_ANSI__) || __cplusplus > 201402L
+#define __cpp_lib_uncaught_exceptions 201411
+  int uncaught_exceptions() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__pure__));
+#endif
+
   // @} group exceptions
 } // namespace std
diff --git a/libstdc++-v3/testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc b/libstdc++-v3/testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc
new file mode 100644
index 000..001903e
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/uncaught_exceptions/uncaught_exceptions.cc
@@ -0,0 +1,162 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY
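The point of the plural-returning function is that code can detect whether its *own* scope is being unwound even when some other exception is already in flight, which the bool-returning uncaught_exception() cannot distinguish. A minimal scope-guard sketch of that use case (my illustration, not from the patch; needs a compiler providing std::uncaught_exceptions, i.e. C++17 mode or this libstdc++ extension):

```cpp
#include <exception>
#include <string>

// Records the in-flight exception count on entry; at destruction time,
// a strictly higher count means this scope is being unwound by a new
// exception, so the guard should roll back rather than commit.
class TransactionGuard
{
  int entry_count_ = std::uncaught_exceptions ();
  std::string *log_;
public:
  explicit TransactionGuard (std::string *log) : log_(log) {}
  ~TransactionGuard ()
  {
    *log_ += std::uncaught_exceptions () > entry_count_ ? "rollback;"
                                                        : "commit;";
  }
};

std::string demo ()
{
  std::string log;
  { TransactionGuard g (&log); }  // normal scope exit -> commit
  try
    {
      TransactionGuard g (&log);
      throw 42;                   // scope unwound by exception -> rollback
    }
  catch (int) {}
  return log;                     // "commit;rollback;"
}
```

With the singular uncaught_exception(), a guard destroyed while some outer exception was active would wrongly report "rollback" even on a normal scope exit; counting exceptions fixes exactly that.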
Re: [RS6000] pr65810, powerpc64 alignment of r2
On Thu, Apr 23, 2015 at 7:24 AM, Alan Modra amo...@gmail.com wrote: Revised patch, supporting linker that aligns the toc base. This fixes a thinko in offsettable_ok_by_alignment. It's not the absolute placement that matters, but the toc-pointer relative offset. So alignment of r2 also needs to be taken into account. Changing offsettable_ok_by_alignment has a ripple effect into the 'm' constraint so we also need to ensure rs6000_legitimize_reload_address does not create invalid toc-relative addresses. As found by gcc.dg/torture/builtin-math-2.c -Os. That's the reason for the use_toc_relative_ref change. I hope the size check along with reg_offset_p is sufficient here. It seems so, but it's difficult to be certain due to how hard it is to get just the right combination of reload conditions to trigger. Bootstrapped and regression tested powerpc64-linux and powerpc64le-linux, both with a new and old linker. OK for mainline? PR target/65810 * config/rs6000/rs6000.c (POWERPC64_TOC_POINTER_ALIGNMENT): Define. (offsettable_ok_by_alignment): Use minimum of decl and toc pointer alignment. Replace dead code with assertion. (use_toc_relative_ref): Add mode arg. Return false in -mcmodel=medium case if size exceeds toc pointer alignment. (rs6000_legitimize_reload_address): Update use_toc_relative_ref call. (rs6000_emit_move): Likewise. * configure.ac: Add linker toc pointer alignment check. * configure: Regenerate. * config.in: Regenerate. Okay. Thanks, David
Add --enable-default-pie option to configure GCC to generate PIE by default.
Hi,

Can this work be committed to GCC 6?

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=33cd3712cae4721121bc37aefd09fc5ed7cd146a

The work was posted to the patch list even before GCC 5 stage 1 ended, and different versions of it have been posted to the list a number of times.

/Magnus G.
Re: [Patch, fortran] PR65792 - allocation of scalar elemental function with structure constructor fails
Dear Steve, Thanks for the review. I THINK that I know what I meant in the comment :-) I will commit tomorrow night. Cheers Paul On 26 April 2015 at 20:53, Steve Kargl s...@troutmask.apl.washington.edu wrote: On Sun, Apr 26, 2015 at 08:35:06PM +0200, Paul Richard Thomas wrote: --- 7062,7091 { if (expr-expr_type != EXPR_STRUCTURE) { + tree dealloc = NULL_TREE; gfc_init_se (se, NULL); gfc_conv_expr (se, expr); gfc_add_block_to_block (block, se.pre); + /* Prevent repeat evaluations in gfc_copy_alloc_comp by fixing the + expression in a temporary variable and deallocate is allocatable + components the copy to the result. */ Can you take a second shot at this comment? The and ... portions seems to be a little muddled. OK with after comment fix. -- Steve -- Outside of a dog, a book is a man's best friend. Inside of a dog it's too dark to read. Groucho Marx
Re: [libstdc++ PATCH] Add support for std::uncaught_exceptions
On 27/04/15 21:40 +0100, Jonathan Wakely wrote: Tested x86_64-linux and powerpc64le-linux. Committed to trunk. The baseline_symbols changes aren't needed now and I tweaked the gnu.ver file slightly. Ville noticed a typo in a comment I added, fixed like so. commit c595cfa88c9d38f333b262635a1c32744e3ce054 Author: Jonathan Wakely jwak...@redhat.com Date: Mon Apr 27 21:51:53 2015 +0100 2015-04-27 Ville Voutilainen ville.voutilai...@gmail.com * config/abi/pre/gnu.ver: Fix comment. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 4ed683c..2da04e4 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -1864,7 +1864,7 @@ GLIBCXX_3.4.21 { GLIBCXX_3.4.22 { -# std::uncaught_exception() +# std::uncaught_exceptions() _ZSt19uncaught_exceptionsv; } GLIBCXX_3.4.21;
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
On 21/04/15 15:09, Jeff Law wrote:
On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:

From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it.

No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures.

So what should be the action plan here? I can't add an assert on positive result as a negative result is valid. We want to catch the case where this would cause trouble on pa, or change the patch until we're confident that it's fine for pa. That being said, reading the documentation of STACK_GROWS_UPWARD and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case where this would cause trouble on pa. Is the problem that in the function:

+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, so the function should do something like the following?

So I had to go back and compile some simple examples.
References to outgoing arguments will be SP relative. References to the incoming arguments will be ARGP relative. And that brings me to another issue. Isn't X in this context the incoming argument slot and the destination an outgoing argument slot? If so, the approach of memory_load_overlap simply won't work on a target with calling conventions like the PA's. And you might really want to consider punting for these kinds of calling conventions. If you hadn't already done the work, I'd suggest punting for any case where we have args partially in regs and partially in memory :-) More thoughts when I can get an hour or two to remind myself how all this stuff works on the PA. I will note that testing on the PA is unlikely to show anything, simply because it uses 8 parameter passing registers. So it's rare to pass anything in memory at all. Even rarer to have something partially in memory and partially in registers. Jeff
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: On 04/16/2015 05:04 AM, Jiong Wang wrote: This is a rework of https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html After second thought, I feel it's better to fix this at an earlier stage, during RTL expand, which is more generic, and we also avoid making the already complex combine pass more complex. Currently GCC expands a wide mode left shift into a generic, complex instruction sequence, while if we know the high part of the wide mode value all comes from sign extension, the expand logic can be simplified. Given the following example, T A = (T) B << const_imm_shift We know the high part of A all comes from sign extension if * T is the next wider type of word_mode. For example, for aarch64, if type T is 128-bit int (TImode), and B is of type SImode or DImode, then the tree analyzer knows that the high part of the TImode result all comes from sign extension, and keeps that in range info. | T | | high | low | |- sizel -| For the above example, we could simplify the expand logic into 1. low = low << const_imm_shift; 2. high = low >> (sizel - const_imm_shift) We can utilize the arithmetic right shift to do the sign extension. Those redundant instructions will be optimized out later. For actual .s improvement, AArch64 === __int128_t foo (int data) { return (__int128_t) data << 50; } old: sxtw x2, w0 asr x1, x2, 63 lsl x0, x2, 50 lsl x1, x1, 50 orr x1, x1, x2, lsr 14 new: sxtw x1, w0 lsl x0, x1, 50 asr x1, x1, 14 ARM (.fpu softvfp) === long long shift (int data) { return (long long) data << 20; } old: stmfd sp!, {r4, r5} mov r5, r0, asr #31 mov r3, r0 mov r0, r0, asl #20 mov r1, r5, asl #20 orr r1, r1, r3, lsr #12 ldmfd sp!, {r4, r5} bx lr new: mov r1, r0 mov r0, r0, asl #20 mov r1, r1, asr #12 bx lr Test x86 bootstrap OK, regression test OK. AArch64 bootstrap OK, regression test on board OK. Regards, Jiong 2015-04-16 Jiong.Wang jiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Take tree range info into account when expanding LSHIFT_EXPR. 
gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Ditto. * gcc.target/aarch64/ashlti3_1.c: Ditto. * gcc.target/arm/ashldisi_1.c: Ditto. Funny, I find myself wanting this transformation in both places :-) Expansion time so that we generate efficient code from the start, and combine to catch those cases which are too complex to see at expansion but become visible in the combiner due to other optimizations. Sadly, it's been fairly common practice for targets to define double-word shift patterns which catch a variety of special cases. Ports will have to choose between using those patterns and exploiting your work. I'd be tempted to generate a double-word shift by the given constant and check its cost vs the single word shifts. What happens if there's an overlap between t_low and low? Won't the lshift clobber low and thus we get the right value for the rshift in that case? Jeff, Sorry, I can't understand the meaning of "overlap between t_low and low"; I assume "right" in "right value" means the opposite of "left", not "correct". So do you mean t_low and low share the same pseudo register? Or do you mean we are shifting a value across the word boundary? Like the following: | double word | | t_high | t_low | |- low -| For the above situation, the simplified two-instruction sequence does work: t_low = low << const_imm_shift ; t_high = low >> (sizel - const_imm_shift) I attached the expand result for a simple testcase below. I'd appreciate it if you could comment on the RTL. Thanks. 
__int128_t foo (int data) { return (__int128_t) data << 50; } foo.c.188t.optimized === foo (int data) { __int128 _2; __int128 _3; <bb 2>: _2 = (__int128) data_1(D); _3 = _2 << 50; return _3; } foo.c.189r.expand === (insn 2 4 3 2 (set (reg/v:SI 76 [ data ]) (reg:SI 0 x0 [ data ])) foo.c:3 -1 (nil)) (insn 6 3 7 2 (set (reg:DI 79) (sign_extend:DI (reg/v:SI 76 [ data ]))) foo.c:4 -1 (nil)) (insn 7 6 8 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 0) (reg:DI 79)) foo.c:4 -1 (nil)) (insn 8 7 9 2 (set (reg:DI 80) (ashiftrt:DI (reg:DI 79) (const_int 63 [0x3f]))) foo.c:4 -1 (nil)) (insn 9 8 10 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 8) (reg:DI 80)) foo.c:4 -1 (nil)) ^~~~ sign extend SImode data into TImode _2 (r78) (insn 10 9 11 2 (set (subreg:DI (reg:TI 77 [D.2677 ]) 0) (ashift:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0)
Re: [PATCH] libstdc++: Fix exceptions being generated when compiling with -fno-exceptions
On 14/04/15 12:17 -0300, Federico Lenarduzzi wrote: When libstdc++ is compiled, the compiler sets the std::terminate_handler function to __verbose_terminate_handler() or std::abort() depending on _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE being true or false. However, even if we compile with -fno-exceptions, the compiler will use __verbose_terminate_handler(), which uses exceptions. Therefore, the library is not fully exception-free. This patch adds a check for __EXCEPTIONS to the #if _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE condition. If __EXCEPTIONS is defined, the compiler will use __verbose_terminate_handler() as the termination function; otherwise it'll use std::abort(), which doesn't use exceptions. It also makes std::uncaught_exception() throw() return false if __EXCEPTIONS is not defined. I've committed this slightly revised version using __cpp_exceptions instead, and changing the new std::uncaught_exceptions() function too. Tested x86_64-linux and powerpc64le-linux, and also built with --enable-libstdcxx-flags=-fno-exceptions on x86_64-linux. commit 80bc538bd64c7f6b9a21ec5d1a3bc5b08bc8770b Author: Jonathan Wakely jwak...@redhat.com Date: Mon Apr 27 14:04:50 2015 +0100 2015-04-27 Federico Lenarduzzi federico.lenardu...@tallertechnologies.com Jonathan Wakely jwak...@redhat.com * libsupc++/eh_catch.cc (uncaught_exception, uncaught_exceptions): Return false or zero if the library is built without exceptions. * libsupc++/eh_term_handler.cc: Disable verbose terminate handler if the library is built without exceptions. 
diff --git a/libstdc++-v3/libsupc++/eh_catch.cc b/libstdc++-v3/libsupc++/eh_catch.cc index 723ae56..44fde79 100644 --- a/libstdc++-v3/libsupc++/eh_catch.cc +++ b/libstdc++-v3/libsupc++/eh_catch.cc @@ -136,13 +136,21 @@ __cxxabiv1::__cxa_end_catch () bool std::uncaught_exception() throw() { +#if __cpp_exceptions __cxa_eh_globals *globals = __cxa_get_globals (); return globals->uncaughtExceptions != 0; +#else + return false; +#endif } int std::uncaught_exceptions() throw() { +#if __cpp_exceptions __cxa_eh_globals *globals = __cxa_get_globals (); return globals->uncaughtExceptions; +#else + return 0; +#endif } diff --git a/libstdc++-v3/libsupc++/eh_term_handler.cc b/libstdc++-v3/libsupc++/eh_term_handler.cc index 46acee8..0d6ea2b 100644 --- a/libstdc++-v3/libsupc++/eh_term_handler.cc +++ b/libstdc++-v3/libsupc++/eh_term_handler.cc @@ -32,7 +32,7 @@ --disable-libstdcxx-verbose and rebuilding the library. In a freestanding environment, we default to this latter approach. */ -#if _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE +#if _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE && __cpp_exceptions /* The current installed user handler. */ std::terminate_handler __cxxabiv1::__terminate_handler = __gnu_cxx::__verbose_terminate_handler;
Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
I'm sending an updated patch (rebased to recent trunk, bootstrapped and regtested on x86_64-unknown-linux-gnu). On 04/25/2015 02:49 PM, Richard Sandiford wrote: FWIW I think the split between label_rtx and live_label_rtx is good, but I think we should give them different names. The first one is returning only a position in the instruction stream, the second is returning a jump target. I think we should rename both of them to make that distinction clearer. I renamed live_label_rtx to jump_target_rtx. But I'm not sure if it is appropriate (so, perhaps, you could give some advice about the right names for these functions?) I think the eventual aim would be to have rtx_jump_insn member functions that get and set the jump label as an rtx_insn, with JUMP_LABEL_AS_INSN being a stepping stone towards that. In cases like this it might make more sense to ensure old_jump has the right type (rtx_jump_insn) and go straight to the member functions, rather than switching to JUMP_LABEL_AS_INSN now and then having to rewrite it later. I added the member functions. The problem is that JUMP_LABEL does not always satisfy the current invariant of rtx_insn: it can also be an RTL expression of type RETURN or SIMPLE_RETURN. Formatting nit, but the line break should be before ? rather than after. Fixed. This preserves the behaviour of the original code but I'm not sure it's worth it. I doubt the distinction between: gcc_assert (JUMP_P (x)); and: gcc_checking_assert (JUMP_P (x)); was ever very scientific. Seems like we should take this refactoring as an opportunity to make the checking more consistent. Fixed (removed assert_as_a). That seems pretty heavy-weight for LRA-local code. Also, the long-term plan is for INSN_LIST and rtx_insn to be in separate hierarchies, at which point we'd have no alias-safe way to distinguish them. usage_insns isn't a GC structure and isn't live across a GC collection, so I don't think we need these lists to be rtxes at all. 
OK, reverted changes in LRA code for now. I think this should be a separate patch then. -- Regards, Mikhail Maltsev diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index c2a3be3..ae726e5 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -1745,9 +1745,11 @@ set_edge_can_fallthru_flag (void) continue; if (!any_condjump_p (BB_END (bb))) continue; - if (!invert_jump (BB_END (bb), JUMP_LABEL (BB_END (bb)), 0)) + + rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb)); + if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0)) continue; - invert_jump (BB_END (bb), JUMP_LABEL (BB_END (bb)), 0); + invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0); EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU; EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU; } @@ -1902,9 +1904,15 @@ fix_up_fall_thru_edges (void) fall_thru_label = block_label (fall_thru->dest); - if (old_jump && JUMP_P (old_jump) && fall_thru_label) - invert_worked = invert_jump (old_jump, - fall_thru_label, 0); + if (old_jump && fall_thru_label) + { + rtx_jump_insn *old_jump_insn = + dyn_cast <rtx_jump_insn *> (old_jump); + if (old_jump_insn) + invert_worked = invert_jump (old_jump_insn, + fall_thru_label, 0); + } + if (invert_worked) { fall_thru->flags &= ~EDGE_FALLTHRU; @@ -2021,10 +2029,9 @@ fix_crossing_conditional_branches (void) edge succ2; edge crossing_edge; edge new_edge; - rtx_insn *old_jump; rtx set_src; rtx old_label = NULL_RTX; - rtx new_label; + rtx_code_label *new_label; FOR_EACH_BB_FN (cur_bb, cfun) { @@ -2049,7 +2056,7 @@ fix_crossing_conditional_branches (void) if (crossing_edge) { - old_jump = BB_END (cur_bb); + rtx_jump_insn *old_jump = as_a <rtx_jump_insn *> (BB_END (cur_bb)); /* Check to make sure the jump instruction is a conditional jump. */ @@ -2088,7 +2095,8 @@ fix_crossing_conditional_branches (void) else { basic_block last_bb; - rtx_insn *new_jump; + rtx_insn *old_label_insn; + rtx_jump_insn *new_jump; /* Create new basic block to be dest for conditional jump. 
*/ @@ -2099,9 +2107,10 @@ fix_crossing_conditional_branches (void) emit_label (new_label); gcc_assert (GET_CODE (old_label) == LABEL_REF); - old_label = JUMP_LABEL (old_jump); - new_jump = emit_jump_insn (gen_jump (old_label)); - JUMP_LABEL (new_jump) = old_label; + old_label_insn = old_jump->jump_target (); + new_jump = as_a <rtx_jump_insn *> + (emit_jump_insn (gen_jump (old_label_insn))); + new_jump->set_jump_target (old_label_insn); last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; new_bb = create_basic_block (new_label, new_jump, last_bb); diff --git a/gcc/bt-load.c
Re: [PATCH libstdc++] Fix for std::uncaught_exception (PR 62258)
On 02/02/15 03:37 +0100, Michael Hanselmann wrote: Calls to `std::uncaught_exception' after calling `std::rethrow_exception' always return `true' when `std::uncaught_exception' should return `false' unless an exception is in flight. `std::rethrow_exception' does not update `__cxa_eh_globals::uncaughtExceptions' while the following call to `__cxa_begin_catch' decrements it. This fixes PR 62258. The original two-line patch was created by Dmitry Prokoptsev. Michael Hanselmann implemented a testcase. Committed to trunk - thanks for the patch and test. commit 1cfba1875f21f485fe82e8c118b2355c4714bcdc Author: Jonathan Wakely jwak...@redhat.com Date: Mon Apr 27 13:48:23 2015 +0100 2015-04-27 Dmitry Prokoptsev dprokopt...@gmail.com Michael Hanselmann pub...@hansmi.ch PR libstdc++/62258 * libsupc++/eh_ptr.cc (rethrow_exception): Increment count of uncaught exceptions. * testsuite/18_support/exception_ptr/62258.cc: New. diff --git a/libstdc++-v3/libsupc++/eh_ptr.cc b/libstdc++-v3/libsupc++/eh_ptr.cc index 685c9e0..dae9246 100644 --- a/libstdc++-v3/libsupc++/eh_ptr.cc +++ b/libstdc++-v3/libsupc++/eh_ptr.cc @@ -245,6 +245,9 @@ std::rethrow_exception(std::exception_ptr ep) __GXX_INIT_DEPENDENT_EXCEPTION_CLASS(dep->unwindHeader.exception_class); dep->unwindHeader.exception_cleanup = __gxx_dependent_exception_cleanup; + __cxa_eh_globals *globals = __cxa_get_globals (); + globals->uncaughtExceptions += 1; + #ifdef _GLIBCXX_SJLJ_EXCEPTIONS _Unwind_SjLj_RaiseException (&dep->unwindHeader); #else diff --git a/libstdc++-v3/testsuite/18_support/exception_ptr/62258.cc b/libstdc++-v3/testsuite/18_support/exception_ptr/62258.cc new file mode 100644 index 000..3734f19 --- /dev/null +++ b/libstdc++-v3/testsuite/18_support/exception_ptr/62258.cc @@ -0,0 +1,61 @@ +// { dg-options "-std=gnu++11" } +// { dg-require-atomic-builtins } + +// Copyright (C) 2015 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. 
This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// PR libstdc++/62258 + +#include <exception> +#include "testsuite_hooks.h" + +struct check_on_destruct +{ + ~check_on_destruct(); +}; + +check_on_destruct::~check_on_destruct() +{ + VERIFY(std::uncaught_exception()); +} + +int main () +{ + VERIFY(!std::uncaught_exception()); + + try +{ + check_on_destruct check; + + try +{ + throw 1; +} + catch (...) +{ + VERIFY(!std::uncaught_exception()); + + std::rethrow_exception(std::current_exception()); +} +} + catch (...) +{ + VERIFY(!std::uncaught_exception()); +} + + VERIFY(!std::uncaught_exception()); +}
Re: [PATCH][PR65818][bootstrap,hppa] Return side-effect free result in gimplify_va_arg_internal
On Sat, Apr 25, 2015 at 2:35 PM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch fixes PR65818, and hppa bootstrap. When compiling gcc/libiberty/vprintf-support.c, the following va_arg is compiled: ... (void) __builtin_va_arg(ap, double); ... This results in the following ifn_va_arg at gimple level, with a NULL_TREE lhs: ... VA_ARG (&ap, 0B); ... We start expanding the ifn_va_arg in expand_ifn_va_arg_1 by calling gimplify_va_arg_internal: ... expr = gimplify_va_arg_internal (ap, type, gimple_location (stmt), &pre, &post); ... Subsequently, because the lhs is NULL_TREE, we skip generating the assign to the lhs: ... lhs = gimple_call_lhs (stmt); if (lhs != NULL_TREE) { gcc_assert (useless_type_conversion_p (TREE_TYPE (lhs), type)); if (gimple_call_num_args (stmt) == 3) { /* We've transported the size of with WITH_SIZE_EXPR here as the 3rd argument of the internal fn call. Now reinstate it. */ tree size = gimple_call_arg (stmt, 2); expr = build2 (WITH_SIZE_EXPR, TREE_TYPE (expr), expr, size); } /* We use gimplify_assign here, rather than gimple_build_assign, because gimple_assign knows how to deal with variable-sized types. */ gimplify_assign (lhs, expr, &pre); } ... We assume here that any side-effects related to updating ap have been generated into pre/post by gimplify_va_arg_internal, and that it's safe to ignore expr. Turns out, that's not the case for hppa. The target hook hppa_gimplify_va_arg_expr (called from gimplify_va_arg_internal) returns an expression that still contains a side-effect: ... *(double *) ((ap = ap + 4294967288) & 4294967288B) ... This patch fixes that by gimplifying the address expression of the mem-ref returned by the target hook (borrowing code from gimplify_expr, case MEM_REF). Bootstrapped and reg-tested on x86_64. Bootstrapped and reg-tested on hppa2.0w-hp-hpux11.11. OK for trunk? Hmm, that assert looks suspicious... Can't you simply always do gimplify_expr (&expr, pre_p, post_p, is_gimple_lvalue, fb_lvalue); ? Richard. Thanks, - Tom
Improve LTO type checking during symtab merging
Hi, this patch strengthens ODR checking by requiring a match on declarations (if both of them are defined in a C++ translation unit). Currently ipa-symtab.c will warn only if declarations are incompatible in the useless_type_conversion sense. Now we warn more often, i.e. in the following example: t.C: char a; t2.C: #include <stdint.h> extern int8_t a; void *b=&a; This will lead to the warning: t2.C:2:15: warning: type of 'a' does not match original declaration extern int8_t a; ^ built-in: note: type name 'char' should match type name 'signed char' /usr/include/stdint.h:37:22: note: the incompatible type is defined here typedef signed char int8_t; ^ /aux/hubicka/t.C:1:6: note: previously declared here char a; ^ Funnily enough we do not get the warning when t2.C uses just signed char or unsigned char, because that is a builtin type and thus non-ODR. Something I need to deal with incrementally. I also added a call to warn_types_mismatch that outputs an extra note about what is wrong with the type: in C++ the mismatch is often carried over several levels of declarations, and it is hard to work out the reason without extra info. Richard, according to the comments, it seems that the code was tailored to not output a warning when we can avoid wrong-code issues on mismatches, based on the fact that the linker would also be silent. I am particularly entertained by the following hunk, which I killed: - if (!types_compatible_p (TREE_TYPE (prevailing_decl), - TREE_TYPE (decl))) - /* If we don't have a merged type yet...sigh. The linker - wouldn't complain if the types were mismatched, so we - probably shouldn't either. Just use the type from - whichever decl appears to be associated with the - definition. If for some odd reason neither decl is, the - older one wins. */ - (void) 0; It is fun we go through the trouble of comparing types and then do nothing ;) Type merging is also completed at this time, so at least the comment looks out of date. 
I think as a QOI issue, we do want to warn in cases where the code is inconsistent (it is pretty much surely a bug), but perhaps we may want to have these warnings under -Wodr only, and use a slightly different warning and a different -W switch for warnings that unavoidably lead to wrong-code surprises? This should be easy to implement by adding two levels into the new warn_type_compatibility_p predicate. I am personally also OK, however, with warning always or making all the warnings -Wodr. What do you think? Incrementally I am heading towards a proper definition of decl compatibility that I can plug into symtab merging to avoid merging incompatible decls (so FORTIFY_SOURCE works). lto-symtab and ipa-icf both have some knowledge of the problem; I want to get both right and factor out the common logic. Another improvement is to preserve the ODR type info when the non-ODR variant prevails, so one can diagnose clashes between C++ units even in mixed-language link times. Bootstrapped/regtested x86_64-linux with no new ODR warnings hit by -Werror. Honza * ipa-devirt.c (register_odr_type): Be ready for non-odr main variant of odr type. * ipa-utils.h (warn_types_mismatch): New. * lto/lto-symtab.c (warn_type_compatibility_p): Break out from ... (lto_symtab_merge): ... here. (lto_symtab_merge_decls_2): Improve warnings. Index: ipa-devirt.c === --- ipa-devirt.c(revision 222391) +++ ipa-devirt.c(working copy) @@ -2029,10 +2030,14 @@ register_odr_type (tree type) if (in_lto_p) odr_vtable_hash = new odr_vtable_hash_type (23); } - /* Arrange things to be nicer and insert main variants first. */ - if (odr_type_p (TYPE_MAIN_VARIANT (type))) + /* Arrange things to be nicer and insert main variants first. + ??? fundamental prerecorded types do not have mangled names; this + makes it possible that non-ODR type is main_odr_variant of ODR type. + Things may get smoother if LTO FE set mangled name of those types same + way as C++ FE does. 
*/ + if (odr_type_p (main_odr_variant (TYPE_MAIN_VARIANT (type)))) get_odr_type (TYPE_MAIN_VARIANT (type), true); - if (TYPE_MAIN_VARIANT (type) != type) + if (TYPE_MAIN_VARIANT (type) != type && odr_type_p (main_odr_variant (type))) get_odr_type (type, true); } Index: ipa-utils.h === --- ipa-utils.h (revision 222391) +++ ipa-utils.h (working copy) @@ -84,6 +84,7 @@ bool types_must_be_same_for_odr (tree, t bool types_odr_comparable (tree, tree, bool strict = false); cgraph_node *try_speculative_devirtualization (tree, HOST_WIDE_INT, ipa_polymorphic_call_context); +void warn_types_mismatch (tree t1, tree t2); /* Return vector containing possible targets of polymorphic call E. If
ping*3: [PATCH, x86] [PR target/64835] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
Hi, I'd like to re-ping the following patch for GCC 5.2. It fixes the __attribute__ ((__optimize__ (...))) on x86. Testcase is in the patch. thanks Christian On 04/13/2015 04:24 PM, Christian Bruel wrote: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00243.html thanks, Christian
Re: [PATCH] Fix VRP update_value_range and caller (PR tree-optimization/65875)
On Mon, Apr 27, 2015 at 10:05:21AM +0200, Richard Biener wrote: --- gcc/tree-vrp.c.jj 2015-04-20 14:35:39.0 +0200 +++ gcc/tree-vrp.c 2015-04-24 18:10:41.321367440 +0200 @@ -892,7 +892,12 @@ update_value_range (const_tree var, valu UNDEFINED or from VARYING. */ if (new_vr->type == VR_UNDEFINED || old_vr->type == VR_VARYING) - set_value_range_to_varying (old_vr); + { + BITMAP_FREE (new_vr->equiv); + set_value_range_to_varying (old_vr); + set_value_range_to_varying (new_vr); + return true; Actually we didn't change anything here (old_vr->type is VARYING already), so we shouldn't even have visited the stmt defining the SSA name again... eventually your fix below fixes that. On the testcase, old_vr wasn't actually VARYING, but new_vr was UNDEFINED (a result of intersecting disjoint ranges). While for old_vr->type == VR_VARYING I agree we shouldn't have been called on this. And turning a VR_RANGE into VR_UNDEFINED is going in the wrong direction in the lattice. Jakub
Re: [PATCH] Fix VRP update_value_range and caller (PR tree-optimization/65875)
On Mon, 27 Apr 2015, Jakub Jelinek wrote: On Mon, Apr 27, 2015 at 10:05:21AM +0200, Richard Biener wrote: --- gcc/tree-vrp.c.jj 2015-04-20 14:35:39.0 +0200 +++ gcc/tree-vrp.c 2015-04-24 18:10:41.321367440 +0200 @@ -892,7 +892,12 @@ update_value_range (const_tree var, valu UNDEFINED or from VARYING. */ if (new_vr->type == VR_UNDEFINED || old_vr->type == VR_VARYING) - set_value_range_to_varying (old_vr); + { + BITMAP_FREE (new_vr->equiv); + set_value_range_to_varying (old_vr); + set_value_range_to_varying (new_vr); + return true; Actually we didn't change anything here (old_vr->type is VARYING already), so we shouldn't even have visited the stmt defining the SSA name again... eventually your fix below fixes that. On the testcase, old_vr wasn't actually VARYING, but new_vr was UNDEFINED (a result of intersecting disjoint ranges). While for old_vr->type == VR_VARYING I agree we shouldn't have been called on this. And turning a VR_RANGE into VR_UNDEFINED is going in the wrong direction in the lattice. Ah, I misread the || for a &&. The patch is ok with just dropping the == VR_VARYING check. Thanks, Richard.
Fix library name of __builtin_alloca_with_align
Hi, build_common_builtin_nodes declares both __builtin_alloca and __builtin_alloca_with_align to have the library name "alloca". This actually triggers a warning in an updated ODR violation detector, on alloca being declared twice. __builtin_alloca_with_align IMO does not have a library equivalent, and I think this is a pasto (__builtin_alloca_with_align is not documented in extend.texi). It is not clear to me if there was some intention behind this oddity though. I have bootstrapped/regtested x86_64 with the following. OK? Honza * tree.c (build_common_builtin_nodes): Do not build __builtin_alloca_with_align as equivalent of library alloca. Index: tree.c === --- tree.c (revision 222391) +++ tree.c (working copy) @@ -10088,7 +10098,8 @@ build_common_builtin_nodes (void) ftype = build_function_type_list (ptr_type_node, size_type_node, size_type_node, NULL_TREE); local_define_builtin ("__builtin_alloca_with_align", ftype, - BUILT_IN_ALLOCA_WITH_ALIGN, "alloca", + BUILT_IN_ALLOCA_WITH_ALIGN, + "__builtin_alloca_with_align", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF); /* If we're checking the stack, `alloca' can throw. */
Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection
On 22-04-15 15:50, Richard Biener wrote: On Wed, Apr 22, 2015 at 3:38 PM, Tom de Vries tom_devr...@mentor.com wrote: On 22-04-15 10:06, Richard Biener wrote: On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch fixes PR65823. SNIP The patch fixes the problem by using operand_equal_p to do the equality test. Bootstrapped and reg-tested on x86_64. Did minimal non-bootstrap build on arm and reg-tested. OK for trunk? Hmm, ok for now. Committed. But I wonder if we can't fix things to not require that odd extra copy. Agreed, that would be good. In fact that we introduce ap.1 looks completely bogus to me AFAICT, it's introduced by gimplify_arg ('&argp') because argp (a PARM_DECL) is not addressable (and we don't in this case for arm). Note that the pointer compare obviously fails because we unshare the expression. So ... what breaks if we simply remove this odd fixup? [ Originally mentioned at https://gcc.gnu.org/ml/gcc/2015-02/msg00011.html . ] I've committed gcc.target/x86_64/abi/callabi/vaarg-6.c specifically as a minimal version of this problem. If we remove the ap_copy fixup, at original we have: ... ;; Function do_cpy2 (null) { char * e; char * e; e = VA_ARG_EXPR <argp>; e = VA_ARG_EXPR <argp>; if (e != b) { abort (); } } ... and after gimplify we have: ... do_cpy2 (char * argp) { char * argp.1; char * argp.2; char * b.3; char * e; argp.1 = argp; e = VA_ARG (&argp.1, 0B); argp.2 = argp; e = VA_ARG (&argp.2, 0B); b.3 = b; if (e != b.3) goto D.1373; else goto D.1374; D.1373: abort (); D.1374: } ... The second VA_ARG uses argp.2, which is a copy of argp, which is unmodified by the first VA_ARG. Using the attached _proof-of-concept_ patch, I get callabi.exp working without the ap_copy, still at the cost of one 'argp.1 = argp' copy though: ... 
do_cpy2 (char * argp) { char * argp.1; char * b.2; char * e; argp.1 = argp; e = VA_ARG (&argp.1, 0B); e = VA_ARG (&argp.1, 0B); b.2 = b; if (e != b.2) goto D.1372; else goto D.1373; D.1372: abort (); D.1373: } ... But perhaps there's an easier way? Hum, simply Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 222320) +++ gcc/gimplify.c (working copy) @@ -9419,6 +9419,7 @@ gimplify_va_arg_expr (tree *expr_p, gimp } /* Transform a VA_ARG_EXPR into an VA_ARG internal function. */ + mark_addressable (valist); ap = build_fold_addr_expr_loc (loc, valist); tag = build_int_cst (build_pointer_type (type), 0); *expr_p = build_call_expr_internal_loc (loc, IFN_VA_ARG, type, 2, ap, tag); That sort of works, but causes other problems. Filed PR65887 - 'remove va_arg ap copies' to track this issue. Thanks, - Tom
Re: Fix library name of __builtin_alloca_with_align
On Mon, 27 Apr 2015, Jan Hubicka wrote: Hi, build_common_builtin_nodes declares both __builtin_alloca and __builtin_alloca_with_align to have the library name "alloca". This actually triggers a warning in an updated ODR violation detector, on alloca being declared twice. __builtin_alloca_with_align IMO does not have a library equivalent, and I think this is a pasto (__builtin_alloca_with_align is not documented in extend.texi). It is not clear to me if there was some intention behind this oddity though. I have bootstrapped/regtested x86_64 with the following. OK? Yes. Thanks, Richard. Honza * tree.c (build_common_builtin_nodes): Do not build __builtin_alloca_with_align as equivalent of library alloca. Index: tree.c === --- tree.c (revision 222391) +++ tree.c (working copy) @@ -10088,7 +10098,8 @@ build_common_builtin_nodes (void) ftype = build_function_type_list (ptr_type_node, size_type_node, size_type_node, NULL_TREE); local_define_builtin ("__builtin_alloca_with_align", ftype, - BUILT_IN_ALLOCA_WITH_ALIGN, "alloca", + BUILT_IN_ALLOCA_WITH_ALIGN, + "__builtin_alloca_with_align", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF); /* If we're checking the stack, `alloca' can throw. */
Re: [PATCH] Fix VRP update_value_range and caller (PR tree-optimization/65875)
On Fri, 24 Apr 2015, Jakub Jelinek wrote: Hi! In vrp_visit_assignment_or_call we try to return SSA_PROP_VARYING if update_value_range returned true and the new range is VR_VARYING, but vrp_visit_phi_node fails to do that. Another thing is that if update_value_range decides to set_value_range_to_varying (old_vr), it doesn't update new_vr, so the caller doesn't know the new vr is varying (and calling get_value_range again sounds unnecessary). The following patch fixes it, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 5.2? 2015-04-24 Jakub Jelinek ja...@redhat.com PR tree-optimization/65875 * tree-vrp.c (update_value_range): If in is_new case setting old_vr to VR_VARYING, also set new_vr to it. (vrp_visit_phi_node): Return SSA_PROP_VARYING instead of SSA_PROP_INTERESTING if update_value_range returned true, but new range is VR_VARYING. * gcc.c-torture/compile/pr65875.c: New test. --- gcc/tree-vrp.c.jj 2015-04-20 14:35:39.0 +0200 +++ gcc/tree-vrp.c 2015-04-24 18:10:41.321367440 +0200 @@ -892,7 +892,12 @@ update_value_range (const_tree var, valu UNDEFINED or from VARYING. */ if (new_vr->type == VR_UNDEFINED || old_vr->type == VR_VARYING) - set_value_range_to_varying (old_vr); + { + BITMAP_FREE (new_vr->equiv); + set_value_range_to_varying (old_vr); + set_value_range_to_varying (new_vr); + return true; Actually we didn't change anything here (old_vr->type is VARYING already), so we shouldn't even have visited the stmt defining the SSA name again... eventually your fix below fixes that. So I'd prefer to simply drop the hunk... (you might want to do some digging which rev. introduced it and if it came with a testcase...). Otherwise ok. Thanks, Richard. 
+ } else set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max, new_vr->equiv); @@ -8941,6 +8946,9 @@ update_range: fprintf (dump_file, "\n"); } + if (vr_result.type == VR_VARYING) + return SSA_PROP_VARYING; + return SSA_PROP_INTERESTING; } --- gcc/testsuite/gcc.c-torture/compile/pr65875.c.jj 2015-04-24 18:20:47.650595581 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr65875.c 2015-04-24 18:20:30.0 +0200 @@ -0,0 +1,24 @@ +/* PR tree-optimization/65875 */ + +int a, b, c, d, e, f, g; + +void +foo (void) +{ + long h = 0, i; + if (g < 0) + i = -g; + for (; b;) + for (; c;) + if (e) + h = 1; + for (; f;) + if (a) + break; + if (h < i) + while (h < i) + { + d = 0; + h--; + } +} Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
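The lattice transition being fixed can be modelled with a toy version of update_value_range (types and names invented here; the real code lives in tree-vrp.c and also manages equivalence bitmaps):

```cpp
#include <cassert>

enum vr_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

struct value_range_toy { vr_kind type; };

// Mirrors the patched update_value_range: when the value drops to
// VARYING, both the stored range and the caller-visible new range are
// marked VARYING, and the update is reported as a change.
static bool
update_value_range_toy (value_range_toy *old_vr, value_range_toy *new_vr)
{
  if (new_vr->type == VR_UNDEFINED || old_vr->type == VR_VARYING)
    {
      old_vr->type = VR_VARYING;
      new_vr->type = VR_VARYING;  // the piece missing before the patch
      return true;
    }
  bool changed = old_vr->type != new_vr->type;
  old_vr->type = new_vr->type;
  return changed;
}
```

With the new_vr update in place, a caller such as vrp_visit_phi_node can test the returned range directly and propagate SSA_PROP_VARYING instead of SSA_PROP_INTERESTING.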
Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
On Wed, Feb 4, 2015 at 2:21 PM, Christian Bruel christian.br...@st.com wrote: While trying to reduce the PR64835 case for ARM and x86, I noticed that the alignment flags are cleared for x86 when attribute optimized is used. With the attached testcases, the visible effects are twofold: 1) Functions compiled in with attribute optimize (-O2) are not aligned as if they were with the -O2 flag. 2) can_inline_edge_p fails because opts_for_fn (caller->decl) != opts_for_fn (callee->decl) even though they are compiled with the same optimization level. 2015-02-06 Christian Bruel christian.br...@st.com PR target/64835 * config/i386/i386.c (ix86_default_align): New function. (ix86_override_options_after_change): Call ix86_default_align. (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): New hook. (ix86_override_options_after_change): New function. 2015-02-06 Christian Bruel christian.br...@st.com PR target/64835 * gcc.dg/ipa/iinline-attr.c: New test. * gcc.target/i386/iinline-attr-2.c: New test. OK for mainline. Thanks, Uros
Re: Fix xstormy16 handling of HImode predicates
Hi Richard, gcc/ * config/stormy16/predicates.md (xs_hi_general_operand): Delete. (xs_hi_nonmemory_operand): Remove error. * config/stormy16/stormy16.md (movhi, movhi_internal): Use general_operand rather than xs_hi_general_operand. Approved - please apply. Cheers Nick
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 2015-04-27 6:12 AM, Kyrill Tkachov wrote: On 22/04/15 12:51, Kyrill Tkachov wrote: On 21/04/15 18:33, Kyrill Tkachov wrote: On 21/04/15 15:09, Jeff Law wrote: On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it. No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures. So what should be the action plan here? I can't add an assert on positive result as a negative result is valid. We want to catch the case where this would cause trouble on pa, or change the patch until we're confident that it's fine for pa. That being said, reading the documentation of STACK_GROWS_UPWARD and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case where this would cause trouble on pa. Is the problem that in the function: +/* Add SIZE to X and check whether it's greater than Y. + If it is, return the constant amount by which it's greater or smaller. + If the two are not statically comparable (for example, X and Y contain + different registers) return -1. This is used in expand_push_insn to + figure out if reading SIZE bytes from location X will end up reading from + location Y. 
*/ +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, size); + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) +return -1; + + return INTVAL (sub); +} for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, so the function should look something like the following? static int memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) { #ifdef ARGS_GROW_DOWNWARD rtx tmp = plus_constant (Pmode, x, -size); #else rtx tmp = plus_constant (Pmode, x, size); #endif rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); if (!CONST_INT_P (sub)) return -1; #ifdef ARGS_GROW_DOWNWARD return INTVAL (-sub); #else return INTVAL (sub); #endif } now, say for x == sp + 4, y == sp + 8, size == 16: This would be a problematic case for arm, so this code on arm (where ARGS_GROW_DOWNWARD is *not* defined) would return 12, which is the number of bytes that overlap. On a target where ARGS_GROW_DOWNWARD is defined this would return -20, meaning that no overlap occurs (because we read in the descending direction from x, IIUC). Hi Jeff, Here's an attempt to make this more concrete. Only the memory_load_overlap function has changed. This time I tried to take into account the case when ARGS_GROW_DOWNWARD is defined. Take the case where x == sp, y == sp + 8, size == 16. For arm, this would return 8 as that is the number of bytes that overlap. On pa, since ARGS_GROW_DOWNWARD is defined, it would return -1 as we're reading down from x rather than up towards y. In the case when x == sp + 8, y == sp, size == 16, this would return -1 on arm since we're reading upwards from x and therefore no overlap would happen. On pa, this would return 8, which I think is the right thing. But again, I don't have access to any pa means of testing. What do you think of this approach? Hi Dave, Would it be possible for you to test this patch on a 64-bit hppa or at least bootstrap it?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html I started a build and test with your patch on hppa64-hp-hpux11.11 this morning. There is a concern that it may potentially affect the passing of complex arguments partially on the stack and partially in regs on pa because of the way the args and stack grow on that target. Unfortunately I don't have access to any hardware or simulators. It would help a lot with getting this patch in. If you write to linux-par...@vger.kernel.org, arrangements can be made for an account on a Debian parisc linux machine for development testing. Helge Deller has arranged for some new machines since we took over the Debian buildd infrastructure for parisc. More info is here: https://parisc.wiki.kernel.org/index.php/Main_Page Thanks, Kyrill Thanks, Kyrill P.S. I've included the testcase from Honggyu in the patch. 2015-04-22 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/65358 * expr.c (memory_load_overlap): New function. (emit_push_insn): When pushing partial args to the stack would clobber the register part load the overlapping part into a pseudo and put it into the hard reg after pushing. 2015-04-22 Honggyu Kim hong.gyu@lge.com PR target/65358 * gcc.dg/pr65358.c: New test.
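The overlap arithmetic debated in this thread can be modelled with plain integer offsets (a simplification: the real memory_load_overlap works on rtx expressions and punts with -1 when the difference is not a compile-time constant; the exact sign conventions here follow the worked examples in the emails, not the final committed code):

```cpp
#include <cassert>

// Default (upward-reading) case: reading SIZE bytes upwards from
// offset X, a positive result is the number of bytes that reach at or
// past offset Y; zero or negative means no overlap.
static long
overlap_up (long x, long y, long size)
{
  return (x + size) - y;
}

// Hypothetical ARGS_GROW_DOWNWARD variant, matching the behaviour the
// thread describes for pa: reading downwards from X, location Y only
// overlaps when it sits inside the SIZE-byte window below X; otherwise
// return -1 for "no overlap".
static long
overlap_down (long x, long y, long size)
{
  long sub = x - y;
  return (sub > 0 && sub <= size) ? sub : -1;
}
```

With sp-relative offsets, overlap_up(4, 8, 16) reproduces the "returns 12" arm example, while overlap_down(8, 0, 16) reproduces the "returns 8" pa example.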
[PATCH][AArch64] Fix operand costing logic for MINUS
Hi all, This fixes a case in aarch64 costs where we forgot to account for one of the operands in the MINUS case. This is important to get right as the mult synthesis code can ask for the cost of a shift+sub operation and put the shift part in any of the MINUS operands, expecting the cost function to break it down into a separate shift operation if the two cannot be combined. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyrill 2015-04-27 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_rtx_costs, MINUS): Properly account for both operand costs in simple case. commit 4859566692e4b9195b975632ed105b9c4b6ab765 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Mon Mar 2 10:18:22 2015 + [AArch64] Properly cost both operands of MINUS. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 7ffa7ee..5a3f887 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5844,7 +5844,8 @@ cost_minus: return true; } - *cost += rtx_cost (new_op1, MINUS, 1, speed); + *cost += rtx_cost (new_op1, MINUS, 1, speed) + + rtx_cost (op0, MINUS, 0, speed); if (speed) {
Re: [PATCH] Fix LTO option streaming
On Thu, 23 Apr 2015, Richard Biener wrote: On Thu, 23 Apr 2015, Jan Hubicka wrote: It looks like when transitioning to using target and optimization option nodes for compile-time to link-time option streaming you didn't adjust lto-opts.c nor lto-wrapper.c. The following fixes Yep, I assumed that lto-wrapper's merging is now redundant for optimization options, while it is still needed for global functions (-fPIC) that was not covered by my changes and since this was all bit late, it seemed I can just keep it for next stage1. the target option side (for SWITCHABLE_TARGET). Do not record any target options in the lto_opts section. Honza - I suppose we don't have any testcase that this works, I'll try to come up with sth. This also looks like a correctness issue to me. Thanks, that would be a good idea. So with SWITCHABLE_TARGET we have problem that after switching the options are misinterpretted? Ok, so with t.c --- extern int have_fma; extern double x, y, z; void do_fma (void); int main() { if (have_fma) do_fma (); else x = x + y * z; } t2.c --- int have_fma; volatile double x, y, z; void do_fma (void) { x = x + y * z; } and doing gcc -c -Ofast -flto t.c gcc -c -Ofast -mfma -flto t2.c gcc t.o t2.o I get -mfma on the lto1 command-line for the link. 
But I also get for main() target_option_node 0x76ac40d8 ix86_isa_flags (0x180022012) ix86_fpmath (0x2) target_flags (0xa0004c3) arch = 18 (k8) tune = 0 (generic) branch_cost = 3 -m64 -msse2 -msse -mmmx -mfxsr -m128bit-long-double -m80387 -mfp-ret-in-387 -mtls-direct-seg-refs -mvzeroupper -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mfpmath=sse while for do_fma() target_option_node 0x76ac2e58 ix86_isa_flags (0x117809224000112) ix86_fpmath (0x2) target_flags (0xa0004c3) arch = 18 (k8) tune = 0 (generic) branch_cost = 3 -m64 -mfma -msse4.2 -msse4.1 -mssse3 -msse3 -msse2 -msse -mmmx -mfxsr -mpopcnt -mxsave -m128bit-long-double -m80387 -mfp-ret-in-387 -mtls-direct-seg-refs -mvzeroupper -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mfpmath=sse and for some reason main () ends up _not_ having -mfma enabled. I'm not sure what disables it or how we arrive at the above set of options in the first place... does the TARGET_OPTION_NODE store the complete state of all target options? I had the impression it only stores modifications? So it seems to work (and simply ignores the global -mfma flag). Not sure whether by design or by luck ;) Richard. I think Jakub made a patch to simply skip target-option-node streaming here; that is how this is supposed to work. Not all targets support target attributes and I think those that support them are those that are SWITCHABLE_TARGET - but maybe I am mistaken? We can do similar changes for optimize options, now, for all targets, no? I am not quite sure about this; not all options are of a per-function nature. I need to audit all the target options for LTO safety; I did some work on this for optimization options, but the target options are still very wild. For example I am not quite convinced mixing -malign-double and -mno-align-double units will work as expected. Well, but once types are laid out, specifying sth else at link time won't help. But yes, if we don't stream the option we have no chance to complain in lto-wrapper.
It's just that we seem to not enter the merging machinery for single-file testcases so I chose to fix lto option streaming rather than lto-wrapper ... I'll have a closer look at lto-wrapper. But to complain about target option mismatches we'd have to call a target hook from lto-wrapper (or finally re-structure all this and merge options in lto1 itself). Richard. Honza Thanks, Richard. 2015-04-23 Richard Biener rguent...@suse.de * lto-opts.c (lto_write_options): Do not record target options if we are targeting a SWITCHABLE_TARGET target. We use target options on functions to transfer target flags from compile to link time. Index: gcc/lto-opts.c === *** gcc/lto-opts.c (revision 222360) --- gcc/lto-opts.c (working copy) *** lto_write_options (void) *** 219,224 --- 219,230 lto_stream_offload_p) continue; + /* Do not store target-specific options if we target a + SWITCHABLE_TARGET target. */ + if ((cl_options[option->opt_index].flags & CL_TARGET) + && SWITCHABLE_TARGET) + continue; + /* Drop options created from the gcc driver that will be rejected when passed on to the driver again. */ if (cl_options[option->opt_index].cl_reject_driver) -- Richard Biener rguent...@suse.de SUSE LINUX
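The filtering logic the patch adds to lto_write_options boils down to a flag test; a toy model (constants and names here are stand-ins for GCC's real CL_* flag bits and the SWITCHABLE_TARGET macro):

```cpp
#include <cassert>

// Stand-ins for the option-flag bits consulted in lto_write_options.
enum { TOY_CL_TARGET = 1 << 0, TOY_CL_REJECT_DRIVER = 1 << 1 };

// As if compiled for a SWITCHABLE_TARGET target (e.g. i386 or rs6000),
// where per-function target option nodes already carry target flags.
static const int toy_switchable_target = 1;

// Should this command-line option be recorded in the .gnu.lto_.opts
// section?  Target-specific options are dropped on switchable targets,
// as are driver-rejected options.
static bool
record_option_p (unsigned flags)
{
  if ((flags & TOY_CL_TARGET) && toy_switchable_target)
    return false;
  if (flags & TOY_CL_REJECT_DRIVER)
    return false;
  return true;
}
```

The point of the real change is that options such as -mfma travel on the function's target option node instead, so streaming them globally would only let a stray flag leak onto the lto1 link command line.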
Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8
On Mon, 2015-04-27 at 14:23 +0800, Bin.Cheng wrote: On Mon, Mar 30, 2015 at 1:42 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Index: gcc/testsuite/gcc.dg/vect/vect-33.c === --- gcc/testsuite/gcc.dg/vect/vect-33.c (revision 221118) +++ gcc/testsuite/gcc.dg/vect/vect-33.c (working copy) @@ -36,9 +36,10 @@ int main (void) return main1 (); } +/* vect_hw_misalign && { ! vect64 } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 vect } } */ -/* { dg-final { scan-tree-dump "Vectorizing an unaligned access" vect { target { vect_hw_misalign && { {! vect64} || vect_multiple_sizes } } } } } */ +/* { dg-final { scan-tree-dump "Vectorizing an unaligned access" vect { target { { { ! powerpc*-*-* } && vect_hw_misalign } && { { ! vect64 } || vect_multiple_sizes } } } } } */ /* { dg-final { scan-tree-dump "Alignment of access forced using peeling" vect { target { vector_alignment_reachable && { vect64 && {! vect_multiple_sizes} } } } } } */ /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 vect { target { { {! vector_alignment_reachable} || {! vect64} } && {! vect_hw_misalign} } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Hi Bill, With this change, the test case is skipped on aarch64 now. Since it passed before, is it expected to act like this on 64-bit platforms? Hi Bin, No, that's a mistake on my part -- thanks for the report! That first added line was not intended to be part of the patch: +/* vect_hw_misalign && { ! vect64 } */ Please try removing that line and verify that the patch succeeds again for ARM. Assuming so, I'll prepare a patch to fix this. It looks like this mistake was introduced only in this particular test, but please let me know if you see any other anomalies. Thanks very much! Bill PASS-NA: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0 PASS-NA: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0 Thanks, bin
Re: [PR64164] drop copyrename, integrate into expand
On Fri, Apr 24, 2015 at 3:56 AM, Alexandre Oliva aol...@redhat.com wrote: On Apr 6, 2015, Jeff Law l...@redhat.com wrote: So the bulk of the changes into this routine are really about picking a good leader, which presumably is how we're able to get the desired effects on debuginfo that we used to get from tree-ssa-copyrename.c? This has nothing to do with debuginfo, I'm afraid. We just had to keep track of parm and result decls to avoid coalescing them together, and parm decl default defs to promote them to leaders, because expand copies incoming REGs to pseudos in PARM_DECL's DECL_RTL. We should fill that in with the RTL created for the default def for the PARM_DECL. At the end, I believe we also copy the RESULT_DECL DECL_RTL to the actual return register or rtl. I didn't want to tackle the reworking of these expanders to avoid problems out of copying incoming parms to one pseudo and then reading it from another, as I observed before I made this change. I'm now tackling that, so that we can refrain from touching the base vars altogether, and not refrain from coalescing parms and results. Hmmm, so the real issue here is the expansion setup of parms and results. I hadn't pondered that aspect. I'd encourage fixing the expansion code too if you can see a path for that. That was the trickiest bit of the patch: getting assign_parms to use the out-of-SSA-chosen RTL for the (default def of the) param instead of creating a pseudo or stack slot of its own, especially when we create a .result_ptr decl and there is an incoming by-ref result_decl, in which case we ought to use the same SSA-assigned pseudo for both. Another case worth mentioning is that in which a param is unused: there is no default def for it, but in the non-optimized case, we have to assign it to the same location. 
I've used the DECL_RTL itself to carry the information in this case, at least in the non-optimized case, in which we know all SSA_NAMEs associated with each param *will* be assigned to the same partition, and use the same RTL. If we do optimize, the param may get multiple locations, and DECL_RTL will be cleared. That's fine: the incoming value of the param will end up being copied to a separate pseudo, so that there's no risk of messing with any other default def (there's a new testcase for this), and the copy is likely to be optimized out. The other tricky bit was to fix all expander bits that required SSA_NAMEs to have a associated decl. I've removed all such cases, so we can now expand anonymous SSA decls directly, without having to create an ignored decl. Doing that, we can coalesce variables and expand each partition without worrying about choosing a preferred partition leader. We just have to make sure we don't consider pairs of variables eligible for coalescing if they should get different promoted modes, or a different register-or-stack choice, and then expansion of partitions is streamlined: we just expand each leader, and then adjust all SSA_NAMEs to associate the RTL with their base variables, if any. In this revision of the patch, I have retained -ftree-coalesce-vars, so that its negated form can be used in testcases that formerly expected no coalescing across user variables, but that explicitly disabled VTA. As for testcases, while investigating test regressions, I found out various guality failures had to do with VT's lack of awareness of custom calling conventions. Caller's variables saved in registers that are normally call-clobbered, but that are call-saved in custom conventions set up for a callee, would end up invalidating the entry-point location associations. 
I've arranged for var-tracking to use custom calling conventions for register invalidation at call insns, and this fixed not only a few guality regressions due to changes in register assignment, but a number of other long-standing guality failures. Yay! This could be split out into a standalone patch. On Mar 31, 2015, Richard Biener richard.guent...@gmail.com wrote: Did you do any statistics on how the number of basevars changes with your patch compared to trunk? In this version of the patch, we no longer touch the base vars at all. We just associate the piece of RTL generated for the partition with a list of decls, if needed. (I've just realized that I never noticed a list of decls show up anywhere, and looking into this, I saw a bug in the leader_merge function, that causes it to fail to go from a single entry to a list: it creates the list, but then returns the original non-list entry; that's why I never saw them! I won't delay posting the patch just because of this; I'm not even sure we want decl lists in REG or MEM attrs begin with) I have collected some statistics on the effects of the patch in compiling stage3-gcc/, before and after the patch, with and without -fno-tree-coalesce-vars. I counted, per function: b/a: before the patch, or after the
Re: [PATCH] Properly valueize values we value-number to
On Mon, 27 Apr 2015, Richard Biener wrote: On Fri, 24 Apr 2015, Jeff Law wrote: On 02/17/2015 07:58 AM, Richard Biener wrote: [ Restarting a old thread... ] On a closer look the record_const_or_copy_1 hunk is redundant (record_equality is really a bit obfuscated...). Agreed. I'm not entirely sure how it got to this point. And record_equality is where the SSA_NAME_VALUEs might end up forming chains? At least because we might record a SSA_NAME_VALUE for sth that might be the target of a SSA_NAME_VALUE, as in a_1 = b_2; SSA_NAME_VALUE (a_1) == b_2; if (b_2 == i_3(D)) ... temporarily SSA_NAME_VALUE (b_2) == i_3(D) thus under if (b_2 == i_3(D)) SSA_NAME_VALUE (a_1) == b_2 and SSA_NAME_VALUE (SSA_NAME_VALUE (a_1)) == i_3(D)? I guess we can't easily avoid that because we don't keep track of a reverse mapping (nor would we want to update that). Right. In general, the fact that we're in SSA form means that we ought not get loops in the SSA_NAME_VALUE chain since there's a single static assignment and we'll visit it once. That nice model breaks down in two ways. First we derive equivalences from equality conditions like you've shown above. Second, when threading we can thread through a loop latch and start reprocessing a block. The interaction between those two can result in loops in the value chain. What I'm hoping to do in GCC6 is eliminate all this stuff with a threader that is independent of the dominator walk (and optimizer). Instead of walking forward through the dominator tree recording key nuggets, then removing those nuggets as we pop back up the tree, we instead we start with the conditional and walk up the use-def chains, simplifying as we go, with the goal of simplifying to a constant, of course. You can see the beginnings of that with the FSM bits from Sebastian. Obviously until those bits are in place, we should try to clean up any warts in the current implementation. 
Btw, in record_equality, the == part of the <= check for the loop depth will just randomly swap x and y while we should prefer IL canonical order. If we can't rely on the input being in IL canonical order we should prepend the whole function with if (tree_swap_operands_p (x, y, false)) std::swap (x, y); and change <= to < for the loop-depth check. Agreed. Makes perfect sense. I'm now re-bootstrapping and testing the following. Committed as follows, with XFAILing and re-opening PR65217. Richard. 2015-04-27 Richard Biener rguent...@suse.de * tree-ssa-dom.c (record_equivalences_from_phis): Valueize PHI arg. (record_equivalences_from_stmt): Valueize rhs. (record_equality): Canonicalize x and y order via tree_swap_operands_p. Do not swap operands for same loop depth. * gcc.target/i386/pr65217.c: XFAIL. Index: gcc/tree-ssa-dom.c === *** gcc/tree-ssa-dom.c (revision 222360) --- gcc/tree-ssa-dom.c (working copy) *** record_equivalences_from_phis (basic_blo *** 1519,1524 --- 1519,1531 if (lhs == t) continue; + /* Valueize t. */ + if (TREE_CODE (t) == SSA_NAME) + { + tree tmp = SSA_NAME_VALUE (t); + t = tmp ? tmp : t; + } + /* If we have not processed an alternative yet, then set RHS to this alternative. */ if (rhs == NULL) *** record_equality (tree x, tree y) *** 1752,1757 --- 1759,1767 { tree prev_x = NULL, prev_y = NULL; + if (tree_swap_operands_p (x, y, false)) + std::swap (x, y); + if (TREE_CODE (x) == SSA_NAME) prev_x = SSA_NAME_VALUE (x); if (TREE_CODE (y) == SSA_NAME) *** record_equality (tree x, tree y) *** 1766,1772 else if (is_gimple_min_invariant (x) /* ??? When threading over backedges the following is important for correctness.
See PR61757. */ ! || (loop_depth_of_name (x) <= loop_depth_of_name (y))) prev_x = x, x = y, y = prev_x, prev_x = prev_y; else if (prev_x && is_gimple_min_invariant (prev_x)) x = y, y = prev_x, prev_x = prev_y; --- 1776,1782 else if (is_gimple_min_invariant (x) /* ??? When threading over backedges the following is important for correctness. See PR61757. */ ! || (loop_depth_of_name (x) < loop_depth_of_name (y))) prev_x = x, x = y, y = prev_x, prev_x = prev_y; else if (prev_x && is_gimple_min_invariant (prev_x)) x = y, y = prev_x, prev_x = prev_y; *** record_equivalences_from_stmt (gimple st *** 2128,2145 if (may_optimize_p && (TREE_CODE (rhs) == SSA_NAME || is_gimple_min_invariant (rhs))) ! { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! { !
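The valueization the patch adds when recording equivalences can be pictured with a toy SSA_NAME_VALUE table (strings stand in for SSA names; the real table is a field on the SSA name node, not a map):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy SSA_NAME_VALUE table.
static std::map<std::string, std::string> ssa_value;

// One-step lookup: return the recorded value of T, or T itself.
static std::string
valueize (const std::string &t)
{
  auto it = ssa_value.find (t);
  return it == ssa_value.end () ? t : it->second;
}

// Mirrors the patch: resolve the RHS through the table *before*
// recording it, so lookups never have to chase two-deep chains like
// a_1 -> b_2 -> i_3.
static void
record_equiv (const std::string &lhs, const std::string &rhs)
{
  ssa_value[lhs] = valueize (rhs);
}
```

Recording b_2 = i_3 and then a_1 = b_2 leaves a_1 mapped straight to i_3, which is what the new "Valueize" hunks in record_equivalences_from_phis and record_equivalences_from_stmt achieve.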
Re: [PATCH][PR65818][bootstrap,hppa] Return side-effect free result in gimplify_va_arg_internal
On 27-04-15 10:17, Richard Biener wrote: This patch fixes that by gimplifying the address expression of the mem-ref returned by the target hook (borrowing code from gimplify_expr, case MEM_REF). Bootstrapped and reg-tested on x86_64. Bootstrapped and reg-tested on hppa2.0w-hp-hpux11.11. OK for trunk? Hmm, that assert looks suspicious... Can't you simply always do gimplify_expr (&expr, pre_p, post_p, is_gimple_lvalue, fb_lvalue); It's a bit counter-intuitive for me, using is_gimple_lvalue for something (the result of va_arg) we use as an rvalue. But, it seems to work (with & in front of expr). OK if bootstrap and reg-test on x86_64 succeeds? Thanks, - Tom Return side-effect free result in gimplify_va_arg_internal 2015-04-27 Tom de Vries t...@codesourcery.com PR tree-optimization/65818 * gimplify.c (gimplify_va_arg_internal): Ensure that only side-effect free values are returned. --- gcc/gimplify.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index c68bd47..4a68c87 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9352,7 +9352,9 @@ gimplify_va_arg_internal (tree valist, tree type, location_t loc, else gimplify_expr (&valist, pre_p, post_p, is_gimple_min_lval, fb_lvalue); - return targetm.gimplify_va_arg_expr (valist, type, pre_p, post_p); + tree expr = targetm.gimplify_va_arg_expr (valist, type, pre_p, post_p); + gimplify_expr (&expr, pre_p, post_p, is_gimple_lvalue, fb_lvalue); + return expr; } /* Gimplify __builtin_va_arg, aka VA_ARG_EXPR, which is not really a -- 1.9.1
Re: [PATCH, ARM] Alternatives and type attributes fixes.
Hi Yvan, On 27/04/15 13:25, Yvan Roux wrote: Hi, This is a follow-up of PR64208 where an LRA loop was due to redundancy in insn's alternatives. I've checked all the insns in the arm backend to avoid latent problems and this patch fixes the issues I've spotted. Only thumb2_movsicc_insn contained duplicated alternatives, and the rest of the changes are related to the type attribute, which were not accurate or used according to their definition. Notice that the modifications have only a limited impact as in most of the pipeline descriptions, multiple/mov_reg/alu_reg/.. types are used the same way, only cortex a8 seems to have a real difference between alu or multiple instruction and mov. Bootstrapped and regtested on arm-linux-gnueabihf. Ok for trunk? Thanks, Yvan 2015-04-27 Yvan Roux yvan.r...@linaro.org * config/arm/arm.md (*arm_movt): Fix type attribute. (*movsi_compare0): Likewise. (*cmpsi_shiftsi): Likewise. (*cmpsi_shiftsi_swp): Likewise. (*movsicc_insn): Likewise. (*cond_move): Likewise. (*if_plus_move): Likewise. (*if_move_plus): Likewise. (*if_arith_move): Likewise. (*if_move_arith): Likewise. (*if_shift_move): Likewise. (*if_move_shift): Likewise. * config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute. (*thumb2_movsicc_insn): Fix alternative redundancy. (*thumb2_addsi_short): Split register and immediate alternatives. (thumb2_addsi3_compare0): Likewise. insn.diff diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..ee23deb 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5631,7 +5631,7 @@ [(set_attr "predicable" "yes") (set_attr "predicable_short_it" "no") (set_attr "length" "4") - (set_attr "type" "mov_imm")] + (set_attr "type" "alu_sreg")] ) For some context, this is the *arm_movt pattern that generates a movt instruction. The documentation in types.md says: "This excludes MOV and MVN but includes MOVT." So using alu_sreg is correct according to that logic.
However, don't you want to also update the arm_movtas_ze pattern that generates movt but erroneously has the type 'mov_imm'? (define_insn "*arm_movsi_insn" @@ -5919,7 +5919,7 @@ cmp%?\\t%0, #0 sub%.\\t%0, %1, #0 [(set_attr "conds" "set") - (set_attr "type" "alus_imm,alus_imm")] + (set_attr "type" "alus_sreg,alus_sreg")] ) Why change that? They are instructions with immediate operands so alus_imm should be correct? @@ -486,12 +486,12 @@ ) (define_insn_and_split "*thumb2_movsicc_insn" - [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r") + [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r") (if_then_else:SI (match_operator 3 "arm_comparison_operator" [(match_operand 4 "cc_register" "") (const_int 0)]) -(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,rI,rI,K ,K,r") -(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,K ,rI,K,r")))] +(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,I ,r,rI,K ,K,r") +(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,I,K ,rI,K,r")))] "TARGET_THUMB2" @ it\\t%D3\;mov%D3\\t%0, %2 @@ -504,12 +504,14 @@ # # # + # # ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 - ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 - ; alt 8: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 - ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 - ; alt 10: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 + ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 + ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 "reload_completed" [(const_int 0)] { @@ -540,8 +542,8 @@ operands[2]))); DONE; } - [(set_attr "length" "4,4,6,6,6,6,10,10,10,10,6") - (set_attr "enabled_for_depr_it" "yes,yes,no,no,no,no,no,no,no,no,yes") + [(set_attr "length" "4,4,6,6,6,6,10,8,10,10,10,6") + (set_attr "enabled_for_depr_it" "yes,yes,no,no,no,no,no,no,no,no,no,yes") (set_attr "conds" "use") (set_attr "type"
"multiple")] ) So, I think here for the first 6 alternatives we can give it a more specific type, like mov_imm or mov_reg, since you're cleaning this stuff up. The instructions in those alternatives are just a mov instruction enclosed in an IT block, so they always have to stick together anyway. @@ -1161,9 +1163,9 @@ ) (define_insn "*thumb2_addsi_short" - [(set (match_operand:SI 0 "low_register_operand" "=l,l") - (plus:SI (match_operand:SI 1 "low_register_operand" "l,0") -(match_operand:SI 2 "low_reg_or_int_operand" "lPt,Ps"))) + [(set (match_operand:SI 0 "low_register_operand" "=l,l,l") + (plus:SI (match_operand:SI 1 "low_register_operand" "l,l,0") +
Re: Hide _S_n_primes from user code
On 22/04/15 22:11 +0200, François Dumont wrote: +constexpr auto __n_primes + = sizeof(__prime_list) / sizeof(unsigned long) - 1; Normally I'd say sizeof(__prime_list) / sizeof(*__prime_list) - 1 would be better, but since it's very unlikely we'll change the element type in the array what you have should be safe. OK for trunk.
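The element-count idiom Jonathan prefers can be demonstrated with a toy array (the array name is invented; in the real code the trailing -1 accounts for a sentinel entry at the end of the prime list):

```cpp
#include <cassert>

static const unsigned long toy_prime_list[] = { 2, 3, 5, 7, 11, 13 };

// sizeof(arr) / sizeof(*arr) tracks the element type automatically;
// dividing by sizeof(unsigned long) would silently break if the
// element type ever changed.
constexpr auto toy_n_primes
  = sizeof (toy_prime_list) / sizeof (*toy_prime_list) - 1;
```

Either spelling is correct today; the sizeof(*array) form is simply more change-proof.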
Re: [Patch] Add regex_constants::__polynomial
On 26/04/15 17:55 -0700, Tim Shen wrote: I didn't test with _GLIBCXX_DEBUG though. Updated the patch for removing DFS restriction for ECMAScript. OK for trunk.
[patch] Add a cross-reference link to libstdc++ manual
Committed to trunk. commit 85b6429ce3410f9e204f7d926f67707556c6a4bf Author: Jonathan Wakely jwak...@redhat.com Date: Mon Apr 27 13:27:27 2015 +0100 * doc/xml/manual/extensions.xml: Add cross-reference. * doc/html/manual/ext_compile_checks.html: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml b/libstdc++-v3/doc/xml/manual/extensions.xml index c4120c9..41b1a80 100644 --- a/libstdc++-v3/doc/xml/manual/extensions.xml +++ b/libstdc++-v3/doc/xml/manual/extensions.xml @@ -82,7 +82,8 @@ extensions, be aware of two things: They can be enabled at configure time with link linkend=manual.intro.setup.configureliteral--enable-concept-checks/literal/link. You can enable them on a per-translation-unit basis with - code#define _GLIBCXX_CONCEPT_CHECKS/code for GCC 3.4 and higher + link linkend=manual.intro.using.macroscode#define + _GLIBCXX_CONCEPT_CHECKS/code/link for GCC 3.4 and higher (or with code#define _GLIBCPP_CONCEPT_CHECKS/code for versions 3.1, 3.2 and 3.3). /para
Re: niter_base simplification
On 22/04/15 22:10 +0200, François Dumont wrote: Hello, I don't know if I am missing something, but I think __niter_base could be simplified to remove the usage of _Iter_base. Additionally I overload it to also remove the __normal_iterator layer even if it is behind a reverse_iterator or move_iterator; that might help the compiler optimize code, no? If not, it might allow other algorithm optimizations in the future... I preferred to provide a __make_reverse_iterator to allow the latter in C++11 and not only in C++14. Is it fine to do it this way, or do you prefer to simply get rid of all this part? It's fine to add __make_reverse_iterator but see my comment below. * include/bits/cpp_type_traits.h (__gnu_cxx::__normal_iterator): Delete. You're removing __is_normal_iterator not __normal_iterator. * include/bits/stl_algobase.h (std::__niter_base): Adapt. * include/bits/stl_iterator.h (__make_reverse_iterator): New in C++11. (std::__niter_base): Overloads for std::reverse_iterator, __gnu_cxx::__normal_iterator and std::move_iterator. Tested under Linux x86_64. I checked that std::copy still ends up calling __builtin_memmove when used on vector iterators. François diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h index 0bcb133..73eea6b 100644 --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -270,17 +270,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __a; } - // If _Iterator is a __normal_iterator return its base (a plain pointer, - // normally) otherwise return it untouched. See copy, fill, ... + // Fallback implementation of the function used to remove the + // __normal_iterator wrapper. See copy, fill, ... It's a bit strange to have a function with no other overloads visible described as a fallback.
It would be good to say that the other definition is in bits/stl_iterator.h template<typename _Iterator> -struct _Niter_base -: _Iter_base<_Iterator, __is_normal_iterator<_Iterator>::__value> -{ }; - - template<typename _Iterator> -inline typename _Niter_base<_Iterator>::iterator_type +inline _Iterator __niter_base(_Iterator __it) -{ return std::_Niter_base<_Iterator>::_S_base(__it); } +{ return __it; } // Likewise, for move_iterator. This comment no longer makes sense, because you've removed the comment on _Niter_base that it referred to. Please restore the original text of the _Niter_base comment for _Miter_base. (Alternatively, could the same simplification be made for __miter_base? Do we need _Miter_base or just two overloads of __miter_base()?) template<typename _Iterator> diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 4a9189e..3aad9f3 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -390,7 +390,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return __y.base() - __x.base(); } //@} -#if __cplusplus > 201103L +#if __cplusplus == 201103L + template<typename _Iterator> +inline reverse_iterator<_Iterator> +__make_reverse_iterator(_Iterator __i) +{ return reverse_iterator<_Iterator>(__i); } + +# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \ + std::__make_reverse_iterator(_Iter) +#elif __cplusplus > 201103L #define __cpp_lib_make_reverse_iterator 201402 // _GLIBCXX_RESOLVE_LIB_DEFECTS @@ -400,6 +408,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline reverse_iterator<_Iterator> make_reverse_iterator(_Iterator __i) { return reverse_iterator<_Iterator>(__i); } + +# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \ + std::make_reverse_iterator(_Iter) +#endif + +#if __cplusplus >= 201103L + template<typename _Iterator> +auto +__niter_base(reverse_iterator<_Iterator> __it) +-> decltype(_GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base()))) +{ return _GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base())); }
#endif It might be simpler to just add __make_reverse_iterator for >= 201103L and then always use std::__make_reverse_iterator instead of a macro. That's similar to what we do for std::__addressof and std::addressof.
[PATCH, ARM] Alternatives and type attributes fixes.
Hi, This is a follow-up of PR64208 where an LRA loop was due to redundancy in insn's alternatives. I've checked all the insns in the arm backend to avoid latent problems and this patch fixes the issues I've spotted. Only thumb2_movsicc_insn contained duplicated alternatives, and the rest of the changes are related to the type attribute, which were not accurate or used accordingly to their definition. Notice that the modifications have only a limited impact as in most of the pipeline descriptions, multiple/mov_reg/alu_reg/.. types are used the same way, only cortex a8 seems to have a real difference between alu or multiple instruction and mov. Bootstrapped and regtested on arm-linux-gnueabihf. Ok for trunk ? Thanks, Yvan 2015-04-27 Yvan Roux yvan.r...@linaro.org * config/arm/arm.mb (*arm_movt): Fix type attribute. (*movsi_compare0): Likewise. (*cmpsi_shiftsi): Likewise. (*cmpsi_shiftsi_swp): Likewise. (*movsicc_insn): Likewise. (*cond_move): Likewise. (*if_plus_move): Likewise. (*if_move_plus): Likewise. (*if_arith_move): Likewise. (*if_move_arith): Likewise. (*if_shift_move): Likewise. (*if_move_shift): Likewise. * config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute. (*thumb2_movsicc_insn): Fix alternative redundancy. (*thumb2_addsi_short): Split register and immediate alternatives. (thumb2_addsi3_compare0): Likewise. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..ee23deb 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5631,7 +5631,7 @@ [(set_attr predicable yes) (set_attr predicable_short_it no) (set_attr length 4) - (set_attr type mov_imm)] + (set_attr type alu_sreg)] ) (define_insn *arm_movsi_insn @@ -5919,7 +5919,7 @@ cmp%?\\t%0, #0 sub%.\\t%0, %1, #0 [(set_attr conds set) - (set_attr type alus_imm,alus_imm)] + (set_attr type alus_sreg,alus_sreg)] ) ;; Subroutine to store a half word from a register into memory. 
@@ -6886,7 +6886,7 @@ [(set_attr conds set) (set_attr shift 1) (set_attr arch 32,a,a) - (set_attr type alus_shift_imm,alu_shift_reg,alus_shift_imm)]) + (set_attr type alus_shift_imm,alus_shift_reg,alus_shift_imm)]) (define_insn *cmpsi_shiftsi_swp [(set (reg:CC_SWP CC_REGNUM) @@ -6899,7 +6899,7 @@ [(set_attr conds set) (set_attr shift 1) (set_attr arch 32,a,a) - (set_attr type alus_shift_imm,alu_shift_reg,alus_shift_imm)]) + (set_attr type alus_shift_imm,alus_shift_reg,alus_shift_imm)]) (define_insn *arm_cmpsi_negshiftsi_si [(set (reg:CC_Z CC_REGNUM) @@ -7492,10 +7492,10 @@ (const_string mov_imm) (const_string mov_reg)) (const_string mvn_imm) - (const_string mov_reg) - (const_string mov_reg) - (const_string mov_reg) - (const_string mov_reg)])] + (const_string multiple) + (const_string multiple) + (const_string multiple) + (const_string multiple)])] ) (define_insn *movsfcc_soft_insn @@ -8653,7 +8653,14 @@ return \\; [(set_attr conds use) - (set_attr type mov_reg,mov_reg,multiple) + (set_attr_alternative type + [(if_then_else (match_operand 2 const_int_operand ) +(const_string mov_imm) +(const_string mov_reg)) + (if_then_else (match_operand 1 const_int_operand ) +(const_string mov_imm) +(const_string mov_reg)) + (const_string multiple)]) (set_attr length 4,4,8)] ) @@ -9449,8 +9456,8 @@ (const_string alu_imm ) (const_string alu_sreg)) (const_string alu_imm) - (const_string alu_sreg) - (const_string alu_sreg)])] + (const_string multiple) + (const_string multiple)])] ) (define_insn *ifcompare_move_plus @@ -9487,7 +9494,13 @@ sub%D4\\t%0, %2, #%n3\;mov%d4\\t%0, %1 [(set_attr conds use) (set_attr length 4,4,8,8) - (set_attr type alu_sreg,alu_imm,multiple,multiple)] + (set_attr_alternative type + [(if_then_else (match_operand 3 const_int_operand ) +(const_string alu_imm ) +(const_string alu_sreg)) + (const_string alu_imm) + (const_string multiple) + (const_string multiple)])] ) (define_insn *ifcompare_arith_arith @@ -9582,7
[PATCH][AArch64] Improve spill code - swap order in shl pattern
Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first, it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code, not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates much better spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (*aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first.
--- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 7163025..baef56a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3334,17 +3334,17 @@ ;; Logical left shift using SISD or Integer instruction (define_insn "*aarch64_ashl_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w") (ashift:GPI - (match_operand:GPI 1 "register_operand" "w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "rUs<cmode>,Us<cmode>,w")))] "" "@ + lsl\t%<w>0, %<w>1, %<w>2 shl\t%<rtn>0<vas>, %<rtn>1<vas>, %2 - ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas> - lsl\t%<w>0, %<w>1, %<w>2" - [(set_attr "simd" "yes,yes,no") - (set_attr "type" "neon_shift_imm<q>, neon_shift_reg<q>,shift_reg")] + ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas>" + [(set_attr "simd" "no,yes,yes") + (set_attr "type" "shift_reg,neon_shift_imm<q>, neon_shift_reg<q>")] ) ;; Logical right shift using SISD or Integer instruction -- 1.9.1
Re: [PATCH][PR65818][bootstrap,hppa] Return side-effect free result in gimplify_va_arg_internal
On Mon, Apr 27, 2015 at 3:06 PM, Tom de Vries tom_devr...@mentor.com wrote: On 27-04-15 10:17, Richard Biener wrote: This patch fixes that by gimplifying the address expression of the mem-ref returned by the target hook (borrowing code from gimplify_expr, case MEM_REF). Bootstrapped and reg-tested on x86_64. Bootstrapped and reg-tested on hppa2.0w-hp-hpux11.11. OK for trunk? Hmm, that assert looks suspicious... Can't you simply always do gimplify_expr (&expr, pre_p, post_p, is_gimple_lvalue, fb_lvalue); It's a bit counter-intuitive for me, using is_gimple_lvalue for something (the result of va_arg) we use as an rvalue. Yeah, choosing that was done because you could assert it's a MEM_REF and tree-stdarg eventually builds a WITH_SIZE_EXPR around it. It would possibly be easier to simply inline gimplify_va_arg_internal at its use and only gimplify the assignment. Though in that case the original code probably worked - just in the lhs == NULL case it didn't, which hints at a better place for the fix - in expand_ifn_va_arg_1 do if (lhs != NULL_TREE) { ... } else gimplify_expr (&expr, &pre, &post, is_gimple_val, fb_either); So ... if you can re-try that one it's pre-approved. Thanks, Richard. But, it seems to work (with & in front of expr). OK if bootstrap and reg-test on x86_64 succeeds? Thanks, - Tom
RE: [PATCH][AArch64] Fix Cortex-A53 shift costs
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 05 March 2015 14:49 To: gcc-patches@gcc.gnu.org Subject: [PATCH][AArch64] Fix Cortex-A53 shift costs This patch fixes the shift costs for Cortex-A53 so they are more accurate - immediate shifts use SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle. Bootstrap and regression OK. ChangeLog: 2015-03-05 Wilco Dijkstra wdijk...@arm.com * gcc/config/arm/aarch-cost-tables.h (cortexa53_extra_costs): Make Cortex-A53 shift costs more accurate. --- gcc/config/arm/aarch-cost-tables.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h index 05e96a9..6bb8ede 100644 --- a/gcc/config/arm/aarch-cost-tables.h +++ b/gcc/config/arm/aarch-cost-tables.h @@ -130,12 +130,12 @@ const struct cpu_cost_table cortexa53_extra_costs = 0, /* arith. */ 0, /* logical. */ COSTS_N_INSNS (1), /* shift. */ -COSTS_N_INSNS (2), /* shift_reg. */ +0, /* shift_reg. */ COSTS_N_INSNS (1), /* arith_shift. */ -COSTS_N_INSNS (2), /* arith_shift_reg. */ +COSTS_N_INSNS (1), /* arith_shift_reg. */ COSTS_N_INSNS (1), /* log_shift. */ -COSTS_N_INSNS (2), /* log_shift_reg. */ -0, /* extend. */ +COSTS_N_INSNS (1), /* log_shift_reg. */ +COSTS_N_INSNS (1), /* extend. */ COSTS_N_INSNS (1), /* extend_arith. */ COSTS_N_INSNS (1), /* bfi. */ COSTS_N_INSNS (1), /* bfx. */ -- 1.9.1
Re: [PATCH 5/13] microblaze musl support
On 20/04/15 19:54, Szabolcs Nagy wrote: Set up dynamic linker name for microblaze. Patch v2. (undef MUSL_DYNAMIC_LINKER that comes from config/linux.h) gcc/Changelog: 2015-04-24 Gregor Richards gregor.richa...@uwaterloo.ca * config/microblaze/linux.h (MUSL_DYNAMIC_LINKER): Define. (DYNAMIC_LINKER): Change. diff --git a/gcc/config/microblaze/linux.h b/gcc/config/microblaze/linux.h index a7faa7d..3e08138 100644 --- a/gcc/config/microblaze/linux.h +++ b/gcc/config/microblaze/linux.h @@ -25,7 +25,22 @@ #undef TLS_NEEDS_GOT #define TLS_NEEDS_GOT 1 -#define DYNAMIC_LINKER /lib/ld.so.1 +#if TARGET_BIG_ENDIAN_DEFAULT == 0 /* LE */ +#define MUSL_DYNAMIC_LINKER_E %{mbig-endian:;:el} +#else +#define MUSL_DYNAMIC_LINKER_E %{mlittle-endian:el} +#endif + +#undef MUSL_DYNAMIC_LINKER +#define MUSL_DYNAMIC_LINKER /lib/ld-musl-microblaze MUSL_DYNAMIC_LINKER_E .so.1 +#define GLIBC_DYNAMIC_LINKER /lib/ld.so.1 + +#if DEFAULT_LIBC == LIBC_MUSL +#define DYNAMIC_LINKER MUSL_DYNAMIC_LINKER +#else +#define DYNAMIC_LINKER GLIBC_DYNAMIC_LINKER +#endif + #undef SUBTARGET_EXTRA_SPECS #define SUBTARGET_EXTRA_SPECS \ { dynamic_linker, DYNAMIC_LINKER }
Re: [PATCH 6/13] mips musl support
On 21/04/15 15:59, Matthew Fortune wrote: Rich Felker dal...@libc.org writes: On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote: There does however appear to be both soft and hard float variants Patch v2. Now all the ABI variants musl plans to support are represented. gcc/Changelog: 2015-04-27 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define. (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define. (GNU_USER_DYNAMIC_LINKERN32): Update. diff --git a/gcc/config/mips/linux.h b/gcc/config/mips/linux.h index 91df261..6038498 100644 --- a/gcc/config/mips/linux.h +++ b/gcc/config/mips/linux.h @@ -37,7 +37,13 @@ along with GCC; see the file COPYING3. If not see #define UCLIBC_DYNAMIC_LINKERN32 \ %{mnan=2008:/lib32/ld-uClibc-mipsn8.so.0;:/lib32/ld-uClibc.so.0} +#undef MUSL_DYNAMIC_LINKER32 +#define MUSL_DYNAMIC_LINKER32 /lib/ld-musl-mips%{EL:el}%{msoft-float:-sf}.so.1 +#undef MUSL_DYNAMIC_LINKER64 +#define MUSL_DYNAMIC_LINKER64 /lib/ld-musl-mips64%{EL:el}.so.1 +#define MUSL_DYNAMIC_LINKERN32 /lib/ld-musl-mipsn32%{EL:el}.so.1 + #define BIONIC_DYNAMIC_LINKERN32 /system/bin/linker32 #define GNU_USER_DYNAMIC_LINKERN32 \ CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKERN32, UCLIBC_DYNAMIC_LINKERN32, \ - BIONIC_DYNAMIC_LINKERN32) + BIONIC_DYNAMIC_LINKERN32, MUSL_DYNAMIC_LINKERN32)
Re: [PATCH][PR65818][bootstrap,hppa] Return side-effect free result in gimplify_va_arg_internal
On 27-04-15 15:40, Richard Biener wrote: On Mon, Apr 27, 2015 at 3:06 PM, Tom de Vries tom_devr...@mentor.com wrote: On 27-04-15 10:17, Richard Biener wrote: This patch fixes that by gimplifying the address expression of the mem-ref returned by the target hook (borrowing code from gimplify_expr, case MEM_REF). Bootstrapped and reg-tested on x86_64. Bootstrapped and reg-tested on hppa2.0w-hp-hpux11.11. OK for trunk? Hmm, that assert looks suspicious... Can't you simply always do gimplify_expr (expr, pre_p, post_p, is_gimple_lvalue, fb_lvalue); It's a bit counter-intuitive for me, using is_gimple_lvalue for something (the result of va_arg) we use as rvalue. Yeah, choosing that was done because you could assert it's a MEM_REF and tree-stdarg eventually builds a WITH_SIZE_EXPR around it. It would possibly be easier to simply inline gimplify_va_arg_internal at its use and only gimplify the assignment. Though in that case the original code probably worked - just in the lhs == NULL case it didn't, which hints at a better place for the fix - in expand_ifn_va_arg_1 do if (lhs != NULL_TREE) { ... } else gimplify_expr (expr, pre, post, is_gimple_val, fb_either); So ... if you can re-try that one it's pre-approved. Using that proposed patch, we run into the following problem. Consider this testcase with ignored-result-va_arg: ... #include stdarg.h void f2 (int i, ...) { va_list ap; va_start (ap, i); va_arg (ap, long); va_end (ap); } ... after gimplify_va_arg_internal we have: ... (gdb) call debug_generic_expr (expr) *(long int * {ref-all}) addr.2 ... In a bit more detail: ... 
(gdb) call debug_tree (expr) mem_ref 0x765ea1b8 type integer_type 0x764a77e0 long int DI size integer_cst 0x764a3ca8 constant 64 unit size integer_cst 0x764a3cc0 constant 8 align 64 symtab 0 alias set -1 canonical type 0x764a77e0 precision 64 min integer_cst 0x764a3f30 -9223372036854775808 max integer_cst 0x764a3f48 9223372036854775807 pointer_to_this pointer_type 0x765e0498 arg 0 nop_expr 0x765cbb60 type pointer_type 0x765e0498 type integer_type 0x764a77e0 long int public static unsigned DI size integer_cst 0x764a3ca8 64 unit size integer_cst 0x764a3cc0 8 align 64 symtab 0 alias set -1 canonical type 0x765e0498 arg 0 var_decl 0x765e32d0 addr.2 type pointer_type 0x764c2150 used unsigned ignored DI file stdarg-1.c line 4 col 1 size integer_cst 0x764a3ca8 64 unit size integer_cst 0x764a3cc0 8 align 64 context function_decl 0x765c21b0 f2 arg 1 integer_cst 0x765e64e0 type pointer_type 0x765e0498 constant 0 ... During 'gimplify_expr (expr, pre, post, is_gimple_val, fb_either)' we ICE here: ... 8886 gcc_assert ((*gimple_test_f) (*expr_p)); ... At this point expr is: ... (gdb) call debug_generic_expr (*expr_p) *addr.2 ... In more detail: ... (gdb) call debug_tree (*expr_p) mem_ref 0x765ea1e0 type void_type 0x764c2000 void VOID align 8 symtab 0 alias set -1 canonical type 0x764c2000 pointer_to_this pointer_type 0x764c2150 arg 0 var_decl 0x765e32d0 addr.2 type pointer_type 0x764c2150 type void_type 0x764c2000 void sizes-gimplified public unsigned DI size integer_cst 0x764a3ca8 constant 64 unit size integer_cst 0x764a3cc0 constant 8 align 64 symtab 0 alias set -1 canonical type 0x764c2150 pointer_to_this pointer_type 0x764c9bd0 used unsigned ignored DI file stdarg-1.c line 4 col 1 size integer_cst 0x764a3ca8 64 unit size integer_cst 0x764a3cc0 8 align 64 context function_decl 0x765c21b0 f2 arg 1 integer_cst 0x764c10c0 type pointer_type 0x764c2150 constant 0 ... The memref is now VOID type. And that's not a gimple_val: ... (gdb) p is_gimple_val (*expr_p) $1 = false ... 
Attached patch uses is_gimple_lvalue, but in expand_ifn_va_arg_1. I'll bootstrap and reg-test on x86_64 and commit, unless further comments of course. Thanks, - Tom Evaluate side-effects in expand_ifn_va_arg_1 2015-04-27 Tom de Vries t...@codesourcery.com PR tree-optimization/65818 * tree-stdarg.c (expand_ifn_va_arg_1): Ensure that side-effects are evaluated. --- gcc/tree-stdarg.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c index 16a9e2c..1356374 100644 --- a/gcc/tree-stdarg.c +++ b/gcc/tree-stdarg.c @@ -1078,6 +1078,8 @@ expand_ifn_va_arg_1 (function *fun) types. */ gimplify_assign (lhs, expr, &pre); } + else + gimplify_expr (&expr, &pre, &post, is_gimple_lvalue, fb_lvalue); pop_gimplify_context (NULL); -- 1.9.1
Re: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS
On 27/04/15 14:53, Kyrill Tkachov wrote: Hi Wilco, On 27/04/15 14:43, Wilco Dijkstra wrote: ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 04 March 2015 15:38 To: GCC Patches Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap regression OK. ChangeLog: 2015-03-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_rtx_costs): Calculate cost of op0 and op1 in PLUS and MINUS cases. --- gcc/config/aarch64/aarch64.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 39921a7..e22d72e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5794,6 +5794,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, op1 = XEXP (x, 1); cost_minus: + *cost += rtx_cost (op0, MINUS, 0, speed); + /* Detect valid immediates. */ if ((GET_MODE_CLASS (mode) == MODE_INT || (GET_MODE_CLASS (mode) == MODE_CC @@ -5801,13 +5803,10 @@ cost_minus: CONST_INT_P (op1) aarch64_uimm12_shift (INTVAL (op1))) { - *cost += rtx_cost (op0, MINUS, 0, speed); - if (speed) /* SUB(S) (immediate). */ *cost += extra_cost-alu.arith; return true; - } /* Look for SUB (extended register). */ @@ -5832,7 +5831,6 @@ cost_minus: *cost += aarch64_rtx_mult_cost (new_op1, MULT, (enum rtx_code) code, speed); - *cost += rtx_cost (op0, MINUS, 0, speed); return true; } @@ -5879,6 +5877,8 @@ cost_plus: return true; } + *cost += rtx_cost (op1, PLUS, 1, speed); + I don't think this is correct. In the code directly below (when the aarch64_rtx_arith_op_extract_p condition is true) we have a shift/extend operation by a constant, so we don't want to take into account the cost of operand 1 (which is the extend+shift rtx). 
So, I looked at this code myself recently and found that the wrong logic is addressed by my patch at: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01617.html which fixes the case where we forget to cost operand0 in the MINUS case. Never mind, I misread the code. The case is indeed missing the costing of operand 1, so your patch is a superset of mine. I think it's correct, but I can't approve. If this goes in, then I withdraw my patch at: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01617.html Sorry for the confusion, Kyrill What do you think? Kyrill /* Look for ADD (extended register). */ if (aarch64_rtx_arith_op_extract_p (op0, mode)) { @@ -5900,12 +5900,10 @@ cost_plus: { *cost += aarch64_rtx_mult_cost (new_op0, MULT, PLUS, speed); - *cost += rtx_cost (op1, PLUS, 1, speed); return true; } - *cost += (rtx_cost (new_op0, PLUS, 0, speed) - + rtx_cost (op1, PLUS, 1, speed)); + *cost += rtx_cost (new_op0, PLUS, 0, speed); if (speed) { -- 1.9.1
Re: [C/C++ PATCH] Improve -Wlogical-op (PR c/63357)
On Mon, Apr 27, 2015 at 04:47:12PM +0200, Marek Polacek wrote: On Sat, Apr 25, 2015 at 10:18:59PM +0200, Gerald Pfeifer wrote: In case this example feels too contrived (even though it is an excerpt of Wine code), we now also warn about the following where the two types and variables are defined in different places and the size of one is set implicitly: typedef int r_fun_t (int); r_fun_t * text_funcs[] = {0,0,0}; int report (unsigned t) { typedef int s_fun_t (long, char); static s_fun_t * GUI_funcs[3]; return (t sizeof text_funcs / sizeof text_funcs[0] t sizeof GUI_funcs / sizeof GUI_funcs[0]); } (I also filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65891 so that we keep track.) I'm afraid there isn't an easy solution to this; the problem is that we fold sizeof early, so the warning sees t sizeof 4 t 4 and warns. Maybe using SIZEOF_EXPR would help... That said, this isn't something that regressed with my patch. Consider e.g.: struct S { int i; }; struct T { int i; }; int foo (unsigned int t) { return t sizeof (struct S) t sizeof (struct T); } with -Wlogical-op: w2.c: In function ‘foo’: w2.c:6:32: warning: logical ‘and’ of mutually exclusive tests is always false [-Wlogical-op] return t sizeof (struct S) t sizeof (struct T); ^
Re: [PATCH, ARM] Alternatives and type attributes fixes.
On 27 April 2015 at 15:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi Yvan, On 27/04/15 13:25, Yvan Roux wrote: Hi, This is a follow-up of PR64208 where an LRA loop was due to redundancy in insn's alternatives. I've checked all the insns in the arm backend to avoid latent problems and this patch fixes the issues I've spotted. Only thumb2_movsicc_insn contained duplicated alternatives, and the rest of the changes are related to the type attribute, which were not accurate or used accordingly to their definition. Notice that the modifications have only a limited impact as in most of the pipeline descriptions, multiple/mov_reg/alu_reg/.. types are used the same way, only cortex a8 seems to have a real difference between alu or multiple instruction and mov. Bootstrapped and regtested on arm-linux-gnueabihf. Ok for trunk ? Thanks, Yvan 2015-04-27 Yvan Rouxyvan.r...@linaro.org * config/arm/arm.mb (*arm_movt): Fix type attribute. (*movsi_compare0): Likewise. (*cmpsi_shiftsi): Likewise. (*cmpsi_shiftsi_swp): Likewise. (*movsicc_insn): Likewise. (*cond_move): Likewise. (*if_plus_move): Likewise. (*if_move_plus): Likewise. (*if_arith_move): Likewise. (*if_move_arith): Likewise. (*if_shift_move): Likewise. (*if_move_shift): Likewise. * config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute. (*thumb2_movsicc_insn): Fix alternative redundancy. (*thumb2_addsi_short): Split register and immediate alternatives. (thumb2_addsi3_compare0): Likewise. insn.diff diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..ee23deb 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5631,7 +5631,7 @@ [(set_attr predicable yes) (set_attr predicable_short_it no) (set_attr length 4) - (set_attr type mov_imm)] + (set_attr type alu_sreg)] ) For some context, this is the *arm_movt pattern that generates a movt instruction. The documentation in types.md says: This excludes MOV and MVN but includes MOVT. So using alu_sreg is correct according to that logic. 
However, don't you want to also update the arm_movtas_ze pattern that generates movt but erroneously has the type 'mov_imm'? Ha, yes I missed this one ! I'll will update it. (define_insn *arm_movsi_insn @@ -5919,7 +5919,7 @@ cmp%?\\t%0, #0 sub%.\\t%0, %1, #0 [(set_attr conds set) - (set_attr type alus_imm,alus_imm)] + (set_attr type alus_sreg,alus_sreg)] ) Why change that? They are instructions with immediate operands so alus_imm should be gorrect? Oups, I certainly misread #0 and %0 this one is correct. @@ -486,12 +486,12 @@ ) (define_insn_and_split *thumb2_movsicc_insn - [(set (match_operand:SI 0 s_register_operand =l,l,r,r,r,r,r,r,r,r,r) + [(set (match_operand:SI 0 s_register_operand =l,l,r,r,r,r,r,r,r,r,r,r) (if_then_else:SI (match_operator 3 arm_comparison_operator [(match_operand 4 cc_register ) (const_int 0)]) -(match_operand:SI 1 arm_not_operand 0 ,lPy,0 ,0,rI,K,rI,rI,K ,K,r) -(match_operand:SI 2 arm_not_operand lPy,0 ,rI,K,0 ,0,rI,K ,rI,K,r)))] +(match_operand:SI 1 arm_not_operand 0 ,lPy,0 ,0,rI,K,I ,r,rI,K ,K,r) +(match_operand:SI 2 arm_not_operand lPy,0 ,rI,K,0 ,0,rI,I,K ,rI,K,r)))] TARGET_THUMB2 @ it\\t%D3\;mov%D3\\t%0, %2 @@ -504,12 +504,14 @@ # # # + # # ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 - ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 - ; alt 8: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 - ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 - ; alt 10: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 + ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 + ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 reload_completed [(const_int 0)] { @@ -540,8 +542,8 @@ operands[2]))); DONE; } - [(set_attr length 4,4,6,6,6,6,10,10,10,10,6) - (set_attr enabled_for_depr_it yes,yes,no,no,no,no,no,no,no,no,yes) + [(set_attr length 4,4,6,6,6,6,10,8,10,10,10,6) + 
(set_attr enabled_for_depr_it yes,yes,no,no,no,no,no,no,no,no,no,yes) (set_attr conds use) (set_attr type multiple)] ) So, I think here for the first 6 alternatives we can give it a more specific type, like mov_immm or mov_reg, since you're cleaning this stuff up. The instructions in those alternatives are just a mov instruction enclosed in an IT block, so they always have to stick together anyway. OK I'll change that. @@ -1161,9 +1163,9 @@ ) (define_insn *thumb2_addsi_short - [(set
Re: [PATCH] [AArch32] Additional bics patterns.
On 24/04/15 16:41, Alex Velenko wrote: Hi, This patch adds rtl patterns to generate bics instructions with shift. Added attribute predicable_short_it since last respin. Done full regression run on arm-none-eabi and arm-none-gnueabihf. Bootstrapped on arm-none-gnueabihf. Is this patch ok? gcc/config 2015-04-24 Alex Velenko alex.vele...@arm.com * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern. * (andsi_not_shiftsi_si_scc_no_reuse): New pattern. gcc/testsuite 2015-04-24 Alex Velenko alex.vele...@arm.com * gcc.target/arm/bics_1.c : New testcase. * gcc.target/arm/bics_2.c : New testcase. * gcc.target/arm/bics_3.c : New testcase. * gcc.target/arm/bics_4.c : New testcase. --- gcc/config/arm/arm.md | 44 +++ gcc/testsuite/gcc.target/arm/bics_1.c | 54 + gcc/testsuite/gcc.target/arm/bics_2.c | 57 +++ gcc/testsuite/gcc.target/arm/bics_3.c | 41 + gcc/testsuite/gcc.target/arm/bics_4.c | 49 ++ 5 files changed, 245 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/bics_1.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_2.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_3.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_4.c diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..9e774c1 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2768,6 +2768,50 @@ (const_string logic_shift_reg)))] ) +(define_insn andsi_not_shiftsi_si_scc_no_reuse + [(set (reg:CC_NOOV CC_REGNUM) +(compare:CC_NOOV +(and:SI (not:SI (match_operator:SI 0 shift_operator +[(match_operand:SI 1 s_register_operand r) + (match_operand:SI 2 arm_rhs_operand rM)])) +(match_operand:SI 3 s_register_operand r)) +(const_int 0))) + (clobber (match_scratch:SI 4 =r))] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + 
+(define_insn andsi_not_shiftsi_si_scc + [(parallel [(set (reg:CC_NOOV CC_REGNUM) +(compare:CC_NOOV +(and:SI (not:SI (match_operator:SI 0 shift_operator +[(match_operand:SI 1 s_register_operand r) + (match_operand:SI 2 arm_rhs_operand rM)])) +(match_operand:SI 3 s_register_operand r)) +(const_int 0))) +(set (match_operand:SI 4 s_register_operand =r) + (and:SI (not:SI (match_op_dup 0 + [(match_dup 1) + (match_dup 2)])) + (match_dup 3)))])] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + (define_insn *andsi_notsi_si_compare0 [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV diff --git a/gcc/testsuite/gcc.target/arm/bics_1.c b/gcc/testsuite/gcc.target/arm/bics_1.c new file mode 100644 index 000..173eb89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bics_1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target arm32 } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl \#3 1 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + +x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) + abort (); + + return 0; +} +/* { dg-final { cleanup-saved-temps } } */ diff --git 
a/gcc/testsuite/gcc.target/arm/bics_2.c b/gcc/testsuite/gcc.target/arm/bics_2.c new file mode 100644 index 000..740d7c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bics_2.c @@ -0,0 +1,57 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target arm32 } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tr\[0-9\]+, r\[0-9\]+,
Re: [PATCH 2/13] musl libc config
On 22/04/15 16:42, Szabolcs Nagy wrote: yes, i didn't realize that this depends on the target specific parts i will prepare an updated patch that works if the target has no musl dynamic linker name defined Updated the patch, this time I tested it separately. (The test revealed that -mmusl was not wired up properly in linux.opt through the Negative directive, now fixed). A default dynamic linker name is defined for musl: /dev/null. (The arch specific headers have to be updated to undef the MUSL_DYNAMIC_LINKER* macros). (Other options were considered such as print an error with %e, but then the user cannot override it with -Wl,-dynamic-linker) The other parts of the patch (config, include path reordering) are not changed. gcc/Changelog 2015-04-27 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (MUSL_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER32,) (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERX32,) (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * configure: Regenerate. diff --git a/gcc/config.gcc b/gcc/config.gcc index a1df043..0176ca2 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -575,7 +575,7 @@ case ${target} in esac # Common C libraries. -tm_defines=$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 +tm_defines=$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4 # 32-bit x86 processors supported by --with-arch=. Each processor # MUST be separated by exactly one space. 
@@ -725,6 +725,9 @@ case ${target} in *-*-*uclibc*) tm_defines=$tm_defines DEFAULT_LIBC=LIBC_UCLIBC ;; +*-*-*musl*) + tm_defines=$tm_defines DEFAULT_LIBC=LIBC_MUSL + ;; *) tm_defines=$tm_defines DEFAULT_LIBC=LIBC_GLIBC ;; diff --git a/gcc/config/linux.h b/gcc/config/linux.h index 857389a..b551c56 100644 --- a/gcc/config/linux.h +++ b/gcc/config/linux.h @@ -32,10 +32,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define OPTION_GLIBC (DEFAULT_LIBC == LIBC_GLIBC) #define OPTION_UCLIBC (DEFAULT_LIBC == LIBC_UCLIBC) #define OPTION_BIONIC (DEFAULT_LIBC == LIBC_BIONIC) +#define OPTION_MUSL (DEFAULT_LIBC == LIBC_MUSL) #else #define OPTION_GLIBC (linux_libc == LIBC_GLIBC) #define OPTION_UCLIBC (linux_libc == LIBC_UCLIBC) #define OPTION_BIONIC (linux_libc == LIBC_BIONIC) +#define OPTION_MUSL (linux_libc == LIBC_MUSL) #endif #define GNU_USER_TARGET_OS_CPP_BUILTINS() \ @@ -50,21 +52,25 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see } while (0) /* Determine which dynamic linker to use depending on whether GLIBC or - uClibc or Bionic is the default C library and whether - -muclibc or -mglibc or -mbionic has been passed to change the default. */ + uClibc or Bionic or musl is the default C library and whether + -muclibc or -mglibc or -mbionic or -mmusl has been passed to change + the default. 
*/ -#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LD1, LD2, LD3) \ - %{ LIBC2 : LD2 ;:%{ LIBC3 : LD3 ;: LD1 }} +#define CHOOSE_DYNAMIC_LINKER1(LIBC1, LIBC2, LIBC3, LIBC4, LD1, LD2, LD3, LD4) \ + %{ LIBC2 : LD2 ;:%{ LIBC3 : LD3 ;:%{ LIBC4 : LD4 ;: LD1 }}} #if DEFAULT_LIBC == LIBC_GLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U, B) \ - CHOOSE_DYNAMIC_LINKER1 (mglibc, muclibc, mbionic, G, U, B) +#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \ + CHOOSE_DYNAMIC_LINKER1 (mglibc, muclibc, mbionic, mmusl, G, U, B, M) #elif DEFAULT_LIBC == LIBC_UCLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U, B) \ - CHOOSE_DYNAMIC_LINKER1 (muclibc, mglibc, mbionic, U, G, B) +#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \ + CHOOSE_DYNAMIC_LINKER1 (muclibc, mglibc, mbionic, mmusl, U, G, B, M) #elif DEFAULT_LIBC == LIBC_BIONIC -#define CHOOSE_DYNAMIC_LINKER(G, U, B) \ - CHOOSE_DYNAMIC_LINKER1 (mbionic, mglibc, muclibc, B, G, U) +#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \ + CHOOSE_DYNAMIC_LINKER1 (mbionic, mglibc, muclibc, mmusl, B, G, U, M) +#elif DEFAULT_LIBC == LIBC_MUSL +#define CHOOSE_DYNAMIC_LINKER(G, U, B, M) \ + CHOOSE_DYNAMIC_LINKER1 (mmusl, mglibc, muclibc, mbionic, M, G, U, B) #else #error Unsupported DEFAULT_LIBC #endif /* DEFAULT_LIBC */ @@ -81,24 +87,100 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define BIONIC_DYNAMIC_LINKER32 /system/bin/linker #define BIONIC_DYNAMIC_LINKER64 /system/bin/linker64 #define BIONIC_DYNAMIC_LINKERX32 /system/bin/linkerx32 +/* Should be redefined for each target that supports musl. */ +#define MUSL_DYNAMIC_LINKER /dev/null
Re: [PATCH][combine][obvious] Use std::swap instead of manually swapping
On Mon, Apr 27, 2015 at 10:55:17AM +0100, Kyrill Tkachov wrote: Precedents suggest these changes are considered obvious. So I'll commit this in a couple of days unless someone objects. Yes, it's obvious. One tiny thing... @@ -9062,7 +9061,7 @@ known_cond (rtx x, enum rtx_code cond, rtx reg, rtx val) else if (COMPARISON_P (x) || COMMUTATIVE_ARITH_P (x)) { if (rtx_equal_p (XEXP (x, 0), val)) - cond = swap_condition (cond), temp = val, val = reg, reg = temp; + cond = swap_condition (cond), std::swap (val, reg); if (rtx_equal_p (XEXP (x, 0), reg) && rtx_equal_p (XEXP (x, 1), val)) { Might as well write this as two statements, like everywhere else, e.g. @@ -11454,7 +11453,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) is already a constant integer. */ if (swap_commutative_operands_p (op0, op1)) { - tem = op0, op0 = op1, op1 = tem; + std::swap (op0, op1); code = swap_condition (code); } Thanks, Segher
Re: [PATCH 7/13] powerpc musl support
On 20/04/15 19:55, Szabolcs Nagy wrote: Set up dynamic linker name for powerpc. Musl only supports powerpc with secure plt, so appropriate options are passed to the linker by default. Patch v2. Powerpc does not include the top level linux.h, so I had to repeat the include order fixes from there in rs6000/sysv4.h. I corrected the endianness handling (the le suffix should be added correctly now). gcc/Changelog: 2015-04-24 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * config.gcc (secure_plt): Add *-linux*-musl*. * config/rs6000/linux64.h (MUSL_DYNAMIC_LINKER32): Define. (MUSL_DYNAMIC_LINKER64): Define. (GNU_USER_DYNAMIC_LINKER32): Update. (GNU_USER_DYNAMIC_LINKER64): Update. (CHOOSE_DYNAMIC_LINKER): Update. * config/rs6000/secureplt.h (LINK_SECURE_PLT_DEFAULT_SPEC): Define. * config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Update. (MUSL_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER_E,) (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. (LINK_SECURE_PLT_DEFAULT_SPEC): Define. (CHOOSE_DYNAMIC_LINKER, LINK_TARGET_SPEC, LINK_OS_LINUX_SPEC): Update. * config/rs6000/sysv4le.h (MUSL_DYNAMIC_LINKER_E): Define. 
diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h index 0879e7e..a6ad842 100644 --- a/gcc/config/rs6000/linux64.h +++ b/gcc/config/rs6000/linux64.h @@ -365,17 +365,21 @@ extern int dot_symbols; #endif #define UCLIBC_DYNAMIC_LINKER32 /lib/ld-uClibc.so.0 #define UCLIBC_DYNAMIC_LINKER64 /lib/ld64-uClibc.so.0 +#define MUSL_DYNAMIC_LINKER32 /lib/ld-musl-powerpc MUSL_DYNAMIC_LINKER_E .so.1 +#define MUSL_DYNAMIC_LINKER64 /lib/ld-musl-powerpc64 MUSL_DYNAMIC_LINKER_E .so.1 #if DEFAULT_LIBC == LIBC_UCLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U) %{mglibc: G ;: U } +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{mglibc: G ;:%{mmusl: M ;: U }} #elif DEFAULT_LIBC == LIBC_GLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U) %{muclibc: U ;: G } +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{muclibc: U ;:%{mmusl: M ;: G }} +#elif DEFAULT_LIBC == LIBC_MUSL +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{mglibc: G ;:%{muclibc: U ;: M }} #else #error Unsupported DEFAULT_LIBC #endif #define GNU_USER_DYNAMIC_LINKER32 \ - CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER32) + CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER32, MUSL_DYNAMIC_LINKER32) #define GNU_USER_DYNAMIC_LINKER64 \ - CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER64, UCLIBC_DYNAMIC_LINKER64) + CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKER64, UCLIBC_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKER64) #undef DEFAULT_ASM_ENDIAN #if (TARGET_DEFAULT MASK_LITTLE_ENDIAN) diff --git a/gcc/config/rs6000/secureplt.h b/gcc/config/rs6000/secureplt.h index b463463..77edf2a 100644 --- a/gcc/config/rs6000/secureplt.h +++ b/gcc/config/rs6000/secureplt.h @@ -18,3 +18,4 @@ along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. 
*/ #define CC1_SECURE_PLT_DEFAULT_SPEC -msecure-plt +#define LINK_SECURE_PLT_DEFAULT_SPEC --secure-plt diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h index 9917c2f..94c4263 100644 --- a/gcc/config/rs6000/sysv4.h +++ b/gcc/config/rs6000/sysv4.h @@ -537,6 +537,9 @@ ENDIAN_SELECT( -mbig, -mlittle, DEFAULT_ASM_ENDIAN) #ifndef CC1_SECURE_PLT_DEFAULT_SPEC #define CC1_SECURE_PLT_DEFAULT_SPEC #endif +#ifndef LINK_SECURE_PLT_DEFAULT_SPEC +#define LINK_SECURE_PLT_DEFAULT_SPEC +#endif /* Pass -G xxx to the compiler. */ #undef CC1_SPEC @@ -586,7 +589,8 @@ ENDIAN_SELECT( -mbig, -mlittle, DEFAULT_ASM_ENDIAN) /* Override the default target of the linker. */ #define LINK_TARGET_SPEC \ - ENDIAN_SELECT(, --oformat elf32-powerpcle, ) + ENDIAN_SELECT(, --oformat elf32-powerpcle, ) \ + %{!mbss-plt: %{!msecure-plt: %(link_secure_plt_default)}} /* Any specific OS flags. */ #define LINK_OS_SPEC \ @@ -762,17 +766,22 @@ ENDIAN_SELECT( -mbig, -mlittle, DEFAULT_ASM_ENDIAN) #define LINK_START_LINUX_SPEC +#define MUSL_DYNAMIC_LINKER_E ENDIAN_SELECT(,le,) + #define GLIBC_DYNAMIC_LINKER /lib/ld.so.1 #define UCLIBC_DYNAMIC_LINKER /lib/ld-uClibc.so.0 +#define MUSL_DYNAMIC_LINKER /lib/ld-musl-powerpc MUSL_DYNAMIC_LINKER_E .so.1 #if DEFAULT_LIBC == LIBC_UCLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U) %{mglibc: G ;: U } +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{mglibc: G ;:%{mmusl: M ;: U }} +#elif DEFAULT_LIBC == LIBC_MUSL +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{mglibc: G ;:%{muclibc: U ;: M }} #elif !defined (DEFAULT_LIBC) || DEFAULT_LIBC == LIBC_GLIBC -#define CHOOSE_DYNAMIC_LINKER(G, U) %{muclibc: U ;: G } +#define CHOOSE_DYNAMIC_LINKER(G, U, M) %{muclibc: U ;:%{mmusl: M ;: G }} #else #error Unsupported DEFAULT_LIBC #endif #define GNU_USER_DYNAMIC_LINKER \ -
RE: [PATCH][combine][obvious] Use std::swap instead of manually swapping
On 27/04/15 15:37, Segher Boessenkool wrote: On Mon, Apr 27, 2015 at 10:55:17AM +0100, Kyrill Tkachov wrote: Precedents suggest these changes are considered obvious. So I'll commit this in a couple of days unless someone objects. Yes, it's obvious. One tiny thing... @@ -9062,7 +9061,7 @@ known_cond (rtx x, enum rtx_code cond, rtx reg, rtx val) else if (COMPARISON_P (x) || COMMUTATIVE_ARITH_P (x)) { if (rtx_equal_p (XEXP (x, 0), val)) -cond = swap_condition (cond), temp = val, val = reg, reg = temp; +cond = swap_condition (cond), std::swap (val, reg); if (rtx_equal_p (XEXP (x, 0), reg) && rtx_equal_p (XEXP (x, 1), val)) { Might as well write this as two statements, like everywhere else, e.g. @@ -11454,7 +11453,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) is already a constant integer. */ if (swap_commutative_operands_p (op0, op1)) { - tem = op0, op0 = op1, op1 = tem; + std::swap (op0, op1); code = swap_condition (code); } Thanks, Segher Thanks, here's what I committed with r222468. Kyrill combine-swap.patch Description: Binary data
Re: [PATCH][simplify-rtx][trivial] Use std::swap instead of manually swapping
On 04/27/2015 03:55 AM, Kyrill Tkachov wrote: Hi all, This patch replaces in simplify-rtx.c the manual swapping of values with std::swap. Precedents suggest these are considered obvious changes. Bootstrapped and tested on aarch64, x86_64. Will commit as obvious in a couple of days if no one objects Thanks, Kyrill 2015-04-27 Kyrylo Tkachov kyrylo.tkac...@arm.com * simplify-rtx.c (simplify_gen_binary): Use std::swap instead of manually swapping. (simplify_associative_operation): Likewise. (simplify_binary_operation): Likewise. (simplify_plus_minus): Likewise. (simplify_relational_operation): Likewise. (simplify_ternary_operation): Likewise. OK. And just to be explicit, please consider any patch which just converts manual swapping to std::swap as pre-approved. Just post them for archival purposes. jeff
RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 18:06 To: GCC Patches Subject: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable This patch makes aarch64_min_divisions_for_recip_mul configurable for float and double. This allows CPUs with really fast or multiple dividers to return 3 (or even 4) if that happens to be faster overall. No code generation change - bootstrap regression OK. ChangeLog: 2015-03-03 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune_params): Add min_div_recip_mul_sf and min_div_recip_mul_df fields. * gcc/config/aarch64/aarch64.c (aarch64_min_divisions_for_recip_mul): Return value depending on target. (generic_tunings): Initialize new target settings. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 2 ++ gcc/config/aarch64/aarch64.c| 26 +++--- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 59c5824..4331e5c 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -177,6 +177,8 @@ struct tune_params const int int_reassoc_width; const int fp_reassoc_width; const int vec_reassoc_width; + const int min_div_recip_mul_sf; + const int min_div_recip_mul_df; }; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e22d72e..42a96f6 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -353,7 +353,9 @@ static const struct tune_params generic_tunings = 4, /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ - 1 /* vec_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2 /* min_div_recip_mul_df. 
*/ }; static const struct tune_params cortexa53_tunings = @@ -371,7 +373,9 @@ static const struct tune_params cortexa53_tunings = 4, /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ - 1 /* vec_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2 /* min_div_recip_mul_df. */ }; static const struct tune_params cortexa57_tunings = @@ -389,7 +393,9 @@ static const struct tune_params cortexa57_tunings = 4, /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ - 1 /* vec_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2 /* min_div_recip_mul_df. */ }; static const struct tune_params thunderx_tunings = @@ -406,7 +412,9 @@ static const struct tune_params thunderx_tunings = 8, /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ - 1 /* vec_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2 /* min_div_recip_mul_df. */ }; static const struct tune_params xgene1_tunings = @@ -423,7 +431,9 @@ static const struct tune_params xgene1_tunings = 16, /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ - 1 /* vec_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2 /* min_div_recip_mul_df. */ }; /* A processor implementing AArch64. */ @@ -512,9 +522,11 @@ static const char * const aarch64_condition_codes[] = }; static unsigned int -aarch64_min_divisions_for_recip_mul (enum machine_mode mode ATTRIBUTE_UNUSED) +aarch64_min_divisions_for_recip_mul (enum machine_mode mode) { - return 2; + if (GET_MODE_UNIT_SIZE (mode) == 4) +    return aarch64_tune_params->min_div_recip_mul_sf; +  return aarch64_tune_params->min_div_recip_mul_df; } static int -- 1.9.1
RE: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 04 March 2015 15:38 To: GCC Patches Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap regression OK. ChangeLog: 2015-03-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_rtx_costs): Calculate cost of op0 and op1 in PLUS and MINUS cases. --- gcc/config/aarch64/aarch64.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 39921a7..e22d72e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5794,6 +5794,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, op1 = XEXP (x, 1); cost_minus: + *cost += rtx_cost (op0, MINUS, 0, speed); + /* Detect valid immediates. */ if ((GET_MODE_CLASS (mode) == MODE_INT || (GET_MODE_CLASS (mode) == MODE_CC @@ -5801,13 +5803,10 @@ cost_minus: CONST_INT_P (op1) aarch64_uimm12_shift (INTVAL (op1))) { - *cost += rtx_cost (op0, MINUS, 0, speed); - if (speed) /* SUB(S) (immediate). */ *cost += extra_cost-alu.arith; return true; - } /* Look for SUB (extended register). */ @@ -5832,7 +5831,6 @@ cost_minus: *cost += aarch64_rtx_mult_cost (new_op1, MULT, (enum rtx_code) code, speed); - *cost += rtx_cost (op0, MINUS, 0, speed); return true; } @@ -5879,6 +5877,8 @@ cost_plus: return true; } + *cost += rtx_cost (op1, PLUS, 1, speed); + /* Look for ADD (extended register). */ if (aarch64_rtx_arith_op_extract_p (op0, mode)) { @@ -5900,12 +5900,10 @@ cost_plus: { *cost += aarch64_rtx_mult_cost (new_op0, MULT, PLUS, speed); - *cost += rtx_cost (op1, PLUS, 1, speed); return true; } - *cost += (rtx_cost (new_op0, PLUS, 0, speed) - + rtx_cost (op1, PLUS, 1, speed)); + *cost += rtx_cost (new_op0, PLUS, 0, speed); if (speed) { -- 1.9.1
Re: [PATCH 8/13] sh musl support
On 20/04/15 21:29, Oleg Endo wrote: On Mon, 2015-04-20 at 19:56 +0100, Szabolcs Nagy wrote: Set up dynamic linker name for sh. The SH parts are OK for trunk. Patch v2 It turns out we have to distinguish between hard float and nofpu ABI. I simply listed the options that turn on nofpu and hardfloat code generation (the single precision modes are not relevant to musl, currently musl only supports sh4 and up because it assumes either gusa or llsc atomics is available and requires an mmu, but I listed all the options for the nofpu check for completeness). gcc/Changelog: 2015-04-27 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define. (MUSL_DYNAMIC_LINKER_E, MUSL_DYNAMIC_LINKER_FP): Define. diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h index 0f5d614..4c167c6 100644 --- a/gcc/config/sh/linux.h +++ b/gcc/config/sh/linux.h @@ -43,6 +43,28 @@ along with GCC; see the file COPYING3. If not see #define TARGET_ASM_FILE_END file_end_indicate_exec_stack +#if TARGET_ENDIAN_DEFAULT == MASK_LITTLE_ENDIAN +#define MUSL_DYNAMIC_LINKER_E %{mb:eb} +#else +#define MUSL_DYNAMIC_LINKER_E %{!ml:eb} +#endif + +#if TARGET_CPU_DEFAULT ( MASK_HARD_SH2A_DOUBLE | MASK_SH4 ) +/* -nofpu if any nofpu option is specified */ +#define MUSL_DYNAMIC_LINKER_FP \ + %{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu| \ + m4-340|m4-400|m4-500|m4al|m5-32media-nofpu|m5-64media-nofpu| \ + m5-compact-nofpu:-nofpu} +#else +/* -nofpu if none of the hard fpu options are specified */ +#define MUSL_DYNAMIC_LINKER_FP \ + %{m2a|m4|m4-100|m4-200|m4-300|m4a|m5-32media|m5-64media|m5-compact:;:-nofpu} +#endif + +#undef MUSL_DYNAMIC_LINKER +#define MUSL_DYNAMIC_LINKER \ + /lib/ld-musl-sh MUSL_DYNAMIC_LINKER_E MUSL_DYNAMIC_LINKER_FP .so.1 + #define GLIBC_DYNAMIC_LINKER /lib/ld-linux.so.2 #undef SUBTARGET_LINK_EMUL_SUFFIX
Re: [PATCH 9/13] x86 musl support
On 20/04/15 19:57, Szabolcs Nagy wrote: Set up dynamic linker name for x86. Patch v2. (undef MUSL_DYNAMIC_LINKER* before defining it). gcc/Changelog: 2015-04-27 Gregor Richards gregor.richa...@uwaterloo.ca * config/i386/linux.h (MUSL_DYNAMIC_LINKER): Define. * config/i386/linux64.h (MUSL_DYNAMIC_LINKER32): Define. (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERX32): Define. diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h index a100963..9c47fcb 100644 --- a/gcc/config/i386/linux.h +++ b/gcc/config/i386/linux.h @@ -21,3 +21,5 @@ along with GCC; see the file COPYING3. If not see #define GNU_USER_LINK_EMULATION elf_i386 #define GLIBC_DYNAMIC_LINKER /lib/ld-linux.so.2 +#undef MUSL_DYNAMIC_LINKER +#define MUSL_DYNAMIC_LINKER /lib/ld-musl-i386.so.1 diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h index a27d3be..e300480 100644 --- a/gcc/config/i386/linux64.h +++ b/gcc/config/i386/linux64.h @@ -30,3 +30,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define GLIBC_DYNAMIC_LINKER32 /lib/ld-linux.so.2 #define GLIBC_DYNAMIC_LINKER64 /lib64/ld-linux-x86-64.so.2 #define GLIBC_DYNAMIC_LINKERX32 /libx32/ld-linux-x32.so.2 + +#undef MUSL_DYNAMIC_LINKER32 +#define MUSL_DYNAMIC_LINKER32 /lib/ld-musl-i386.so.1 +#undef MUSL_DYNAMIC_LINKER64 +#define MUSL_DYNAMIC_LINKER64 /lib/ld-musl-x86_64.so.1 +#undef MUSL_DYNAMIC_LINKERX32 +#define MUSL_DYNAMIC_LINKERX32 /lib/ld-musl-x32.so.1
RE: [PATCH][AArch64] Use conditional negate for abs expansion
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 16:19 To: GCC Patches Subject: [PATCH][AArch64] Use conditional negate for abs expansion Expand abs into a compare and conditional negate. This is the most obvious expansion, enables merging of the comparison into ALU instructions and is faster on all implementations. Bootstrapped regression tested. int f(int x) { return abs (x + 1); } Before: add w0, w0, 1 sxtwx0, w0 eor x1, x0, x0, asr 63 sub x1, x1, x0, asr 63 mov x0, x1 ret After: addsw0, w0, 1 csneg w0, w0, w0, pl ret ChangeLog: 2015-03-03 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (absdi2): optimize abs expansion. (csneg3mode_insn): enable expansion of pattern. * gcc/testsuite/gcc.target/aarch64/abs_1.c (abs64): update test for new abs expansion. (abs64_in_dreg): likewise. --- gcc/config/aarch64/aarch64.md| 33 +++- gcc/testsuite/gcc.target/aarch64/abs_1.c | 5 ++--- 2 files changed, 9 insertions(+), 29 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1f4169e..46b7a63 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2172,35 +2172,16 @@ [(set_attr type alu_ext)] ) -(define_insn_and_split absdi2 - [(set (match_operand:DI 0 register_operand =r,w) - (abs:DI (match_operand:DI 1 register_operand r,w)))] +(define_expand absmode2 + [(match_operand:GPI 0 register_operand ) + (match_operand:GPI 1 register_operand )] - @ - # - abs\\t%d0, %d1 - reload_completed -GP_REGNUM_P (REGNO (operands[0])) -GP_REGNUM_P (REGNO (operands[1])) - [(const_int 0)] { -emit_insn (gen_rtx_SET (VOIDmode, operands[0], - gen_rtx_XOR (DImode, - gen_rtx_ASHIFTRT (DImode, -operands[1], -GEN_INT (63)), - operands[1]))); -emit_insn (gen_rtx_SET (VOIDmode, - operands[0], - gen_rtx_MINUS (DImode, -operands[0], -gen_rtx_ASHIFTRT (DImode, - operands[1], - GEN_INT (63); +rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx); +rtx x = gen_rtx_LT (VOIDmode, 
ccreg, const0_rtx); +emit_insn (gen_csneg3mode_insn (operands[0], x, operands[1], operands[1])); DONE; } - [(set_attr type alu_sreg) - (set_attr simd no,yes)] ) (define_insn negmode2 @@ -2879,7 +2860,7 @@ [(set_attr type csel)] ) -(define_insn *csneg3mode_insn +(define_insn csneg3mode_insn [(set (match_operand:GPI 0 register_operand =r) (if_then_else:GPI (match_operand 1 aarch64_comparison_operation ) diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c index 938bc84..11f1095 100644 --- a/gcc/testsuite/gcc.target/aarch64/abs_1.c +++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c @@ -7,15 +7,14 @@ extern void abort (void); long long abs64 (long long a) { - /* { dg-final { scan-assembler eor\t } } */ - /* { dg-final { scan-assembler sub\t } } */ + /* { dg-final { scan-assembler csneg\t } } */ return llabs (a); } long long abs64_in_dreg (long long a) { - /* { dg-final { scan-assembler abs\td\[0-9\]+, d\[0-9\]+ } } */ + /* { dg-final { scan-assembler csneg\t } } */ register long long x asm (d8) = a; register long long y asm (d9); asm volatile ( : : w (x)); -- 1.9.1
Re: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS
Hi Wilco, On 27/04/15 14:43, Wilco Dijkstra wrote: ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 04 March 2015 15:38 To: GCC Patches Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap regression OK. ChangeLog: 2015-03-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_rtx_costs): Calculate cost of op0 and op1 in PLUS and MINUS cases. --- gcc/config/aarch64/aarch64.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 39921a7..e22d72e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5794,6 +5794,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, op1 = XEXP (x, 1); cost_minus: + *cost += rtx_cost (op0, MINUS, 0, speed); + /* Detect valid immediates. */ if ((GET_MODE_CLASS (mode) == MODE_INT || (GET_MODE_CLASS (mode) == MODE_CC @@ -5801,13 +5803,10 @@ cost_minus: CONST_INT_P (op1) aarch64_uimm12_shift (INTVAL (op1))) { - *cost += rtx_cost (op0, MINUS, 0, speed); - if (speed) /* SUB(S) (immediate). */ *cost += extra_cost-alu.arith; return true; - } /* Look for SUB (extended register). */ @@ -5832,7 +5831,6 @@ cost_minus: *cost += aarch64_rtx_mult_cost (new_op1, MULT, (enum rtx_code) code, speed); - *cost += rtx_cost (op0, MINUS, 0, speed); return true; } @@ -5879,6 +5877,8 @@ cost_plus: return true; } + *cost += rtx_cost (op1, PLUS, 1, speed); + I don't think this is correct. In the code directly below (when the aarch64_rtx_arith_op_extract_p condition is true) we have a shift/extend operation by a constant, so we don't want to take into account the cost of operand 1 (which is the extend+shift rtx). 
So, I looked at this code myself recently and found that the wrong logic is addressed by my patch at: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01617.html which fixes the case where we forget to cost operand0 in the MINUS case. What do you think? Kyrill /* Look for ADD (extended register). */ if (aarch64_rtx_arith_op_extract_p (op0, mode)) { @@ -5900,12 +5900,10 @@ cost_plus: { *cost += aarch64_rtx_mult_cost (new_op0, MULT, PLUS, speed); - *cost += rtx_cost (op1, PLUS, 1, speed); return true; } - *cost += (rtx_cost (new_op0, PLUS, 0, speed) - + rtx_cost (op1, PLUS, 1, speed)); + *cost += rtx_cost (new_op0, PLUS, 0, speed); if (speed) { -- 1.9.1
Re: [PATCH 3/13] aarch64 musl support
On 21/04/15 15:16, pins...@gmail.com wrote: I don't think you need to check if defaulting to little or big-endian here are the specs always have one or the other passing through. Also if musl does not support ilp32, you might want to error out. Or even define the dynamic linker name even before support goes into musl. Patch v2. (now _ilp32 abi variant is included too) gcc/Changelog: 2015-04-27 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * config/aarch64/aarch64-linux.h (MUSL_DYNAMIC_LINKER): Define. diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 9abb252..2d647c3 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -23,6 +23,9 @@ #define GLIBC_DYNAMIC_LINKER /lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1 +#undef MUSL_DYNAMIC_LINKER +#define MUSL_DYNAMIC_LINKER /lib/ld-musl-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1 + #undef ASAN_CC1_SPEC #define ASAN_CC1_SPEC %{%:sanitize(address):-funwind-tables}
Re: [C/C++ PATCH] Improve -Wlogical-op (PR c/63357)
On Sat, Apr 25, 2015 at 10:18:59PM +0200, Gerald Pfeifer wrote: In case this example feels too contrived (even though it is an excerpt of Wine code), we now also warn about the following, where the two types and variables are defined in different places and the size of one is set implicitly: typedef int r_fun_t (int); r_fun_t * text_funcs[] = {0,0,0}; int report (unsigned t) { typedef int s_fun_t (long, char); static s_fun_t * GUI_funcs[3]; return (t < sizeof text_funcs / sizeof text_funcs[0] && t < sizeof GUI_funcs / sizeof GUI_funcs[0]); } (I also filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65891 so that we keep track.) I'm afraid there isn't an easy solution to this; the problem is that we fold sizeof early, so by the time the warning runs both operands have been folded down to the same constant (effectively t < 3 && t < 3) and it warns. Maybe using SIZEOF_EXPR would help... Marek
RE: [PATCH] Fix IRA register preferencing
Jeff Law wrote: On 12/10/14 06:26, Wilco Dijkstra wrote: If recomputing is best does that mean that record_reg_classes should not give a boost to the preferred class in the 2nd pass? Perhaps. I haven't looked deeply at this part of IRA. I was relaying my experiences with (ab)using the ira-reload callbacks to handle allocation after splitting -- where getting the costs and classes updated in a reasonable manner was clearly important to getting good code. One could probably argue I should have kept testcases from that work :-) I don't understand what purpose this has - if the preferred class is from the first pass, it is already correct, so all it does is boost the preferred class further. And if the preferred class is wrong (eg. after live range splitting), it will boost the incorrect class even harder, so in the end you never get a different class. It may be historical from the old regclass code, not really sure. From what you're saying, recomputing seems best, and I'd be happy to submit a patch to remove all the preferred class code from record_reg_classes. Recomputing certainly helped the cases I was looking at. However there is still the possibility the preferred class is queried before the recomputation happens (I think that is a case Renlin fixed). Either these should be faulted and fixed by forcing recomputation, or we need to provide a correct preferred class. That should be a copy of the original class. I believe I had copied the original classes, then recomputed them to avoid any uninitialized memory reads and the like. But looking at the old branch, I don't see the recomputation for classes (though I do see it for costs). Of course all the backwards walk stuff isn't there either, so there's clearly code I worked on extensively, but never committed... 
Well, looking into this further it does look like the preferences are improved during the 2nd pass (in particular fewer cases of the bad ALL_REGS preference that causes all the inefficient code), so it looks like recomputing preferences is not beneficial and my original patch is the right fix for now. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00829.html Wilco
update docs for --enable-languages
I noticed this while working on my mostlyclean patch. The list of languages in the docs for --enable-languages is incomplete. It is missing jit and lto. I also noticed that the grep command matches boot_language= in addition to language= which is a little confusing, so I added the ^. The sentence I added for lto is awkward. It isn't a default language, but it is built by default. Maybe this would make more sense if we talked about boot languages, but then that gets us into another mess describing exactly when languages are boot languages. C is always a boot language. C++ is a boot language if bootstrapping. And lto is a boot language if --enable-lto which is the default. Jim Index: ChangeLog === --- ChangeLog (revision 222491) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2015-04-27 Jim Wilson jim.wil...@linaro.org + + * install.texi (--enable-languages): Add missing jit and lto info. + Add ^ to grep command. + 2015-04-27 Yoshinori Sato ys...@users.sourceforge.jp * config.gcc: Add h8300-*-linux. Index: doc/install.texi === --- doc/install.texi (revision 222491) +++ doc/install.texi (working copy) @@ -1544,15 +1544,17 @@ @var{langN} you can issue the following command in the @file{gcc} directory of your GCC source tree:@* @smallexample -grep language= */config-lang.in +grep ^language= */config-lang.in @end smallexample Currently, you can use any of the following: @code{all}, @code{ada}, @code{c}, @code{c++}, @code{fortran}, -@code{go}, @code{java}, @code{objc}, @code{obj-c++}. +@code{go}, @code{java}, @code{jit}, @code{lto}, @code{objc}, @code{obj-c++}. Building the Ada compiler has special requirements, see below. If you do not pass this flag, or specify the option @code{all}, then all default languages available in the @file{gcc} sub-tree will be configured. -Ada, Go and Objective-C++ are not default languages; the rest are. +Ada, Go, Jit, and Objective-C++ are not default languages. 
LTO is not a +default language, but is built by default because @option{--enable-lto} is +enabled by default. The other languages are default languages. @item --enable-stage1-languages=@var{lang1},@var{lang2},@dots{} Specify that a particular subset of compilers and their runtime
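The effect of the added `^` anchor can be checked outside a GCC tree. This sketch (directory and file contents are made up for illustration; a real tree has one config-lang.in per language) shows that the anchored pattern no longer matches `boot_language=` lines:

```shell
# Recreate a miniature config-lang.in to show why the grep pattern
# wants the ^ anchor: without it, boot_language= lines also match.
mkdir -p /tmp/grep-demo/c
cat > /tmp/grep-demo/c/config-lang.in <<'EOF'
language="c"
boot_language=yes
EOF
cd /tmp/grep-demo
echo "without anchor:"
grep language= */config-lang.in    # matches both lines
echo "with anchor:"
grep ^language= */config-lang.in   # matches only the language= line
```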
[PATCH] libstdc++: Fix list.cc xmethods test.
Hi. While we should eventually get the xmethods to handle cxx11, this patch fixes the current failure. The xmethod matcher doesn't currently handle __cxx11 in the type name. Adding cxx11 support can be a follow up patch. Ok to commit? 2015-04-27 Doug Evans d...@google.com * testsuite/libstdc++-xmethods/list.cc (_GLIBCXX_USE_CXX11_ABI): Define to zero. Index: testsuite/libstdc++-xmethods/list.cc === --- testsuite/libstdc++-xmethods/list.cc(revision 222477) +++ testsuite/libstdc++-xmethods/list.cc(working copy) @@ -18,6 +18,9 @@ // with this library; see the file COPYING3. If not see // <http://www.gnu.org/licenses/>. +// List xmethods only recognize the non cxx11 std::list for now. +#define _GLIBCXX_USE_CXX11_ABI 0 + #include <list> int
[PATCH] [libstdc++/65839] whatis support for xmethods
Hi. This patch is the counterpart to this patch to fix libstdc++/65839, gdb/18285. https://sourceware.org/ml/gdb-patches/2015-04/msg00947.html Regression tested on amd64-linux with/without a patched gdb. Without a patched gdb the new tests fail, but that's good. 2015-04-27 Doug Evans d...@google.com PR libstdc++/65839 * python/libstdcxx/v6/xmethods.py (get_bool_type): New function. Replace all lookups of bool with this. (get_std_size_type): New function. Replace all lookups of std::size_t with this. (ArrayWorkerBase): Rename arg valtype to elem_type for consistency, and save it in self._elem_type. (DequeWorkerBase): Save elemtype. (ListWorkerBase): New arg elem_type. (ListMethodsMatcher): Update call to method.worker_class. (UniquePtrGetWorker.__init__): New arg ptr_type. Delete setting of name, enabled. (UniquePtrDerefWorker.__init__): New arg ptr_type. Delete setting of name. (UniquePtrMethodsMatcher): Rewrite for consistency with all other libstdc++ xmethod matchers. (*Worker): New method get_result_type. * testsuite/libstdc++-xmethods/array.cc: Add whatis tests. * testsuite/libstdc++-xmethods/associative-containers.cc: Ditto. * testsuite/libstdc++-xmethods/deque.cc: Ditto. * testsuite/libstdc++-xmethods/forwardlist.cc: Ditto. * testsuite/libstdc++-xmethods/unique_ptr.cc: Ditto. * testsuite/libstdc++-xmethods/vector.cc: Ditto. * testsuite/libstdc++-xmethods/list.cc: Ditto. 
Index: python/libstdcxx/v6/xmethods.py === --- python/libstdcxx/v6/xmethods.py (revision 222477) +++ python/libstdcxx/v6/xmethods.py (working copy) @@ -21,6 +21,12 @@ import re matcher_name_prefix = 'libstdc++::' +def get_bool_type(): +return gdb.lookup_type('bool') + +def get_std_size_type(): +return gdb.lookup_type('std::size_t') + class LibStdCxxXMethod(gdb.xmethod.XMethod): def __init__(self, name, worker_class): gdb.xmethod.XMethod.__init__(self, name) @@ -29,13 +35,13 @@ class LibStdCxxXMethod(gdb.xmethod.XMethod): # Xmethods for std::array class ArrayWorkerBase(gdb.xmethod.XMethodWorker): -def __init__(self, valtype, size): -self._valtype = valtype +def __init__(self, elem_type, size): +self._elem_type = elem_type self._size = size def null_value(self): nullptr = gdb.parse_and_eval('(void *) 0') -return nullptr.cast(self._valtype.pointer()).dereference() +return nullptr.cast(self._elem_type.pointer()).dereference() class ArraySizeWorker(ArrayWorkerBase): def __init__(self, valtype, size): @@ -44,6 +50,9 @@ class ArraySizeWorker(ArrayWorkerBase): def get_arg_types(self): return None +def get_result_type(self, obj): +return get_std_size_type() + def __call__(self, obj): return self._size @@ -54,6 +63,9 @@ class ArrayEmptyWorker(ArrayWorkerBase): def get_arg_types(self): return None +def get_result_type(self, obj): +return get_bool_type() + def __call__(self, obj): return (int(self._size) == 0) @@ -64,6 +76,9 @@ class ArrayFrontWorker(ArrayWorkerBase): def get_arg_types(self): return None +def get_result_type(self, obj): +return self._elem_type + def __call__(self, obj): if int(self._size) > 0: return obj['_M_elems'][0] @@ -77,6 +92,9 @@ class ArrayBackWorker(ArrayWorkerBase): def get_arg_types(self): return None +def get_result_type(self, obj): +return self._elem_type + def __call__(self, obj): if int(self._size) > 0: return obj['_M_elems'][self._size - 1] @@ -88,8 +106,11 @@ class ArrayAtWorker(ArrayWorkerBase): ArrayWorkerBase.__init__(self, valtype, size) def
get_arg_types(self): -return gdb.lookup_type('std::size_t') +return get_std_size_type() +def get_result_type(self, obj, index): +return self._elem_type + def __call__(self, obj, index): if int(index) >= int(self._size): raise IndexError('Array index %d should not be >= %d.' % @@ -101,8 +122,11 @@ class ArraySubscriptWorker(ArrayWorkerBase): ArrayWorkerBase.__init__(self, valtype, size) def get_arg_types(self): -return gdb.lookup_type('std::size_t') +return get_std_size_type() +def get_result_type(self, obj, index): +return self._elem_type + def __call__(self, obj, index): if int(self._size) > 0: return obj['_M_elems'][index] @@ -140,6 +164,7 @@ class ArrayMethodsMatcher(gdb.xmethod.XMethodMatch class DequeWorkerBase(gdb.xmethod.XMethodWorker): def __init__(self, elemtype): +self._elemtype = elemtype self._bufsize = (512 / elemtype.sizeof) or
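The shape of the patch is easier to see outside GDB. The real code runs inside GDB, where gdb.xmethod.XMethodWorker supplies the worker interface and gdb.lookup_type() returns gdb.Type objects; in this standalone mock, plain strings stand in for types, and the class is an illustrative reduction of ArraySizeWorker, so it can run anywhere:

```python
# Standalone mock of the xmethod worker protocol the patch extends.
# Strings stand in for gdb.Type objects so this sketch runs without GDB.

def get_bool_type():
    return "bool"          # real code: gdb.lookup_type('bool')

def get_std_size_type():
    return "std::size_t"   # real code: gdb.lookup_type('std::size_t')

class ArraySizeWorker:
    """Mock of the worker implementing std::array<T, N>::size()."""
    def __init__(self, elem_type, size):
        self._elem_type = elem_type
        self._size = size

    def get_arg_types(self):
        return None                  # size() takes no arguments

    def get_result_type(self, obj):
        # The new hook added by the patch: lets GDB's "whatis/ptype
        # obj.size()" report a result type instead of failing.
        return get_std_size_type()

    def __call__(self, obj):
        return self._size

worker = ArraySizeWorker("int", 4)
print(worker.get_result_type(None))  # std::size_t
print(worker(None))                  # 4
```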
Re: [PATCH] libstdc++: Fix list.cc xmethods test.
On 27 April 2015 at 23:33, Doug Evans wrote: Hi. While we should eventually get the xmethods to handle cxx11, this patch fixes the current failure. The xmethod matcher doesn't currently handle __cxx11 in the type name. Adding cxx11 support can be a follow up patch. Agreed. And when that's done we should backport it to the gcc-5-branch too. Ok to commit? OK for trunk, thanks
[libstdc++, committed] fix PR65909
This patch works around a libstdc++ testsuite issue: the generated program that tests for the availability of named locales dereferences invalid pointers on targets that don't allow command-line arguments to be passed. (In particular, I ran into this with a simulator for an embedded target that produces a ton of verbose error messages when it hits an invalid memory reference.) Jonathan already approved the fix in the issue, so I've gone ahead and checked it in. -Sandra 2015-04-27 Sandra Loosemore san...@codesourcery.com PR libstdc++/65909 libstdc++-v3/ * testsuite/lib/libstdc++.exp (check_v3_target_namedlocale): Make the generated test program fail gracefully if the target doesn't support passing command-line arguments. Index: libstdc++-v3/testsuite/lib/libstdc++.exp === --- libstdc++-v3/testsuite/lib/libstdc++.exp (revision 222496) +++ libstdc++-v3/testsuite/lib/libstdc++.exp (working copy) @@ -901,6 +901,11 @@ proc check_v3_target_namedlocale { args puts $f using namespace std; puts $f int main (int argc, char** argv) puts $f { + puts $f if (argc < 2) + puts $f { + puts $f printf(\"locale support test not supported\\n\"); + puts $f return 1; + puts $f } puts $f try puts $f { puts $f locale(*(argv + 1));
Re: [Patch] Add regex_constants::__polynomial
On Mon, Apr 27, 2015 at 4:57 AM, Jonathan Wakely jwak...@redhat.com wrote: OK for trunk. Committed. -- Regards, Tim Shen
PR65217.c, improve canonicalization of implied copy from equality comparison
Richi recently changed tree-ssa-dom.c::record_equality to use tree_swap_operands_p to canonicalize the implied copy we record for equality comparisons. This is a good thing. However, there is a case where tree_swap_operands_p gives us operands in an undesirable order for this routine. Specifically when both operands are SSA_NAMEs, but just one has a single use (in the equality test). In that case, we want the single use SSA_NAME on the LHS of the recorded copy. That, in effect, prevents DOM from destroying the single use nature of that SSA_NAME. And if we're ultimately able to remove the comparison, the statements that fed the single use SSA_NAME can be removed as they'll be dead. Given this issue is so closely tied to DOM, Richi and I agreed to fix this in DOM rather than in tree_swap_operands_p. Had we tried to address this in tree_swap_operands_p, we'd have ended up regressing elsewhere, which is clearly not good. Anyway, bootstrapped and regression tested on x86_64-linux-gnu. Fixes the recently XFAILed 65217 test. Installed on the trunk. Jeff commit a787980e8012128b408ca8b716028e71d1b21373 Author: root root@localhost.localdomain Date: Mon Apr 27 21:58:52 2015 -0600 PR tree-optimization/65217 * tree-ssa-dom.c (record_equality): Given two SSA_NAMEs, if just one of them has a single use, make sure it is the LHS of the implied copy. PR tree-optimization/65217 * gcc.target/i386/pr65217.c: Remove XFAIL. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fff0015..1a82f17 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2015-04-27 Jeff Law l...@redhat.com + + PR tree-optimization/65217 + * tree-ssa-dom.c (record_equality): Given two SSA_NAMEs, if just one + of them has a single use, make sure it is the LHS of the implied + copy.
+ 2015-04-28 Alan Modra amo...@gmail.com PR target/65810 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 67bdd69..0fc2384 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2015-04-27 Jeff Law l...@redhat.com + +PR tree-optimization/65217 + * gcc.target/i386/pr65217.c: Remove XFAIL. + 2015-04-27 Andre Vehreschild ve...@gmx.de PR fortran/60322 diff --git a/gcc/testsuite/gcc.target/i386/pr65217.c b/gcc/testsuite/gcc.target/i386/pr65217.c index 3f495b2..d5c9be5 100644 --- a/gcc/testsuite/gcc.target/i386/pr65217.c +++ b/gcc/testsuite/gcc.target/i386/pr65217.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O" } */ -/* { dg-final { scan-assembler-not "negl" { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-not "andl" { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-not "negl" } } */ +/* { dg-final { scan-assembler-not "andl" } } */ int test(int n) diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index a67b4e4..c7d427b 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1762,6 +1762,20 @@ record_equality (tree x, tree y) if (tree_swap_operands_p (x, y, false)) std::swap (x, y); + /* Most of the time tree_swap_operands_p does what we want. But there are + cases where we know one operand is better for copy propagation than + the other. Given no other code cares about ordering of equality + comparison operators for that purpose, we just handle the special cases + here. */ + if (TREE_CODE (x) == SSA_NAME && TREE_CODE (y) == SSA_NAME) +{ + /* If one operand is a single use operand, then make it +X. This will preserve its single use properly and if this +conditional is eliminated, the computation of X can be +eliminated as well. */ + if (has_single_use (y) && ! has_single_use (x)) + std::swap (x, y); +} if (TREE_CODE (x) == SSA_NAME) prev_x = SSA_NAME_VALUE (x); if (TREE_CODE (y) == SSA_NAME)
Re: Improve LTO type checking during symtab merging
Hi, actually I bootstrapped/regtested without fortran (as I forgot the setting from LTO bootstrap). There are several new warnings in the Fortran testsuite. As far as I can tell, they represent real mismatches. I wonder, do we want to fix the testcases (some fortran-fu would be needed), disable warnings on those, or explicitly allow excess warnings (because we still have no way to match link-time warnings)? Honza spawn /aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../gfortran -B/aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../ -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/ f_lto_20091028-1_0.o f_lto_20091028-1_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flto -flto-partition=none -fuse-linker-plugin -r -nostdlib -finline-functions -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libatomic/.libs -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -o gfortran-dg-lto-20091028-1-01.exe /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/20091028-1_0.f90:7:0: warning: type of 'int_gen_ti_header_c' does not match original declaration lto1: note: return value type mismatch lto1: note: type 'int' should match type 'void' /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/20091028-1_1.c:3:5: note: previously declared here spawn /aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../gfortran -B/aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../ -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/ f_lto_pr40724_0.o f_lto_pr40724_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -O0
-flto -flto-partition=none -fuse-linker-plugin -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libatomic/.libs -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -o gfortran-dg-lto-pr40724-01.exe /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/pr40724_1.f:2:0: warning: type of 'f' does not match original declaration lto1: note: types have different parameter counts /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/pr40724_0.f:1:0: note: previously declared here spawn /aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../gfortran -B/aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../ -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/ f_lto_pr41069_0.o f_lto_pr41069_1.o f_lto_pr41069_2.o -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flto -flto-partition=none -fuse-linker-plugin -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libatomic/.libs -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libquadmath/.libs -o gfortran-dg-lto-pr41069-01.exe /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/pr41069_1.f90:4:0: warning: type of 'mltfftsg' does not match original declaration lto1: note: types have different parameter counts 
/aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/pr41069_1.f90:4:0: warning: type of 'mltfftsg' does not match original declaration lto1: note: types have different parameter counts /aux/hubicka/trunk-8/gcc/testsuite/gfortran.dg/lto/pr41069_0.f90:2:0: note: previously declared here spawn /aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../gfortran -B/aux/hubicka/trunk-8/build5/gcc/testsuite/gfortran3/../../ -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/ f_lto_pr41521_0.o f_lto_pr41521_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -g -O -flto -B/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libgfortran/.libs -L/aux/hubicka/trunk-8/build5/x86_64-unknown-linux-gnu/./libatomic/.libs
Re: [PATCH] libmpx: remove AC_PROG_CXX/GCC_LIBSTDCXX_RAW_CXX_FLAGS from configure.ac
2015-04-23 12:53 GMT+03:00 Steven Noonan ste...@uplinklabs.net: This causes libmpx to fail the configure stage on my build hosts because 'xg++' was invoked with the bogus '-funconfigured-libstdc++-v3' flag, and the fallback preprocessor '/lib/cpp' did not exist on my systems. Since libmpx has no C++ code, requiring a C++ preprocessor and compiler at configure time doesn't make any sense. Simply remove the directives from configure.ac. I tested this change on my build environments and the libmpx configure failures I was seeing went away. Recommend this be applied to trunk and the gcc-5 branch. Thank you for the patch. I applied it to trunk and will apply to gcc-5 branch later. Ilya ChangeLog 2015-04-23 Steven Noonan ste...@uplinklabs.net * configure.ac: Drop AC_PROG_CXX and friends, since libmpx has no C++ sources. * configure: Regenerate. --- libmpx/configure.ac | 2 -- 1 file changed, 2 deletions(-) diff --git a/libmpx/configure.ac b/libmpx/configure.ac index 3f8b50f..463e855 100644 --- a/libmpx/configure.ac +++ b/libmpx/configure.ac @@ -23,7 +23,6 @@ AC_MSG_RESULT($version_specific_libs) AC_CANONICAL_SYSTEM target_alias=${target_alias-$host_alias} AC_SUBST(target_alias) -GCC_LIBSTDCXX_RAW_CXX_FLAGS # See if supported. unset LIBMPX_SUPPORTED @@ -94,7 +93,6 @@ AC_SUBST(toolexeclibdir) m4_rename([_AC_ARG_VAR_PRECIOUS],[real_PRECIOUS]) m4_define([_AC_ARG_VAR_PRECIOUS],[]) AC_PROG_CC -AC_PROG_CXX m4_rename_force([real_PRECIOUS],[_AC_ARG_VAR_PRECIOUS]) AM_PROG_CC_C_O -- 2.3.6
Re: [PATCH v2 3/4] libcc1: Add 'set compile-gcc'
On 04/23/2015 09:38 PM, Jan Kratochvil wrote: - configury triplet prefix to the compiler. + configury triplet prefix to the compiler; TRIPLET_REGEXP can be + also absolute filename to the computer. to the computer is probably a typo for to the compiler. Also, double space after filename. The arguments are copied by GCC. ARGV need not be Thanks, Pedro Alves
Re: [PATCH] Make wider use of v constraint in i386.md
On 17 Apr 10:09, Uros Bizjak wrote: On Thu, Mar 19, 2015 at 10:24 AM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, There was some discussion about the x constraint being too conservative for some patterns in i386.md. The patch below fixes it. This is probably stage1 material. ChangeLog: gcc/ 2015-03-19 Ilya Tocar ilya.to...@intel.com * config/i386/i386.h (EXT_SSE_REG_P): New. * config/i386/i386.md (*cmpi<FPCMP:unord><MODEF:mode>_mixed): Use v constraint. (*cmpi<FPCMP:unord><MODEF:mode>_sse): Ditto. (*movxi_internal_avx512f): Ditto. (define_split): Check for xmm16+, when splitting scalar float_extend. (*extendsfdf2_mixed): Use v constraint. (*extendsfdf2_sse): Ditto. (define_split): Check for xmm16+, when splitting scalar float_truncate. (*truncdfsf_fast_sse): Use v constraint. (fix_trunc<MODEF:mode><SWI48:mode>_sse): Ditto. (*float<SWI48:mode><MODEF:mode>2_sse): Ditto. (define_peephole2): Check for xmm16+, when converting scalar float_truncate. (define_peephole2): Check for xmm16+, when converting scalar float_extend. (*fop_<mode>_comm_mixed): Use v constraint. (*fop_<mode>_comm_sse): Ditto. (*fop_<mode>_1_mixed): Ditto. (*sqrt<mode>2_sse): Ditto. (*ieee_s<ieee_maxmin><mode>3): Ditto. I wonder if there are also changes needed in mmx.md. There are a couple of patterns that operate on xmm registers, so they should be reviewed if they need to change their constraint to v to accept the extended xmm register set. Doesn't look like it. At first glance the non-v stuff in mmx.md isn't even avx enabled, so v is not relevant. Moreover, we don't allow mmx modes (v2sf etc.) in xmm16+ via ix86_hard_regno_mode_ok.
Re: [C/C++ PATCH] Improve -Wlogical-op (PR c/63357)
On 04/27/2015 08:47 AM, Marek Polacek wrote: On Sat, Apr 25, 2015 at 10:18:59PM +0200, Gerald Pfeifer wrote: In case this example feels too contrived (even though it is an excerpt of Wine code), we now also warn about the following, where the two types and variables are defined in different places and the size of one is set implicitly: typedef int r_fun_t (int); r_fun_t * text_funcs[] = {0,0,0}; int report (unsigned t) { typedef int s_fun_t (long, char); static s_fun_t * GUI_funcs[3]; return (t < sizeof text_funcs / sizeof text_funcs[0] && t < sizeof GUI_funcs / sizeof GUI_funcs[0]); } (I also filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65891 so that we keep track.) I'm afraid there isn't an easy solution to this; the problem is that we fold sizeof early, so by the time the warning runs it sees the same folded constant comparison on both sides of the && and warns. Maybe using SIZEOF_EXPR would help... Can you file a bug for this for future reference? We may not tackle this specific issue in the current iteration of delayed folding, but it certainly helps to have BZs for specific issues that we'd like to fix once we can delay folding. jeff
Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)
On Sat, Apr 25, 2015 at 08:13:08PM +, Joseph Myers wrote: On Sat, 25 Apr 2015, Marek Polacek wrote: + pedwarn (location, OPT_Wshift_negative_value, +"left shift of negative value"); Use of pedwarn is always suspect for something only undefined at runtime; it must not produce an error with -pedantic-errors in any context where a constant expression is not required. Makes sense. So how about moving the warning into -Wextra? This way it won't trigger by default. One change is that we reject programs that use shift with undefined behavior in a context where a constant expression is required, thus e.g. enum E { A = -1 << 0 }; But I hope that's reasonable. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-04-27 Marek Polacek pola...@redhat.com PR c/65179 * c-common.c (c_fully_fold_internal): Warn when left shifting a negative value. * c.opt (Wshift-negative-value): New option. * c-typeck.c (build_binary_op): Warn when left shifting a negative value. * typeck.c (cp_build_binary_op): Warn when left shifting a negative value. * doc/invoke.texi: Document -Wshift-negative-value. * c-c++-common/Wshift-negative-value-1.c: New test. * c-c++-common/Wshift-negative-value-2.c: New test. * c-c++-common/Wshift-negative-value-3.c: New test. * c-c++-common/Wshift-negative-value-4.c: New test. * gcc.dg/c99-const-expr-7.c: Add dg-error.
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 9797e17..f2fe65e 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -1361,6 +1361,15 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands, && !TREE_OVERFLOW_P (op0) && !TREE_OVERFLOW_P (op1)) overflow_warning (EXPR_LOCATION (expr), ret); + if (code == LSHIFT_EXPR + && TREE_CODE (orig_op0) != INTEGER_CST + && TREE_CODE (TREE_TYPE (orig_op0)) == INTEGER_TYPE + && TREE_CODE (op0) == INTEGER_CST + && c_inhibit_evaluation_warnings == 0 + && tree_int_cst_sgn (op0) < 0 + && flag_isoc99) + warning_at (loc, OPT_Wshift_negative_value, + "left shift of negative value"); if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR) && TREE_CODE (orig_op1) != INTEGER_CST && TREE_CODE (op1) == INTEGER_CST diff --git gcc/c-family/c.opt gcc/c-family/c.opt index 983f4a8..c61dfb1 100644 --- gcc/c-family/c.opt +++ gcc/c-family/c.opt @@ -781,6 +781,10 @@ Wshift-count-overflow C ObjC C++ ObjC++ Var(warn_shift_count_overflow) Init(1) Warning Warn if shift count >= width of type +Wshift-negative-value +C ObjC C++ ObjC++ Var(warn_shift_negative_value) Warning EnabledBy(Wextra) +Warn if left shifting a negative value + Wsign-compare C ObjC C++ ObjC++ Var(warn_sign_compare) Warning LangEnabledBy(C++ ObjC++,Wall) Warn about signed-unsigned comparisons diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 91735b5..36cebc6 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -10707,6 +10707,15 @@ build_binary_op (location_t location, enum tree_code code, code1 == INTEGER_TYPE) { doing_shift = true; + if (TREE_CODE (op0) == INTEGER_CST + && tree_int_cst_sgn (op0) < 0 + && flag_isoc99) + { + int_const = false; + if (c_inhibit_evaluation_warnings == 0) + warning_at (location, OPT_Wshift_negative_value, + "left shift of negative value"); + } if (TREE_CODE (op1) == INTEGER_CST) { if (tree_int_cst_sgn (op1) < 0) diff --git gcc/cp/typeck.c gcc/cp/typeck.c index 91db32a..3cb5a8a 100644 --- gcc/cp/typeck.c +++ gcc/cp/typeck.c @@ -4326,11 +4326,21 @@
cp_build_binary_op (location_t location, } else if (code0 == INTEGER_TYPE && code1 == INTEGER_TYPE) { + tree const_op0 = fold_non_dependent_expr (op0); + if (TREE_CODE (const_op0) != INTEGER_CST) + const_op0 = op0; tree const_op1 = fold_non_dependent_expr (op1); if (TREE_CODE (const_op1) != INTEGER_CST) const_op1 = op1; result_type = type0; doing_shift = true; + if (TREE_CODE (const_op0) == INTEGER_CST + && tree_int_cst_sgn (const_op0) < 0 + && (complain & tf_warning) + && c_inhibit_evaluation_warnings == 0 + && cxx_dialect >= cxx11) + warning (OPT_Wshift_negative_value, +"left shift of negative value"); if (TREE_CODE (const_op1) == INTEGER_CST) { if (tree_int_cst_lt (const_op1, integer_zero_node)) diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index 7d2f6e5..dcfe4cf 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -271,7 +271,7 @@ Objective-C and Objective-C++ Dialects}. -Wpointer-arith -Wno-pointer-to-int-cast @gol
Re: [PATCH][combine][obvious] Use std::swap instead of manually swapping
On 04/27/2015 08:37 AM, Segher Boessenkool wrote: On Mon, Apr 27, 2015 at 10:55:17AM +0100, Kyrill Tkachov wrote: Precedents suggest these changes are considered obvious. So I'll commit this in a couple of days unless someone objects. Yes it's obvious. One tiny thing... @@ -9062,7 +9061,7 @@ known_cond (rtx x, enum rtx_code cond, rtx reg, rtx val) else if (COMPARISON_P (x) || COMMUTATIVE_ARITH_P (x)) { if (rtx_equal_p (XEXP (x, 0), val)) - cond = swap_condition (cond), temp = val, val = reg, reg = temp; + cond = swap_condition (cond), std::swap (val, reg); if (rtx_equal_p (XEXP (x, 0), reg) rtx_equal_p (XEXP (x, 1), val)) { Might as well write this as two statements, like everywhere else, e.g. Agreed. I really dislike using ',' like is shown above from a readability standpoint. jeff
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Fri, 2015-04-24 at 11:40 -0400, David Edelsohn wrote: On Fri, Apr 24, 2015 at 11:34 AM, Segher Boessenkool All looks great to me now, thanks for the changes, Okay for me. In back porting my patch, I see that the patch doesn't apply cleanly to the 4.8/4.9 branch code. Looking at why, I see that it's because the 4.8/4.9 code doesn't have the following change from Segher. I think we want this patch back ported too, don't we? 2014-09-11 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/htm.md (tabort, tabortdc, tabortdci, tabortwc, tabortwci, tbegin, tcheck, tend, trechkpt, treclaim, tsr): Use xor instead of minus. * config/rs6000/vector.md (cr6_test_for_zero_reverse, cr6_test_for_lt_reverse): Ditto. If so, I can back port it and test it along with mine. Otherwise, I'll just work around it. Let me know what you want me to do. Peter
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
Hi! Ping. On Tue, 10 Mar 2015 13:37:47 +0100, I wrote: Ping. On Thu, 19 Feb 2015 09:39:52 +0100, I wrote: On Wed, 18 Feb 2015 18:04:21 +0100, I wrote: On Mon, 13 Oct 2014 14:33:11 +0400, Ilya Verbin iver...@gmail.com wrote: On 13 Oct 12:19, Jakub Jelinek wrote: On Sat, Oct 11, 2014 at 06:49:00PM +0400, Ilya Verbin wrote: 2. -foffload-abi=[lp64|ilp32] This option is supposed to tell mkoffload (and offload compiler) which ABI is used in streamed GIMPLE. This option is desirable, because host and offload compilers must have the same ABI. The option is generated by the host compiler automatically, it should not be specified by user. But I'd like to understand why is this one needed. Why should the compilers care? Aggregates layout and alignment of integral/floating types must match between host and offload compilers, sure, but isn't that something streamed already in the LTO bytecode? Or is LTO streamer not streaming some types like long_type_node? I'd expect if host and offload compiler disagree on long type size that you'd just use a different integral type with the same size as long on the host. Different sized pointers are of course a bigger problem, but can't you just error out on that during reading of the LTO, or even handle it (just use some integral type for when is the pointer stored in memory, and just convert to pointer after reads from memory, and convert back before storing to memory). Erroring out during LTO streaming in sounds just fine to me though. Actually this option was developed by Bernd, so I think PTX team is going to use it somehow. In MIC's case we're planning just to check in mkoffload that host and target compiler's ABI are the same. Without this check we will crash in LTO streamer with ICE, so I'd like to issue an error message, rather than crashing. In gcc/config/i386/intelmic-mkoffload.c, this option is now parsed to initialize the target_ilp32 variable, which will then be used (target_ilp32 ? -m32 : -m64) when invoking different tools. 
In nvptx, we've been using the following approach: --- gcc/config/nvptx/nvptx.h +++ gcc/config/nvptx/nvptx.h @@ -54,24 +54,28 @@ /* Type Layout. */ +#define TARGET_64BIT \ + (flag_offload_abi == OFFLOAD_ABI_UNSET ? TARGET_ABI64 \ + : flag_offload_abi == OFFLOAD_ABI_LP64 ? true : false) + #define DEFAULT_SIGNED_CHAR 1 #define SHORT_TYPE_SIZE 16 #define INT_TYPE_SIZE 32 -#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32) +#define LONG_TYPE_SIZE (TARGET_64BIT ? 64 : 32) #define LONG_LONG_TYPE_SIZE 64 #define FLOAT_TYPE_SIZE 32 #define DOUBLE_TYPE_SIZE 64 #define LONG_DOUBLE_TYPE_SIZE 64 #undef SIZE_TYPE -#define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int") +#define SIZE_TYPE (TARGET_64BIT ? "long unsigned int" : "unsigned int") #undef PTRDIFF_TYPE -#define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int") +#define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int") -#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32) +#define POINTER_SIZE (TARGET_64BIT ? 64 : 32) -#define Pmode (TARGET_ABI64 ? DImode : SImode) +#define Pmode (TARGET_64BIT ? DImode : SImode) /* Registers. Since ptx is a virtual target, we just define a few hard registers for special purposes and leave pseudos unallocated. */ Should we settle on one of the two, that is, either pass -m[...] from mkoffload, or handle flag_offload_abi in the respective backend? I think I prefer the intelmic-mkoffload.c approach; this seems cleaner to me: mkoffload configures the offloading compiler. (Also, the 32-bit vs. 64-bit flag may in fact be needed for tools other than the offloading compiler). Well, for instance, when in -m32 mode, we also need to use it in gcc/config/nvptx/mkoffload.c:compile_native -- otherwise, later on, the linker will not be happy with the request to link a 64-bit object file with 32-bit object files... Bernd, is there any specific reason for the approach you had chosen? OK to do the following instead? (Coding style/code copied from gcc/config/i386/intelmic-mkoffload.c for consistency.)
--- gcc/config/nvptx/mkoffload.c +++ gcc/config/nvptx/mkoffload.c @@ -126,6 +126,9 @@ static id_map *var_ids, **vars_tail = var_ids; static const char *ptx_name; static const char *ptx_cfile_name; +/* Shows if we should compile binaries for i386 instead of x86-64. */ +bool target_ilp32 = false; + /* Delete tempfiles. */ /* Unlink a temporary file unless requested otherwise. */ @@ -883,6 +886,7 @@ compile_native (const char *infile, const char *outfile, const char *compiler) struct obstack argv_obstack;
Re: [PATCH][AArch64] Improve spill code - swap order in shl pattern
On Mon, Apr 27, 2015 at 02:37:12PM +0100, Wilco Dijkstra wrote: Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first, it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code, not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates far better spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first. OK, thanks for the fix. Cheers, James
Re: Offloading compilers' libgcc installation
On Mon, Apr 27, 2015 at 06:14:36PM +0200, Thomas Schwinge wrote: Hi! Ping. (Or can Bernd just commit this patch, http://news.gmane.org/find-root.php?message_id=%3C54E5ACCE.7080502%40codesourcery.com%3E (with my review comment addressed, http://news.gmane.org/find-root.php?message_id=%3C87h9uhlypt.fsf%40schwinge.name%3E), given his nvptx architecture maintainership?) Ok for trunk. Jakub
RE: [PATCH][AArch64] Use conditional negate for abs expansion
James Greenhalgh wrote: On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote: -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 16:19 To: GCC Patches Subject: [PATCH][AArch64] Use conditional negate for abs expansion Expand abs into a compare and conditional negate. This is the most obvious expansion, enables merging of the comparison into ALU instructions and is faster on all implementations. Bootstrapped and regression tested. int f(int x) { return abs (x + 1); } Before: add w0, w0, 1 sxtw x0, w0 eor x1, x0, x0, asr 63 sub x1, x1, x0, asr 63 mov x0, x1 ret After: adds w0, w0, 1 csneg w0, w0, w0, pl ret ChangeLog: 2015-03-03 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (absdi2): Optimize abs expansion. (csneg3<mode>_insn): Enable expansion of pattern. * gcc/testsuite/gcc.target/aarch64/abs_1.c (abs64): Update test for new abs expansion. (abs64_in_dreg): Likewise. This looks like it breaks support for abs in a D register (for example at the end of a loop, or extracted from Neon intrinsics, etc.), e.g. (totally contrived...) int64x1_t abs_max (int64x2_t x, int64_t *wb) { int64_t y = vgetq_lane_s64 (x, 0); if (y < 0) y = -y; return vdup_n_s64 (y); } Which currently generates: abs_max: abs d0, d0 ret I suppose we don't need to worry too much about that (and the current implementation doesn't seem to catch it reliably anyway), but it would be good if we could keep the support - even if it is rarely used. Well, it may be possible to write a peephole for this rare case, but I hope people would use a vabs if that's what they wanted... Wilco
Re: [PATCH] Properly valueize values we value-number to
On 04/27/2015 06:46 AM, Richard Biener wrote: On Mon, 27 Apr 2015, Richard Biener wrote: On Fri, 24 Apr 2015, Jeff Law wrote: On 02/17/2015 07:58 AM, Richard Biener wrote: [ Restarting an old thread... ] On a closer look the record_const_or_copy_1 hunk is redundant (record_equality is really a bit obfuscated...). Agreed. I'm not entirely sure how it got to this point. And record_equality is where the SSA_NAME_VALUEs might end up forming chains? At least because we might record a SSA_NAME_VALUE for something that might itself be the target of a SSA_NAME_VALUE, as in a_1 = b_2; SSA_NAME_VALUE (a_1) == b_2; if (b_2 == i_3(D)) ... temporarily SSA_NAME_VALUE (b_2) == i_3(D), thus under if (b_2 == i_3(D)) we have SSA_NAME_VALUE (a_1) == b_2 and SSA_NAME_VALUE (SSA_NAME_VALUE (a_1)) == i_3(D)? I guess we can't easily avoid that because we don't keep track of a reverse mapping (nor would we want to update that). Right. In general, the fact that we're in SSA form means that we ought not get loops in the SSA_NAME_VALUE chain, since there's a single static assignment and we'll visit it once. That nice model breaks down in two ways. First, we derive equivalences from equality conditions like you've shown above. Second, when threading we can thread through a loop latch and start reprocessing a block. The interaction between those two can result in loops in the value chain. What I'm hoping to do in GCC 6 is eliminate all this stuff with a threader that is independent of the dominator walk (and optimizer). Instead of walking forward through the dominator tree recording key nuggets, then removing those nuggets as we pop back up the tree, we instead start with the conditional and walk up the use-def chains, simplifying as we go, with the goal of simplifying to a constant, of course. You can see the beginnings of that with the FSM bits from Sebastian. Obviously until those bits are in place, we should try to clean up any warts in the current implementation.
Btw, in record_equality, the == part of the <= check for the loop depth will just randomly swap x and y while we should prefer IL canonical order. If we can't rely on the input being in IL canonical order, we should prepend the whole function with if (tree_swap_operands_p (x, y, false)) std::swap (x, y); and change <= to < for the loop-depth check. Agreed. Makes perfect sense. I'm now re-bootstrapping and testing the following. Committed as follows, with XFAILing and re-opening PR65217. Richard. 2015-04-27 Richard Biener rguent...@suse.de * tree-ssa-dom.c (record_equivalences_from_phis): Valueize PHI arg. (record_equivalences_from_stmt): Valueize rhs. (record_equality): Canonicalize x and y order via tree_swap_operands_p. Do not swap operands for same loop depth. * gcc.target/i386/pr65217.c: XFAIL. Looks good. I'd spun the same record_equality changes over the weekend and have state on regressing 65217. 65217 is an interesting test: if ((n & -n) != n) __builtin_unreachable(); return n; n & -n will (of course) get computed into an SSA_NAME. We then propagate that name for the use of n in the return statement rather than using the effectively zero-cost n. If we put some smarts into tree_swap_operands_p to order sensibly in this kind of case, we end up regressing a different case that I'll be looking at today. jeff
Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
On 04/25/2015 05:49 AM, Richard Sandiford wrote: @@ -2099,9 +2107,9 @@ fix_crossing_conditional_branches (void) emit_label (new_label); gcc_assert (GET_CODE (old_label) == LABEL_REF); - old_label = JUMP_LABEL (old_jump); - new_jump = emit_jump_insn (gen_jump (old_label)); - JUMP_LABEL (new_jump) = old_label; + old_label_insn = JUMP_LABEL_AS_INSN (old_jump); + new_jump = emit_jump_insn (gen_jump (old_label_insn)); + JUMP_LABEL (new_jump) = old_label_insn; last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; new_bb = create_basic_block (new_label, new_jump, last_bb); I think the eventual aim would be to have rtx_jump_insn member functions that get and set the jump label as an rtx_insn, with JUMP_LABEL_AS_INSN being a stepping stone towards that. In cases like this it might make more sense to ensure old_jump has the right type (rtx_jump_insn) and go straight to the member functions, rather than switching to JUMP_LABEL_AS_INSN now and then having to rewrite it later. I'm comfortable with either way, so long as we get there. I know that David certainly found it easier to introduce scaffolding early in this patch series, then exploit it, then tear down the scaffolding near the end of a patch series. diff --git a/gcc/is-a.h b/gcc/is-a.h index 58917eb..4fb9dde 100644 --- a/gcc/is-a.h +++ b/gcc/is-a.h @@ -46,6 +46,11 @@ TYPE as_a <TYPE> (pointer) do_something_with (as_a <cgraph_node *> *ptr); +TYPE assert_as_a <TYPE> (pointer) + +Like as_a <TYPE> (pointer), but uses an assertion, which is enabled even in +non-checking (release) builds. + TYPE safe_as_a <TYPE> (pointer) Like as_a <TYPE> (pointer), but where pointer could be NULL. This @@ -193,6 +198,17 @@ as_a (U *p) return is_a_helper <T>::cast (p); } +/* Same as above, but checks the condition even in release builds. 
*/ + +template <typename T, typename U> +inline T +assert_as_a (U *p) +{ + gcc_assert (is_a <T> (p)); + return is_a_helper <T>::cast (p); +} + + /* Similar to as_a, but where the pointer can be NULL, even if is_a_helper <T> doesn't check for NULL. */ This preserves the behaviour of the original code, but I'm not sure it's worth it. I doubt the distinction between: gcc_assert (JUMP_P (x)); and: gcc_checking_assert (JUMP_P (x)); was ever very scientific. Seems like we should take this refactoring as an opportunity to make the checking more consistent. Without some guidelines I suspect usage of gcc_checking_assert would be highly inconsistent. And ultimately we want to get away from the helpers as much as possible, instead relying on the static type system to detect errors at compile time. So unless there's a compelling reason, I'd prefer not to add more support for these helpers. [ snip ] That seems pretty heavy-weight for LRA-local code. Also, the long-term plan is for INSN_LIST and rtx_insn to be in separate hierarchies, at which point we'd have no alias-safe way to distinguish them. That's certainly what I think ought to happen. INSN_LIST should turn into a standard vector or forward list. For the use cases in GCC, either ought to be acceptable. Jeff
Re: [C/C++ PATCH] Improve -Wlogical-op (PR c/63357)
On Mon, Apr 27, 2015 at 10:06:26AM -0600, Jeff Law wrote: On 04/27/2015 08:47 AM, Marek Polacek wrote: On Sat, Apr 25, 2015 at 10:18:59PM +0200, Gerald Pfeifer wrote: In case this example feels too contrived (even though it is an excerpt of Wine code), we now also warn about the following, where the two types and variables are defined in different places and the size of one is set implicitly: typedef int r_fun_t (int); r_fun_t * text_funcs[] = {0,0,0}; int report (unsigned t) { typedef int s_fun_t (long, char); static s_fun_t * GUI_funcs[3]; return (t < sizeof text_funcs / sizeof text_funcs[0] && t < sizeof GUI_funcs / sizeof GUI_funcs[0]); } (I also filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65891 so that we keep track.) I'm afraid there isn't an easy solution to this; the problem is that we fold sizeof early, so the warning sees t < 4 && t < 4 and warns. Maybe using SIZEOF_EXPR would help... Can you file a bug for this for future reference? We may not tackle this specific issue in the current iteration of delayed folding, but it certainly helps to have BZs for specific issues that we'd like to fix once we can delay folding. Gerald already opened PR65891 for this :). Marek