RE: [PATCH] Construct canonical scaled address expression in IVOPT
-----Original Message----- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Monday, September 23, 2013 8:08 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH] Construct canonical scaled address expression in IVOPT

On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng bin.ch...@arm.com wrote:

Hi, currently IVOPT constructs the scaled address expression in the form scale*index and checks whether the backend supports it. The problem is that this address expression is invalid on ARM, which causes scaled expressions to be disabled in IVOPT on ARM. This patch fixes the IVOPT part by constructing an rtl address expression of the form index*scale+base instead.

-  addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
+  /* Construct address expression in the canonical form of
+     base+index*scale and call memory_address_addr_space_p to see whether
+     it is allowed by target processors.  */
+  index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
   for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
     {
+      if (i == 1)
+        continue;
+
-      XEXP (addr, 1) = gen_int_mode (i, address_mode);
+      XEXP (index, 1) = gen_int_mode (i, address_mode);
+      addr = gen_rtx_fmt_ee (PLUS, address_mode, index, reg1);
       if (memory_address_addr_space_p (mode, addr, as))
         bitmap_set_bit (valid_mult, i + MAX_RATIO);

So you now build reg1*scale+reg1 - which checks whether offset and scale work at the same time (and with the same value - given this is really reg1*(scale+1)). It might be that this was implicitly assumed (or that no targets allow scale but no offset), but it's a change that requires an audit of behavior on more targets.

So literally, the function multiplier_allowed_in_address_p should check whether a multiplier is allowed in any kind of addressing mode, like [reg*scale + offset] and [reg*scale + reg]. Apparently it's infeasible to check every possibility for each architecture; is it OK if we at least check index, and then addr if index fails?
By any kind of addressing mode, I mean the modes supported by function get_address_cost, i.e., addresses of the form [base] + [off] + [var] + (reg|reg*scale).

The above also builds more RTX waste, which you can fix by re-using the PLUS by building it up-front, similar to the multiplication.

Yes, this can be fixed.

You also miss the opportunity to have scale == 1 denote whether reg1 + reg2 is valid. I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2.

I thought scale == 1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, the calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at pass a scale other than 1.

So no, the helper now checks something completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it be able to handle the implicit zero offset?

As Richard clarified, ARM does not support a scaled addressing mode without a base register.

Thanks. bin
Re: [PATCH] Bug fix: *var and MEM[(const int *)var] (var has int* type) are not treated as the same data ref.
Hi! On Mon, Sep 23, 2013 at 05:26:13PM -0700, Cong Hou wrote:

Missing ChangeLog entry.

--- gcc/testsuite/gcc.dg/alias-14.c (revision 0)
+++ gcc/testsuite/gcc.dg/alias-14.c (revision 0)

Vectorizer tests should go into gcc.dg/vect/ instead, or, if they are for a single target (but there is no reason why this should be a single target), into gcc.target/<target>/.

--- gcc/fold-const.c (revision 202662)
+++ gcc/fold-const.c (working copy)
@@ -2693,8 +2693,9 @@ operand_equal_p (const_tree arg0, const_
 		       operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
 					TYPE_SIZE (TREE_TYPE (arg1)), flags)))
 	      && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
-	      && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
-		  == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
+	      && types_compatible_p (
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))),
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
 	      && OP_SAME (0) && OP_SAME (1));

This looks wrong. types_compatible_p will happily return true, say, for unsigned long and unsigned long long types on x86_64, because they are both integral types with the same precision, but the second argument of MEM_REF contains aliasing information, where the distinction between the two is important. So, while == comparison of the main variants is too strict, types_compatible_p is too weak, so I guess you need to write a new predicate that will either handle the == and a few special cases that are safe to be handled, or look at what exactly we use the type of the second MEM_REF argument for and check those properties. We certainly need get_deref_alias_set_1 and get_deref_alias_set to return the same values for both types, but whether that is the only information we are using, I am not sure; CCing Richard.

Jakub
Re: [C++1y] [PATCH 3/4] ... canonical type workaround and refactoring
On 23.09.2013 08:15, Adam Butcher wrote: On 22.09.2013 18:57, Adam Butcher wrote:

The following solves the canonical type issue in the general case (pointers and refs) and makes it equivalent to the explicit template case -- in terms of canonical type at least.

for (tree t = TREE_TYPE (TREE_VALUE (p)); t; t = TREE_CHAIN (t))
  TYPE_CANONICAL (t) = t;

The above is insufficient; the traversal doesn't handle function pointers. Currently, to get my local testcases passing, I have the following workaround that abuses find_type_usage. I intend for this to be replaced with a better solution, but it will at least mean that people can start experimenting with this feature now (as they appear to be doing: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58500). This supports that testcase also. Shall I push the patch below to trunk as an intermediate workaround whilst I get to refactoring to support on-the-fly template parm synthesis?

On the subject of on-the-fly synthesis: I haven't started yet, but I'm thinking to trigger in 'cp_parser_simple_type_specifier' when 'current_binding_level->kind == sk_function_parms'. I can foresee a potential issue with packs in that, upon reading the 'auto' (potentially nested in some other type), a plain template parm will be synthesized; but it may need to be a pack parm type if '...' is found later. My initial testcase is:

template <typename T> struct X { T m(int, float); };

auto f(X<auto>, auto (X<auto>::*) (auto...)) { char* s = "warn"; }

int main() { X<char> x; f(x, &X<void>::m); }

where f should be translated as similar to:

template <typename A1, typename A2, typename A3, typename... A4>
auto f(X<A1>, A2 (X<A3>::*) (A4...)) { char* s = "warn"; }

In the case of something like:

auto f(X<auto>...)

the translation would need to be:

template <typename... A> auto f(X<A>...)

I'm thinking that getting that to happen correctly might be tricky (but I haven't tried yet). The 'auto' would trigger plain template parameter synthesis.
Perhaps a 'could_be_parameter_pack_p' on the template_type_parm? Though I don't know how the following could be handled as an implicit template:

template <typename T, typename... A> auto f(X<T>, A...)

It would not be possible to infer which of the template parms to make the pack in:

auto f(X<auto>, auto...)

Probably multiple generic-type pack expansions should be forbidden. I'll see what happens when I get there, but any guidance/thoughts you have on the subject will be valuable. Cheers, Adam

Workaround implicit function template parameter canonical type issue.

* parser.c (add_implicit_template_parms): Workaround to fix up canonical type references left over from before substitution of 'auto'. Better solution needed but this makes test cases functional.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index f3133f3..4171476 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -28987,6 +28987,25 @@ add_implicit_template_parms (cp_parser *parser, size_t expect_count,

       if (!generic_type_ptr)
 	continue;

+      /* Make the canonical type of each part of the parameter distinct.
+	 FIXME: A better solution is needed for this.  Not least the abuse of
+	 find_type_usage.  Need foreach_type or similar for proper mutable
+	 access.  If something like this does turn out to be necessary then the
+	 find_type_usage loop above can be replaced by a foreach_type that fixes
+	 up the canonical types on the way to finding the 'auto'.  */
+
+      struct helper { static bool fixup_canonical_types (tree t) {
+	t = TREE_TYPE (t);
+	if (!t)
+	  return false;
+	if (is_auto_or_concept (t))
+	  return true;
+	TYPE_CANONICAL (t) = t;
+	return false;
+      }};
+      find_type_usage (TREE_VALUE (p), (bool (*)(const_tree))
+		       helper::fixup_canonical_types);
+
       ++synth_count;
       tree synth_id = make_generic_type_name ();
Re: [PATCH] Bug fix: *var and MEM[(const int *)var] (var has int* type) are not treated as the same data ref.
On Tue, 24 Sep 2013, Jakub Jelinek wrote:

Hi! On Mon, Sep 23, 2013 at 05:26:13PM -0700, Cong Hou wrote:

Missing ChangeLog entry.

--- gcc/testsuite/gcc.dg/alias-14.c (revision 0)
+++ gcc/testsuite/gcc.dg/alias-14.c (revision 0)

Vectorizer tests should go into gcc.dg/vect/ instead, or, if they are for a single target (but there is no reason why this should be a single target), into gcc.target/<target>/.

--- gcc/fold-const.c (revision 202662)
+++ gcc/fold-const.c (working copy)
@@ -2693,8 +2693,9 @@ operand_equal_p (const_tree arg0, const_
 		       operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
 					TYPE_SIZE (TREE_TYPE (arg1)), flags)))
 	      && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
-	      && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
-		  == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
+	      && types_compatible_p (
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))),
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
 	      && OP_SAME (0) && OP_SAME (1));

This looks wrong. types_compatible_p will happily return true, say, for unsigned long and unsigned long long types on x86_64, because they are both integral types with the same precision, but the second argument of MEM_REF contains aliasing information, where the distinction between the two is important. So, while == comparison of the main variants is too strict, types_compatible_p is too weak, so I guess you need to write a new predicate that will either handle the == and a few special cases that are safe to be handled, or look at what exactly we use the type of the second MEM_REF argument for and check those properties. We certainly need get_deref_alias_set_1 and get_deref_alias_set to return the same values for both types, but whether that is the only information we are using, I am not sure; CCing Richard.

Using TYPE_MAIN_VARIANT is exactly correct - this is the best we can do that will work with all frontends.
TYPE_MAIN_VARIANT guarantees that the alias-sets stay the same:

  /* If the innermost reference is a MEM_REF that has a conversion
     embedded treat it like a VIEW_CONVERT_EXPR above, using the memory
     access type for determining the alias-set.  */
  if (TREE_CODE (inner) == MEM_REF
      && TYPE_MAIN_VARIANT (TREE_TYPE (inner))
	 != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1)))))
    return get_deref_alias_set (TREE_OPERAND (inner, 1));

so we cannot change the compatibility checks without touching the alias-set deriving code. For the testcase in question we have MEM[(const int *)_7] vs. MEM[(int *)_7], and unfortunately pointer and reference types are not variant types. We also cannot easily resort to pointed-to types as our all-beloved ref-all qualification is on the pointer type rather than on the pointed-to type. But yes, we could implement a more complicated predicate

bool
alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  t1 = TYPE_MAIN_VARIANT (t1);
  t2 = TYPE_MAIN_VARIANT (t2);
  if (t1 == t2)
    return true;
  if (TYPE_REF_CAN_ALIAS_ALL (t1) || TYPE_REF_CAN_ALIAS_ALL (t2))
    return false;
  return (TYPE_MAIN_VARIANT (TREE_TYPE (t1))
	  == TYPE_MAIN_VARIANT (TREE_TYPE (t2)));
}

Note that the fold-const code in question is

  return ((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
	   || (TYPE_SIZE (TREE_TYPE (arg0))
	       && TYPE_SIZE (TREE_TYPE (arg1))
	       && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
				   TYPE_SIZE (TREE_TYPE (arg1)), flags)))
	  && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
	  && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
	      == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
	  && OP_SAME (0) && OP_SAME (1));

which you may notice uses types_compatible_p on the reference type, which is at least suspicious as that can let through reference trees that will end up using different alias sets due to the stricter check in get_alias_set.
So in addition to alias_ptr_types_compatible_p we may want to have

bool
reference_alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  ...
}

abstracting this away for the actual reference trees (also noting that reference_alias_ptr_type isn't 1:1 following what get_alias_set does either). I will give this a try. Richard.
[PATCH, ARM] Fix assembly scan test.
Hi, this patch fixes the scan-assembler pattern of gcc.target/arm/atomic-comp-swap-release-acquire.c, which didn't allow register aliases and failed when enabling LRA, where 'ip' is used in the ldaex instruction. Thanks, Yvan

2013-09-24 Yvan Roux yvan.r...@linaro.org

* gcc.target/arm/atomic-comp-swap-release-acquire.c: Correct scan-assembler pattern to support register aliases.

fix-asm-scan.diff Description: Binary data
Re: [PATCH, RTL] Prepare ARM build with LRA
Ping On 16 September 2013 10:57, Yvan Roux yvan.r...@linaro.org wrote: Adding Eric and Steven in the loop as it is RTL related. Thanks Yvan On 11 September 2013 21:08, Yvan Roux yvan.r...@linaro.org wrote: Here is the new patch discussed in the other thread. Thanks Yvan 2013-09-11 Yvan Roux yvan.r...@linaro.org Vladimir Makarov vmaka...@redhat.com * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations from the least significant bit. (strip_address_mutations): Add bitfield operations handling. (shift_code_p): New predicate for shifting operations. (must_be_index_p): Add shifting operations handling. (set_address_index): Likewise.
Re: [PATCH, ARM] Fix assembly scan test.
On 09/24/13 09:27, Yvan Roux wrote: Hi, this patch fixes the scan-assembler pattern of gcc.target/arm/atomic-comp-swap-release-acquire.c, which didn't allow register aliases and failed when enabling LRA, where 'ip' is used in the ldaex instruction.

Ok - the ChangeLog could just read "Adjust expected output." or some other words to that effect.

Ramana

Thanks, Yvan 2013-09-24 Yvan Roux yvan.r...@linaro.org * gcc.target/arm/atomic-comp-swap-release-acquire.c: Correct scan-assembler pattern to support register aliases.
RE: [PR58463][Backport to 4.8] Adjust dumping for ref nodes
Ping on this patch. Note I don't have write access. Paulo Matos -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 23 September 2013 09:00 To: Paulo Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: [PR58463][Backport to 4.8] Adjust dumping for ref nodes On Fri, Sep 20, 2013 at 5:39 PM, Paulo Matos pma...@broadcom.com wrote: Can we please backport this to 4.8 (it will fix PR58463)? I attach Richard's patch to master, let me know if instead I should create one specific for 4.8.1. I tested this against 4.8 branch and everything looks fine. Ok. Thanks, Richard. 2013-03-27 Richard Biener rguent...@suse.de PR tree-optimization/56716 * tree-ssa-structalias.c (perform_var_substitution): Adjust dumping for ref nodes. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-structalias.c Paulo Matos
Re: [PATCH] Bug fix: *var and MEM[(const int *)var] (var has int* type) are not treated as the same data ref.
On Tue, 24 Sep 2013, Richard Biener wrote: On Tue, 24 Sep 2013, Jakub Jelinek wrote:

Hi! On Mon, Sep 23, 2013 at 05:26:13PM -0700, Cong Hou wrote:

Missing ChangeLog entry.

--- gcc/testsuite/gcc.dg/alias-14.c (revision 0)
+++ gcc/testsuite/gcc.dg/alias-14.c (revision 0)

Vectorizer tests should go into gcc.dg/vect/ instead, or, if they are for a single target (but there is no reason why this should be a single target), into gcc.target/<target>/.

--- gcc/fold-const.c (revision 202662)
+++ gcc/fold-const.c (working copy)
@@ -2693,8 +2693,9 @@ operand_equal_p (const_tree arg0, const_
 		       operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
 					TYPE_SIZE (TREE_TYPE (arg1)), flags)))
 	      && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
-	      && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
-		  == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
+	      && types_compatible_p (
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))),
+		   TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
 	      && OP_SAME (0) && OP_SAME (1));

This looks wrong. types_compatible_p will happily return true, say, for unsigned long and unsigned long long types on x86_64, because they are both integral types with the same precision, but the second argument of MEM_REF contains aliasing information, where the distinction between the two is important. So, while == comparison of the main variants is too strict, types_compatible_p is too weak, so I guess you need to write a new predicate that will either handle the == and a few special cases that are safe to be handled, or look at what exactly we use the type of the second MEM_REF argument for and check those properties. We certainly need get_deref_alias_set_1 and get_deref_alias_set to return the same values for both types, but whether that is the only information we are using, I am not sure; CCing Richard.

Using TYPE_MAIN_VARIANT is exactly correct - this is the best we can do that will work with all frontends.
TYPE_MAIN_VARIANT guarantees that the alias-sets stay the same:

  /* If the innermost reference is a MEM_REF that has a conversion
     embedded treat it like a VIEW_CONVERT_EXPR above, using the memory
     access type for determining the alias-set.  */
  if (TREE_CODE (inner) == MEM_REF
      && TYPE_MAIN_VARIANT (TREE_TYPE (inner))
	 != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1)))))
    return get_deref_alias_set (TREE_OPERAND (inner, 1));

so we cannot change the compatibility checks without touching the alias-set deriving code. For the testcase in question we have MEM[(const int *)_7] vs. MEM[(int *)_7], and unfortunately pointer and reference types are not variant types. We also cannot easily resort to pointed-to types as our all-beloved ref-all qualification is on the pointer type rather than on the pointed-to type. But yes, we could implement a more complicated predicate

bool
alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  t1 = TYPE_MAIN_VARIANT (t1);
  t2 = TYPE_MAIN_VARIANT (t2);
  if (t1 == t2)
    return true;
  if (TYPE_REF_CAN_ALIAS_ALL (t1) || TYPE_REF_CAN_ALIAS_ALL (t2))
    return false;
  return (TYPE_MAIN_VARIANT (TREE_TYPE (t1))
	  == TYPE_MAIN_VARIANT (TREE_TYPE (t2)));
}

Note that the fold-const code in question is

  return ((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
	   || (TYPE_SIZE (TREE_TYPE (arg0))
	       && TYPE_SIZE (TREE_TYPE (arg1))
	       && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
				   TYPE_SIZE (TREE_TYPE (arg1)), flags)))
	  && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
	  && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
	      == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
	  && OP_SAME (0) && OP_SAME (1));

which you may notice uses types_compatible_p on the reference type, which is at least suspicious as that can let through reference trees that will end up using different alias sets due to the stricter check in get_alias_set.
So in addition to alias_ptr_types_compatible_p we may want to have

bool
reference_alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  ...
}

abstracting this away for the actual reference trees (also noting that reference_alias_ptr_type isn't 1:1 following what get_alias_set does either). I will give this a try.

The following is an untested (well, the testcase from PR58513 is now vectorized) patch doing that refactoring.

Richard.

Index: gcc/tree.c
===================================================================
*** gcc/tree.c	(revision 202859)
--- gcc/tree.c	(working copy)
*** mem_ref_offset
Re: [gomp4] Dumping gimple for offload.
On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, I've rebased my patch. Is it OK for gomp4?

Passing through is_omp looks bad - please find a more suitable place to attach this meta-information. From a quick look you only need it to produce an alternate section name, thus consider assigning the section name in a different place.

Richard.

2013/9/13 Ilya Tocar tocarip.in...@gmail.com: Hi, I'm working on dumping gimple for omp pragma target stuff into gnu.target_lto_ sections. I've tried to reuse the current lto infrastructure as much as possible. Could you please take a look at the attached patch?

---
 gcc/ipa-inline-analysis.c |   2 +-
 gcc/ipa-profile.c         |   2 +-
 gcc/ipa-prop.c            |   4 +-
 gcc/ipa-pure-const.c      |   2 +-
 gcc/ipa-reference.c       |   2 +-
 gcc/lto-cgraph.c          |  22 +++--
 gcc/lto-opts.c            |   2 +-
 gcc/lto-section-out.c     |  14 ++-
 gcc/lto-streamer-out.c    | 215 +++---
 gcc/lto-streamer.c        |   6 +-
 gcc/lto-streamer.h        |  13 +--
 gcc/lto/lto.c             |   2 +-
 gcc/passes.c              |   3 +-
 gcc/passes.def            |   2 +
 gcc/timevar.def           |   2 +
 gcc/tree-pass.h           |   2 +
 16 files changed, 237 insertions(+), 58 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index ba6221e..ea3fc90 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4023,7 +4023,7 @@ inline_write_summary (void)
 	}
     }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);

   if (optimize && !flag_ipa_cp)
diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 424e4a6..b16ba6c 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -247,7 +247,7 @@ ipa_profile_write_summary (void)
       streamer_write_uhwi_stream (ob->main_stream, histogram[i]->time);
       streamer_write_uhwi_stream (ob->main_stream, histogram[i]->size);
     }
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
 }

 /* Deserialize the ipa info for lto.
*/

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index c09ec2f..69603c9 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4234,7 +4234,7 @@ ipa_prop_write_jump_functions (void)
 	ipa_write_node_info (ob, node);
     }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);
 }

@@ -4409,7 +4409,7 @@ ipa_prop_write_all_agg_replacement (void)
 	write_agg_replacement_chain (ob, node);
     }
   streamer_write_char_stream (ob->main_stream, 0);
-  produce_asm (ob, NULL);
+  produce_asm (ob, NULL, false);
   destroy_output_block (ob);
 }

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 55b679d..d6bbd52 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -988,7 +988,7 @@ pure_const_write_summary (void)
 	}
     }

-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
 }

diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index e6f19fd..0593c77 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1022,7 +1022,7 @@ ipa_reference_write_optimization_summary (void)
 	}
     }
   BITMAP_FREE (ltrans_statics);
-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, false);
   splay_tree_delete (reference_vars_to_consider);
 }

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..831e74d 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -690,7 +690,7 @@ output_outgoing_cgraph_edges (struct cgraph_edge *edge,
 /* Output the part of the cgraph in SET.  */

 static void
-output_refs (lto_symtab_encoder_t encoder)
+output_refs (lto_symtab_encoder_t encoder, bool is_omp)
 {
   lto_symtab_encoder_iterator lsei;
   struct lto_simple_output_block *ob;
@@ -719,7 +719,7 @@ output_refs (lto_symtab_encoder_t encoder)

   streamer_write_uhwi_stream (ob->main_stream, 0);

-  lto_destroy_simple_output_block (ob);
+  lto_destroy_simple_output_block (ob, is_omp);
 }

 /* Add NODE into encoder as well as nodes it is cloned from.
@@ -878,7 +878,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */

 void
-output_symtab (void)
+output_symtab (bool is_omp)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -907,9 +907,15 @@ output_symtab (void)
     {
       symtab_node node = lto_symtab_encoder_deref (encoder, i);
       if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
-	lto_output_node (ob, cnode, encoder);
+	{
+	  if (!is_omp || lookup_attribute ("omp declare target",
+					   DECL_ATTRIBUTES
Re: [PATCH] Fix typo in check for null
On Mon, Sep 23, 2013 at 4:03 PM, Paulo Matos pma...@broadcom.com wrote: This is a patch for master, where in number_of_loops we want to check whether the loops structure (returned by loops_for_fn) is non-null, but instead we are checking fn, the function argument. I haven't opened a bug report; please let me know if I need to do that before submitting patches.

Patch is ok. Thanks, Richard.

2013-09-23 Paulo Matos pma...@broadcom.com * gcc/cfgloop.h (number_of_loops): Fix typo in check for null. Paulo Matos
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, On 18 Sep 11:17, Kirill Yukhin wrote: Hello, On 13 Sep 14:28, Kirill Yukhin wrote: Hello, On 09 Sep 15:11, Kirill Yukhin wrote: Hello, On 06 Sep 17:41, Kirill Yukhin wrote: Hello, PING. PING. PING. PING PING. -- Thanks, K
[PATCH, LRA, AARCH64] Switching LRA on for AArch64
Hi, the following patch switches LRA on for AArch64. The patch introduces an undocumented option -mlra to use LRA instead of reload, for testing purposes. Please notice that this patch depends on the one submitted in the thread below:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html

Thanks, Yvan

2013-09-24 Yvan Roux yvan.r...@linaro.org * config/aarch64/aarch64.opt (mlra): New option. * config/aarch64/aarch64.c (aarch64_lra_p): New function. (TARGET_LRA_P): Define. aarch64-lra.diff Description: Binary data
[PATCH]Fix computation of offset in ivopt
Hi, this patch fixes two minor bugs in the computation of offsets in IVOPT.

1) Consider the example below:

#define MAX 100
struct tag
{
  int i;
  int j;
};
struct tag arr[MAX];

int foo (int len)
{
  int i = 0;
  for (; i < len; i++)
    {
      /* access arr[i].j;  */
    }
}

Without this patch, the offset computed by strip_offset_1 for the address of arr[i].j is ZERO, which it apparently is not.

2) Consider the example below:

//...
<bb 15>:
KeyIndex_66 = KeyIndex_194 + 4294967295;
if (KeyIndex_66 != 0) goto <bb 16>; else goto <bb 18>;

<bb 16>:

<bb 17>:
# KeyIndex_194 = PHI <KeyIndex_66(16), KeyIndex_180(73)>
_62 = KeyIndex_194 + 1073741823;
_63 = _62 * 4;
_64 = pretmp_184 + _63;
_65 = *_64;
if (_65 == 0) goto <bb 15>; else goto <bb 71>;
//...

There are an iv use and a candidate like:

use 1 address in statement _65 = *_64; at position *_64 type handletype * base pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 step 4294967292 base object (void *) pretmp_184 related candidates

candidate 6 var_before ivtmp.16 var_after ivtmp.16 incremented before use 1 type unsigned int base (unsigned int) (pretmp_184 + (sizetype) KeyIndex_180 * 4) step 4294967292 base object (void *) pretmp_184 Candidate 6 is related to use 1

In function get_computation_cost_at for use 1 using cand 6, ubase and cbase are:

pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4
pretmp_184 + (sizetype) KeyIndex_180 * 4

The cstepi computed in HOST_WIDE_INT is 0xfffffffffffffffc, while the offset computed in TYPE (utype) is 0xfffffffc. Though they both stand for the value -4 in different precisions, the statement "offset -= ratio * cstepi" returns 0x100000000, which is wrong.

Tested on x86_64 and arm. Is it OK? Thanks. bin

2013-09-24 Bin Cheng bin.ch...@arm.com

* tree-ssa-loop-ivopts.c (strip_offset_1): Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
(get_computation_cost_at): Sign extend offset.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 200774)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2133,19 +2133,28 @@ strip_offset_1 (tree expr, bool inside_addr, bool
       break;

     case COMPONENT_REF:
-      if (!inside_addr)
-	return orig_expr;
+      {
+	tree field;
+	HOST_WIDE_INT boffset = 0;
+	if (!inside_addr)
+	  return orig_expr;

-      tmp = component_ref_field_offset (expr);
-      if (top_compref
-	  && cst_and_fits_in_hwi (tmp))
-	{
-	  /* Strip the component reference completely.  */
-	  op0 = TREE_OPERAND (expr, 0);
-	  op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
-	  *offset = off0 + int_cst_value (tmp);
-	  return op0;
-	}
+	field = TREE_OPERAND (expr, 1);
+	if (DECL_FIELD_BIT_OFFSET (field)
+	    && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+	  boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+
+	tmp = component_ref_field_offset (expr);
+	if (top_compref
+	    && cst_and_fits_in_hwi (tmp))
+	  {
+	    /* Strip the component reference completely.  */
+	    op0 = TREE_OPERAND (expr, 0);
+	    op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+	    *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+	    return op0;
+	  }
+      }
       break;

     case ADDR_EXPR:
@@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
       bitmap_clear (*depends_on);
     }

+  /* Sign-extend offset if utype has lower precision than HOST_WIDE_INT.  */
+  offset = sext_hwi (offset, TYPE_PRECISION (utype));
+
   /* If we are after the increment, the value of the candidate is higher by
      one iteration.  */
   if (stmt_is_after_inc)
Re: RFC: patch to build GCC for arm with LRA
Hi,

Fair enough - we should just fix the test and move on. Done.

I would suggest in addition a transitional command-line option to switch between LRA and reload as a temporary measure, so that folks can do some more experimenting for AArch32.

I have a patch which fixes the REG_NOTE issues; it's still under validation, but seems to work. I'll add the -mlra option to switch between LRA and reload, but my question is: will LRA be enabled by default or not, as we still have the Thumb issue? Maybe switching LRA on only in ARM mode is a good trade-off, but in that case what is the best way to do it? (I'll run a full validation once we agree on that point.)

Thanks, Yvan
Re: Fix PR middle-end/57393
On Mon, Sep 23, 2013 at 5:56 PM, Easwaran Raman era...@google.com wrote: Ping. On Mon, Sep 16, 2013 at 4:42 PM, Easwaran Raman era...@google.com wrote:

There are two separate root causes for the problem reported in PR 57393, and this patch attempts to fix both. The first is due to newly created stmts that have the default UID of 0, which are compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question. The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before actual reassociation to decouple them from the SSA variables involved in reassociation. This bootstraps on x86_64 and I am running the tests. Ok for trunk?

In the end I like the scheduling code less and less - I believe it will become a major compile-time hog. Still,

+	  gimple temp = gsi_stmt (gsi);
+	  uid = gimple_uid (temp);

no need for 'temp', just write gimple_uid (gsi_stmt (gsi))

+    }
+  if (iters >= MAX_UNASSIGNED_UIDS || uid == 0)
+    renumber_gimple_stmt_uids_in_blocks (&bb, 1);

so this will renumber uids whenever we have trailing zero-UID stmts at the end of a basic-block?

+  else
+    {
+      for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
+	{
+	  stmt = gsi_stmt (gsi);
+	  if (gimple_uid (stmt) > 0)
+	    break;
+	  gimple_set_uid (stmt, uid);
+	}

while this fills zero-UIDs from the found UID. A comment on the re-fill strategy would be nice here.

@@ -2901,6 +2930,9 @@ find_insert_point (gimple stmt, gimple dep_stmt)
   gimple insert_stmt = stmt;
   if (dep_stmt == NULL)
     return stmt;
+  ensure_valid_uid (stmt);
+  ensure_valid_uid (dep_stmt);
+
   if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)

vertical space oddity - move the blank line before the first ensure_valid_uid () call.
+    {
+      tree rhs1 = gimple_assign_rhs1 (stmt);
+      swap_ops_for_binary_stmt (ops, len - 3, stmt);
+      regenerate_debug_stmts (SSA_NAME_DEF_STMT (rhs1), len - 3);
+    }

I'm not sure this is the right place for this. Also note my previous comment that it is wrong to re-use SSA names for different values because of associated info that can now be wrong. Yes, it's a pre-existing bug, but you end up exposing it. I believe the fix is in the places that adjust a stmt's rhs1/rhs2. Yes, reassoc is a complete mess with regard to this, as it modifies the IL for its analysis purposes. We'll have interesting times dealing with all this now that we preserve value-range information for SSA names (which definitely is garbled by reassoc now). We absolutely have to do something about this for 4.9, and I think I shouldn't have approved the original scheduling work :/ Which complicates matters more, similar to the || and reassoc work, and makes the needed rewrite a much more complex task. Bah.

Richard.

Thanks, Easwaran

2013-09-16 Easwaran Raman era...@google.com

PR middle-end/57393
* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
(build_and_add_sum): Do not set UID of newly created statements.
(ensure_valid_uid): New function.
(find_insert_point): Call it here before UIDs are compared.
(insert_stmt_after): Do not reset debug statements.
(regenerate_debug_stmts): New function.
(reassociate_bb): Use it here.

testsuite/ChangeLog:

2013-09-16 Easwaran Raman era...@google.com

PR middle-end/57393
* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
* gcc.dg/tree-ssa/reassoc-34.c: New testcase.
Re: Fix PR middle-end/57393
On Tue, Sep 24, 2013 at 11:26:16AM +0200, Richard Biener wrote: + if (iters >= MAX_UNASSIGNED_UIDS || uid == 0) +renumber_gimple_stmt_uids_in_blocks (&bb, 1); so this will renumber uids whenever we have trailing zero-UID stmts at the end of a basic-block? Trailing zero-UIDs at the end of a basic block should be trivially handleable without renumbering the whole basic block. Jakub
Re: RFA: Store the REG_BR_PROB probability directly as an int
Richard Sandiford rdsandif...@googlemail.com writes: REG_BR_PROB notes are stored as: (expr_list:REG_BR_PROB (const_int prob) chain) but a full const_int rtx seems a bit heavyweight when all we want is a plain int. This patch uses: (int_list:REG_BR_PROB prob chain) instead. I think you left out the handling of INT_LIST in eliminate_regs_1. This lets me finish the build: diff --git a/gcc/reload1.c b/gcc/reload1.c index 7a82c07..41f1aa8 100644 --- a/gcc/reload1.c +++ b/gcc/reload1.c @@ -2576,6 +2576,7 @@ eliminate_regs_1 (rtx x, enum machine_mode mem_mode, rtx insn, case ADDR_VEC: case ADDR_DIFF_VEC: case RETURN: +case INT_LIST: return x; case REG: Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: gimple build interface
On Mon, Sep 23, 2013 at 7:16 PM, Andrew MacLeod amacl...@redhat.com wrote: On 09/23/2013 01:05 PM, David Malcolm wrote: On Mon, 2013-09-23 at 12:21 -0400, Andrew MacLeod wrote: On 09/20/2013 04:08 AM, Richard Biener wrote: On Thu, Sep 19, 2013 at 6:56 PM, Andrew MacLeod amacl...@redhat.com wrote: On 09/19/2013 09:24 AM, Andrew MacLeod wrote: I think this is of most use to ssa passes that need to construct code snippets, so I propose we make this ssa specific and put it in tree-ssa.c (renaming it ssa_build_assign), *OR* we could leave it general purpose and put it in its own set of files, gimple-ssa-build.[ch] or something that crosses the border between the two representations. I'd also suggest that the final optional parameter be changed to tree *lhs = NULL_TREE, which would allow the caller to specify the LHS if they want, otherwise make_ssa_name would be called. If we want to leave it supporting both gimple and ssa, then anyone from gimple land could pass in a gimple LHS variable thus avoiding the call to make_ssa_name Thoughts? Andrew Anyway, here is a patch which does that and a bit more. I didn't rename build_assign() to ssa_build_assign().. even though those are the only kind actually created right now. we can leave that for the day someone actually decides to flush this interface out, and maybe we'll want to pass in gimple_tmps and call them from front ends or other places... then it would have to be renamed again. So I just left it as is for the moment, but that could be changed. I also moved gimple_replace_lhs() to tree-ssa.c and renamed it ssa_replace_lhs(). It calls insert_debug_temp_for_var_def() from tree-ssa.c and that only works with the immediate use operands.. so that is an SSA specific routine, which makes this one SSA specific as well. Those 2 changes allow tree-ssa.h to no longer be included, it is replaced with tree-flow.h. 
Some preliminary work to enable removing immediate use routines out of tree-flow.h include: struct count_ptr_d, count_ptr_derefs(), count_uses_and_derefs() also get moved to tree-ssa.c since those also require the immediate use mechanism, and thus are also SSA dependent. This bootstraps on x86_64-unknown-linux-gnu and has no new regressions. OK? Can you move the builders to asan.c please? From a quick glance it seems to have various issues so it shouldn't be used (I wonder who approved them in the end ... maybe it was even me). ssa_replace_lhs sounds odd (a 'SSA' has a lhs?), but maybe it's just me. I'd have chosen gimple_replace_ssa_lhs? That sounds better. done. And I also think a separate file for those builders is probably best... here's a patch with those changes.. New files called gimple-builder.[ch]... Then Diego can eventually do whatever his grand vision for them is. I minimized the includes. bootstraps and rerunning tests. OK? Did you forget to attach the patch? Of course not! :-P Oh how I hate Mondays. Old patch attached? Richard. Andrew
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On Mon, Sep 23, 2013 at 10:34 PM, Eric Botcazou ebotca...@adacore.com wrote: I have committed it for you (rev 202831), with a few modifications (ChangeLog formatting, typos). Here is what I have committed: 2013-09-23 Kugan Vivekanandarajah kug...@linaro.org * gimple-pretty-print.c (dump_ssaname_info): New function. (dump_gimple_phi): Call it. (pp_gimple_stmt_1): Likewise. * tree-core.h (tree_ssa_name): New union ssa_name_info_type field. (range_info_def): Declare. * tree-pretty-print.c (pp_double_int): New function. (dump_generic_node): Call it. * tree-pretty-print.h (pp_double_int): Declare. * tree-ssa-alias.c (dump_alias_info): Check pointer type. * tree-ssanames.h (range_info_def): New structure. (value_range_type): Move definition here. (set_range_info, value_range_type, duplicate_ssa_name_range_info): Declare. * tree-ssanames.c (make_ssa_name_fn): Check pointer type at initialization. (set_range_info): New function. (get_range_info): Likewise. (duplicate_ssa_name_range_info): Likewise. (duplicate_ssa_name_fn): Check pointer type and call duplicate_ssa_name_range_info. * tree-ssa-copy.c (fini_copy_prop): Likewise. * tree-vrp.c (value_range_type): Remove definition, now in tree-ssanames.h. (vrp_finalize): Call set_range_info to update value range of SSA_NAMEs. * tree.h (SSA_NAME_PTR_INFO): Macro changed to access via union. (SSA_NAME_RANGE_INFO): New macro. Nice patch, but the formatting is totally wrong wrt spaces, please reformat using 2-space indentation and 8-space TABs, as already used in the files. The patch has also introduced 2 regressions in Ada: === acats tests === FAIL: c37211b FAIL: c37211c === acats Summary === # of expected passes2318 # of unexpected failures2 Program received signal SIGSEGV, Segmentation fault. vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458 9458 if (POINTER_TYPE_P (TREE_TYPE (name)) (gdb) bt I'm testing a trivial patch to fix that. Richard. 
#0 vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458 #1 execute_vrp () at /home/eric/svn/gcc/gcc/tree-vrp.c:9583 #2 (anonymous namespace)::pass_vrp::execute (this=optimized out) at /home/eric/svn/gcc/gcc/tree-vrp.c:9673 #3 0x00c52c9a in execute_one_pass (pass=pass@entry=0x22e2210) at /home/eric/svn/gcc/gcc/passes.c:2201 #4 0x00c52e76 in execute_pass_list (pass=0x22e2210) at /home/eric/svn/gcc/gcc/passes.c:2253 #5 0x00c52e88 in execute_pass_list (pass=0x22e04d0) at /home/eric/svn/gcc/gcc/passes.c:2254 #6 0x009b9c49 in expand_function (node=0x76d12e40) at /home/eric/svn/gcc/gcc/cgraphunit.c:1750 #7 0x009bbc17 in expand_all_functions () at /home/eric/svn/gcc/gcc/cgraphunit.c:1855 #8 compile () at /home/eric/svn/gcc/gcc/cgraphunit.c:2192 #9 0x009bc1fa in finalize_compilation_unit () at /home/eric/svn/gcc/gcc/cgraphunit.c:2269 #10 0x006681b5 in gnat_write_global_declarations () at /home/eric/svn/gcc/gcc/ada/gcc-interface/utils.c:5630 #11 0x00d4577d in compile_file () at /home/eric/svn/gcc/gcc/toplev.c:560 #12 0x00d4750a in do_compile () at /home/eric/svn/gcc/gcc/toplev.c:1891 #13 toplev_main (argc=14, argv=0x7fffdca8) at /home/eric/svn/gcc/gcc/toplev.c:1967 #14 0x76f2a23d in __libc_start_main () from /lib64/libc.so.6 #15 0x00635381 in _start () at ../sysdeps/x86_64/elf/start.S:113 (gdb) p name $1 = (tree) 0x0 -- Eric Botcazou
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, with the attached patch the read side of the out of bounds accesses is fixed. There is a single new test case pr57748-3.c that is derived from Martin's test case. The test case only tests the read access and does not depend on part 1 of the patch. This patch was bootstrapped and regression tested on x86_64-unknown-linux-gnu. Additionally I generated eCos and an eCos-application (on ARMv5 using packed structures) with an arm-eabi cross compiler, and looked for differences in the disassembled code with and without this patch, but there were none. OK for trunk? Index: gcc/expr.c === --- gcc/expr.c (revision 202778) +++ gcc/expr.c (working copy) @@ -9878,7 +9878,7 @@ modifier != EXPAND_STACK_PARM ? target : NULL_RTX), VOIDmode, -modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier); +EXPAND_MEMORY); /* If the bitfield is volatile, we want to access it in the field's mode, not the computed mode. context suggests that we may arrive with EXPAND_STACK_PARM here which is a correctness modifier (see its docs). But I'm not too familiar with the details of the various expand modifiers, Eric may be though. That said, I still believe that fixing the misalign path in expand_assignment would be better than trying to avoid it. For this testcase the issue is again that expand_assignment passes the wrong mode/target to the movmisalign optab. Richard. Regards Bernd.
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On 24/09/13 19:23, Richard Biener wrote: On Mon, Sep 23, 2013 at 10:34 PM, Eric Botcazou ebotca...@adacore.com wrote: I have committed it for you (rev 202831), with a few modifications (ChangeLog formatting, typos). Here is what I have committed: 2013-09-23 Kugan Vivekanandarajah kug...@linaro.org * gimple-pretty-print.c (dump_ssaname_info): New function. (dump_gimple_phi): Call it. (pp_gimple_stmt_1): Likewise. * tree-core.h (tree_ssa_name): New union ssa_name_info_type field. (range_info_def): Declare. * tree-pretty-print.c (pp_double_int): New function. (dump_generic_node): Call it. * tree-pretty-print.h (pp_double_int): Declare. * tree-ssa-alias.c (dump_alias_info): Check pointer type. * tree-ssanames.h (range_info_def): New structure. (value_range_type): Move definition here. (set_range_info, value_range_type, duplicate_ssa_name_range_info): Declare. * tree-ssanames.c (make_ssa_name_fn): Check pointer type at initialization. (set_range_info): New function. (get_range_info): Likewise. (duplicate_ssa_name_range_info): Likewise. (duplicate_ssa_name_fn): Check pointer type and call duplicate_ssa_name_range_info. * tree-ssa-copy.c (fini_copy_prop): Likewise. * tree-vrp.c (value_range_type): Remove definition, now in tree-ssanames.h. (vrp_finalize): Call set_range_info to update value range of SSA_NAMEs. * tree.h (SSA_NAME_PTR_INFO): Macro changed to access via union. (SSA_NAME_RANGE_INFO): New macro. Nice patch, but the formatting is totally wrong wrt spaces, please reformat using 2-space indentation and 8-space TABs, as already used in the files. I am looking at everything and will send a patch to fix that. The patch has also introduced 2 regressions in Ada: === acats tests === FAIL: c37211b FAIL: c37211c === acats Summary === # of expected passes 2318 # of unexpected failures 2 I am sorry I missed this as I didn't test Ada. I wrongly assumed that all the frontends are enabled by default. Program received signal SIGSEGV, Segmentation fault.
vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458 9458 if (POINTER_TYPE_P (TREE_TYPE (name)) (gdb) bt I'm testing a trivial patch to fix that. I think the return value of ssa_name () (i.e. name) can be NULL and it has to be checked for NULL. In tree-vrp.c it is not checked in some other places related to debugging. In other places (eg. in tree-ssa-pre.c) there are checks . Thanks for looking into it and I will wait for your fix. Thanks, Kugan Richard. #0 vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458 #1 execute_vrp () at /home/eric/svn/gcc/gcc/tree-vrp.c:9583 #2 (anonymous namespace)::pass_vrp::execute (this=optimized out) at /home/eric/svn/gcc/gcc/tree-vrp.c:9673 #3 0x00c52c9a in execute_one_pass (pass=pass@entry=0x22e2210) at /home/eric/svn/gcc/gcc/passes.c:2201 #4 0x00c52e76 in execute_pass_list (pass=0x22e2210) at /home/eric/svn/gcc/gcc/passes.c:2253 #5 0x00c52e88 in execute_pass_list (pass=0x22e04d0) at /home/eric/svn/gcc/gcc/passes.c:2254 #6 0x009b9c49 in expand_function (node=0x76d12e40) at /home/eric/svn/gcc/gcc/cgraphunit.c:1750 #7 0x009bbc17 in expand_all_functions () at /home/eric/svn/gcc/gcc/cgraphunit.c:1855 #8 compile () at /home/eric/svn/gcc/gcc/cgraphunit.c:2192 #9 0x009bc1fa in finalize_compilation_unit () at /home/eric/svn/gcc/gcc/cgraphunit.c:2269 #10 0x006681b5 in gnat_write_global_declarations () at /home/eric/svn/gcc/gcc/ada/gcc-interface/utils.c:5630 #11 0x00d4577d in compile_file () at /home/eric/svn/gcc/gcc/toplev.c:560 #12 0x00d4750a in do_compile () at /home/eric/svn/gcc/gcc/toplev.c:1891 #13 toplev_main (argc=14, argv=0x7fffdca8) at /home/eric/svn/gcc/gcc/toplev.c:1967 #14 0x76f2a23d in __libc_start_main () from /lib64/libc.so.6 #15 0x00635381 in _start () at ../sysdeps/x86_64/elf/start.S:113 (gdb) p name $1 = (tree) 0x0 -- Eric Botcazou
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Monday, September 23, 2013 8:08 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng bin.ch...@arm.com wrote: Hi, For now IVOPT constructs scaled address expression in the form of scaled*index and checks whether backend supports it. The problem is the address expression is invalid on ARM, causing scaled expression disabled in IVOPT on ARM. This patch fixes the IVOPT part by constructing rtl address expression like index*scaled+base. - addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); + /* Construct address expression in the canonical form of + base+index*scale and call memory_address_addr_space_p to see whether + it is allowed by target processors. */ + index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); for (i = -MAX_RATIO; i <= MAX_RATIO; i++) { - XEXP (addr, 1) = gen_int_mode (i, address_mode); + if (i == 1) +continue; + + XEXP (index, 1) = gen_int_mode (i, address_mode); + addr = gen_rtx_fmt_ee (PLUS, address_mode, index, reg1); if (memory_address_addr_space_p (mode, addr, as)) bitmap_set_bit (valid_mult, i + MAX_RATIO); so you now build reg1*scale+reg1 - which checks if offset and scale work at the same time (and with the same value - given this is really reg1*(scale+1)). It might be that this was implicitly assumed (or that no targets allow scale but no offset), but it's a change that requires audit of behavior on more targets. So literally, function multiplier_allowed_in_address_p should check whether multiplier is allowed in any kind of addressing mode, like [reg*scale + offset] and [reg*scale + reg]. Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg.
Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check index, then addr if index is failed? By any kind of addressing modes, I mean modes supported by function get_address_cost, i.e., in form of [base] + [off] + [var] + (reg|reg*scale). I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Apparently it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences? The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Richard.
Re: [PR58463][Backport to 4.8] Adjust dumping for ref nodes
On Tue, Sep 24, 2013 at 10:42 AM, Paulo Matos pma...@broadcom.com wrote: Ping on this patch. Note I don't have write access. Please get yourself write access (I suppose you do have a copyright assignment), you can name me as sponsor. Richard. Paulo Matos -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 23 September 2013 09:00 To: Paulo Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: [PR58463][Backport to 4.8] Adjust dumping for ref nodes On Fri, Sep 20, 2013 at 5:39 PM, Paulo Matos pma...@broadcom.com wrote: Can we please backport this to 4.8 (it will fix PR58463)? I attach Richard's patch to master, let me know if instead I should create one specific for 4.8.1. I tested this against 4.8 branch and everything looks fine. Ok. Thanks, Richard. 2013-03-27 Richard Biener rguent...@suse.de PR tree-optimization/56716 * tree-ssa-structalias.c (perform_var_substitution): Adjust dumping for ref nodes. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-structalias.c Paulo Matos
Re: [PATCH]Fix computation of offset in ivopt
On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch fixes two minor bugs when computing offset in IVOPT. 1) Considering below example: #define MAX 100 struct tag { int i; int j; }; struct tag arr[MAX]; int foo (int len) { int i = 0; for (; i < len; i++) { access arr[i].j; } } Without this patch, the offset computed by strip_offset_1 for address arr[i].j is ZERO, which is apparently not the case. 2) Considering below example: //... <bb 15>: KeyIndex_66 = KeyIndex_194 + 4294967295; if (KeyIndex_66 != 0) goto <bb 16>; else goto <bb 18>; <bb 16>: <bb 17>: # KeyIndex_194 = PHI <KeyIndex_66(16), KeyIndex_180(73)> _62 = KeyIndex_194 + 1073741823; _63 = _62 * 4; _64 = pretmp_184 + _63; _65 = *_64; if (_65 == 0) goto <bb 15>; else goto <bb 71>; //... There are iv use and candidate like: use 1 address in statement _65 = *_64; at position *_64 type handletype * base pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 step 4294967292 base object (void *) pretmp_184 related candidates candidate 6 var_before ivtmp.16 var_after ivtmp.16 incremented before use 1 type unsigned int base (unsigned int) (pretmp_184 + (sizetype) KeyIndex_180 * 4) step 4294967292 base object (void *) pretmp_184 Candidate 6 is related to use 1 In function get_computation_cost_at for use 1 using cand 6, ubase and cbase are: pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 pretmp_184 + (sizetype) KeyIndex_180 * 4 The cstepi computed in HOST_WIDE_INT is : 0xfffffffffffffffc, while offset computed in TYPE(utype) is : 0xfffffffc. Though they both stand for value -4 in different precision, statement offset -= ratio * cstepi returns 0x100000000, which is wrong. Tested on x86_64 and arm. Is it OK? + field = TREE_OPERAND (expr, 1); + if (DECL_FIELD_BIT_OFFSET (field) + && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + + tmp = component_ref_field_offset (expr); + if (top_compref + && cst_and_fits_in_hwi (tmp)) + { + /* Strip the component reference completely.
*/ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; + return op0; + } the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to if (!inside_addr) return orig_expr; tmp = component_ref_field_offset (expr); field = TREE_OPERAND (expr, 1); if (top_compref && cst_and_fits_in_hwi (tmp) && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) { ... } note that this doesn't really handle overflows correctly as + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; may still overflow. @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data, bitmap_clear (*depends_on); } + /* Sign-extend offset if utype has lower precision than HOST_WIDE_INT. */ + offset = sext_hwi (offset, TYPE_PRECISION (utype)); + offset is computed elsewhere in difference_cost and the bug to me seems that it is unsigned. sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype). Richard. Thanks. bin 2013-09-24 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (strip_offset_1): Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. (get_computation_cost_at): Sign extend offset.
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On Tue, 24 Sep 2013, Richard Biener wrote: On Mon, Sep 23, 2013 at 10:34 PM, Eric Botcazou ebotca...@adacore.com wrote: I have committed it for you (rev 202831), with a few modifications (ChangeLog formatting, typos). Here is what I have committed: 2013-09-23 Kugan Vivekanandarajah kug...@linaro.org * gimple-pretty-print.c (dump_ssaname_info): New function. (dump_gimple_phi): Call it. (pp_gimple_stmt_1): Likewise. * tree-core.h (tree_ssa_name): New union ssa_name_info_type field. (range_info_def): Declare. * tree-pretty-print.c (pp_double_int): New function. (dump_generic_node): Call it. * tree-pretty-print.h (pp_double_int): Declare. * tree-ssa-alias.c (dump_alias_info): Check pointer type. * tree-ssanames.h (range_info_def): New structure. (value_range_type): Move definition here. (set_range_info, value_range_type, duplicate_ssa_name_range_info): Declare. * tree-ssanames.c (make_ssa_name_fn): Check pointer type at initialization. (set_range_info): New function. (get_range_info): Likewise. (duplicate_ssa_name_range_info): Likewise. (duplicate_ssa_name_fn): Check pointer type and call duplicate_ssa_name_range_info. * tree-ssa-copy.c (fini_copy_prop): Likewise. * tree-vrp.c (value_range_type): Remove definition, now in tree-ssanames.h. (vrp_finalize): Call set_range_info to update value range of SSA_NAMEs. * tree.h (SSA_NAME_PTR_INFO): Macro changed to access via union. (SSA_NAME_RANGE_INFO): New macro. Nice patch, but the formatting is totally wrong wrt spaces, please reformat using 2-space indentation and 8-space TABs, as already used in the files. The patch has also introduced 2 regressions in Ada: === acats tests === FAIL: c37211b FAIL: c37211c === acats Summary === # of expected passes2318 # of unexpected failures2 Program received signal SIGSEGV, Segmentation fault. vrp_finalize () at /home/eric/svn/gcc/gcc/tree-vrp.c:9458 9458 if (POINTER_TYPE_P (TREE_TYPE (name)) (gdb) bt I'm testing a trivial patch to fix that. 
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2013-09-24 Richard Biener rguent...@suse.de * tree-vrp.c (vrp_finalize): Check for SSA name presence. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 202860) +++ gcc/tree-vrp.c (working copy) @@ -9455,7 +9455,8 @@ vrp_finalize (void) { tree name = ssa_name (i); - if (POINTER_TYPE_P (TREE_TYPE (name)) + if (!name + || POINTER_TYPE_P (TREE_TYPE (name)) || (vr_value[i]->type == VR_VARYING) || (vr_value[i]->type == VR_UNDEFINED)) continue;
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On 24/09/13 11:12, Richard Biener wrote: Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. I can't conceive of a realistic case where one would want to scale the base address. Scaling the offset is fine, but never the base. So reg*scale+offset seems meaningless. Base + Offset * Scale on the other hand makes much more sense. Of course, this is all about terminology, so if you regard an immediate, such as a symbol as an offset, then perhaps you have something close to the original term, but I think then you've inverted things, since your symbol is really the base, not the offset. R.
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Here is the new patch discussed in the other thread. Thanks Yvan 2013-09-11 Yvan Roux yvan.r...@linaro.org Vladimir Makarov vmaka...@redhat.com * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations from the least significant bit. (strip_address_mutations): Add bitfield operations handling. (shift_code_p): New predicate for shifting operations. (must_be_index_p): Add shifting operations handling. (set_address_index): Likewise. +/* Return true if X is a sign_extract or zero_extract from the least + significant bit. */ + +static bool +lsb_bitfield_op_p (rtx x) +{ + if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS) +{ + enum machine_mode mode = GET_MODE(x); + unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1)); + HOST_WIDE_INT pos = INTVAL (XEXP (x, 2)); + + return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0)); It seems strange to use the destination mode to decide whether this is the LSB of the source. +/* Return true if X is a shifting operation. */ + +static bool +shift_code_p (rtx x) +{ + return (GET_CODE (x) == ASHIFT + || GET_CODE (x) == ASHIFTRT + || GET_CODE (x) == LSHIFTRT + || GET_CODE (x) == ROTATE + || GET_CODE (x) == ROTATERT); +} ROTATE and ROTATERT aren't really shifting operations though, so are they really needed here? -- Eric Botcazou
Re: expand_expr tweaks to fix PR57134
On Fri, Sep 13, 2013 at 12:37:20PM +0930, Alan Modra wrote: PR middle-end/57586 * stmt.c (expand_asm_operands): Call expand_expr with EXPAND_MEMORY for output operands that disallow regs. Don't use EXPAND_WRITE on inout operands. Ping? -- Alan Modra Australia Development Lab, IBM
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Tue, Sep 24, 2013 at 12:36 PM, Richard Earnshaw rearn...@arm.com wrote: On 24/09/13 11:12, Richard Biener wrote: Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. I can't conceive of a realistic case where one would want to scale the base address. Scaling the offset is fine, but never the base. So reg*scale+offset seems meaningless. Base + Offset * Scale on the other hand makes much more sense. Of course, this is all about terminology, so if you regard an immediate, such as a symbol as an offset, then perhaps you have something close to the original term, but I think then you've inverted things, since your symbol is really the base, not the offset. Sure, this can't be the complete memory address - but that doesn't seem to be what IVOPTs is testing. Your example of $SYM + offset * scale is a good one for example. IVOPTs is merely asking whether it can use a scaled offset. That the present test doesn't work for arm hints at that it doesn't build up the correct RTL for the test, but also since this works for other targets which probably also don't only have a scaled offset without a base it isn't all that clear what a) the codes desire is, b) the correct solution is without regressing on other targets. Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. Richard. R.
Re: expand_expr tweaks to fix PR57134
On Tue, Sep 24, 2013 at 12:43 PM, Alan Modra amo...@gmail.com wrote: On Fri, Sep 13, 2013 at 12:37:20PM +0930, Alan Modra wrote: PR middle-end/57586 * stmt.c (expand_asm_operands): Call expand_expr with EXPAND_MEMORY for output operands that disallow regs. Don't use EXPAND_WRITE on inout operands. Ping? Ok. Thanks, Richard. -- Alan Modra Australia Development Lab, IBM
[PATCH] Refactor type handling in get_alias_set, fix PR58513
As noted in PR58513 the alias type comparison in operand_equal_p for MEM_REF and TARGET_MEM_REF is overly restrictive. The following adds a new alias_ptr_types_compatible_p helper for a less conservative comparison and refactors get_alias_set and reference_alias_ptr_type to share code and behave in a more similar manner. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-09-24 Richard Biener rguent...@suse.de PR middle-end/58513 * tree.c (reference_alias_ptr_type): Move ... * alias.c (reference_alias_ptr_type): ... here and implement in terms of the new reference_alias_ptr_type_1. (ref_all_alias_ptr_type_p): New helper. (get_deref_alias_set_1): Drop flag_strict_aliasing here, use ref_all_alias_ptr_type_p. (get_deref_alias_set): Add flag_strict_aliasing check here. (reference_alias_ptr_type_1): New function, split out from ... (get_alias_set): ... here. (alias_ptr_types_compatible_p): New function. * alias.h (reference_alias_ptr_type): Declare. (alias_ptr_types_compatible_p): Likewise. * tree.h (reference_alias_ptr_type): Remove. * fold-const.c (operand_equal_p): Use alias_ptr_types_compatible_p to compare MEM_REF alias types. * g++.dg/vect/pr58513.cc: New testcase. Index: gcc/tree.c === *** gcc/tree.c.orig 2013-09-24 10:48:25.0 +0200 --- gcc/tree.c 2013-09-24 10:48:27.646770499 +0200 *** mem_ref_offset (const_tree t) *** 4263,4286 return tree_to_double_int (toff).sext (TYPE_PRECISION (TREE_TYPE (toff))); } - /* Return the pointer-type relevant for TBAA purposes from the -gimple memory reference tree T. This is the type to be used for -the offset operand of MEM_REF or TARGET_MEM_REF replacements of T. 
*/ - - tree - reference_alias_ptr_type (const_tree t) - { - const_tree base = t; - while (handled_component_p (base)) - base = TREE_OPERAND (base, 0); - if (TREE_CODE (base) == MEM_REF) - return TREE_TYPE (TREE_OPERAND (base, 1)); - else if (TREE_CODE (base) == TARGET_MEM_REF) - return TREE_TYPE (TMR_OFFSET (base)); - else - return build_pointer_type (TYPE_MAIN_VARIANT (TREE_TYPE (base))); - } - /* Return an invariant ADDR_EXPR of type TYPE taking the address of BASE offsetted by OFFSET units. */ --- 4263,4268 Index: gcc/tree.h === *** gcc/tree.h.orig 2013-09-24 10:48:25.0 +0200 --- gcc/tree.h 2013-09-24 10:48:27.662770694 +0200 *** extern tree build_simple_mem_ref_loc (lo *** 4345,4351 #define build_simple_mem_ref(T)\ build_simple_mem_ref_loc (UNKNOWN_LOCATION, T) extern double_int mem_ref_offset (const_tree); - extern tree reference_alias_ptr_type (const_tree); extern tree build_invariant_address (tree, tree, HOST_WIDE_INT); extern tree constant_boolean_node (bool, tree); extern tree div_if_zero_remainder (enum tree_code, const_tree, const_tree); --- 4345,4350 Index: gcc/alias.c === *** gcc/alias.c.orig2013-09-24 10:48:25.0 +0200 --- gcc/alias.c 2013-09-24 10:55:15.949616280 +0200 *** component_uses_parent_alias_set (const_t *** 547,552 --- 547,564 } } + + /* Return whether the pointer-type T effective for aliasing may +access everything and thus the reference has to be assigned +alias-set zero. */ + + static bool + ref_all_alias_ptr_type_p (const_tree t) + { + return (TREE_CODE (TREE_TYPE (t)) == VOID_TYPE + || TYPE_REF_CAN_ALIAS_ALL (t)); + } + /* Return the alias set for the memory pointed to by T, which may be either a type or an expression. Return -1 if there is nothing special about dereferencing T. */ *** component_uses_parent_alias_set (const_t *** 554,564 static alias_set_type get_deref_alias_set_1 (tree t) { - /* If we're not doing any alias analysis, just assume everything - aliases everything else. 
*/ - if (!flag_strict_aliasing) - return 0; - /* All we care about is the type. */ if (! TYPE_P (t)) t = TREE_TYPE (t); --- 566,571 *** get_deref_alias_set_1 (tree t) *** 566,573 /* If we have an INDIRECT_REF via a void pointer, we don't know anything about what that might alias. Likewise if the pointer is marked that way. */ ! if (TREE_CODE (TREE_TYPE (t)) == VOID_TYPE ! || TYPE_REF_CAN_ALIAS_ALL (t)) return 0; return -1; --- 573,579 /* If we have an INDIRECT_REF via a void pointer, we don't know anything about what that might alias. Likewise if the pointer is marked that way. */ ! if (ref_all_alias_ptr_type_p (t)) return 0; return -1; ***
Re: Ping patch to enable *.cc files in gengtype
On Fri, Sep 20, 2013 at 05:56:22PM +0200, Jakub Jelinek wrote: On Fri, Sep 20, 2013 at 05:52:38PM +0200, Basile Starynkevitch wrote: On Fri, Sep 20, 2013 at 09:53:10AM -0400, Diego Novillo wrote: On 2013-09-16 04:19 , Basile Starynkevitch wrote: Hello all, I'm pinging the patch (of September 2nd) on http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00036.html gcc/ChangeLog entry 2013-09-16 Basile Starynkevitch bas...@starynkevitch.net * gengtype.c (file_rules): Added rule for *.cc files. (get_output_file_with_visibility): Give fatal message when no rules found. OK. Thanks for the review and the approval. I just committed revision 202782. BTW, I believe this patch should be back-ported, at least to the next GCC 4.8.2 (and perhaps even to 4.7). What do you think? Why? There are no *.cc files in 4.8.x and I don't see why this patch would be desirable for the release branches. Because 4.8 (and even 4.7) plugins may have .cc files, and plugins with GTY-ed data run gengtype when they are built. In general, I still find that naming a C++ file .c is very confusing (especially for newbies). BTW, some other compilers don't like that (in particular Clang issues a warning when compiling a C++ file named .c, so Clang issues a lot of warnings when compiling GCC). Of course, one might say that no recent C++ formal standard states that C++ source files should have a .cc or .cxx or .cpp or .C extension (but I am not sure of this), but this has been common practice for several decades (probably since the origin of C++). IMHO plugins for GCC don't have any reason to follow the GCC (obscure and undocumented) bad crazy habit of naming a C++ file .c (which a plain C compiler won't even compile!). So I believe that the plugin infrastructure should accept (and even mandate, or at least strongly advise) that plugin files be named *.cc (or *.cpp). And since GCC 4.8 (and even 4.7) requires a C++ compiler, their plugins should generally be coded in C++, not in C.
And such a plugin C++ source file has absolutely no reason to be named *.c (the reason why GCC still has *.c files is some obscure version-control reason; one might be tempted to think that if the version control wants *.c files for C++ source, that version control is broken and should be replaced by something fancier...). Not accepting plugin source files named .cc for C++ source code is IMHO a bug. And some plugins (those with GTY-ed data) do use gengtype during their build. Cheers. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} ***
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Hi Eric, Thanks for the review.

+/* Return true if X is a sign_extract or zero_extract from the least
+   significant bit.  */
+
+static bool
+lsb_bitfield_op_p (rtx x)
+{
+  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
+    {
+      enum machine_mode mode = GET_MODE (x);
+      unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+      HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
+
+      return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));

It seems strange to use the destination mode to decide whether this is the LSB of the source. Indeed, I think it has to be the mode of loc, but I just wonder if it is not always the same, as the doc says that mode m is the same as the mode that would be used for loc if it were a register.

+/* Return true if X is a shifting operation.  */
+
+static bool
+shift_code_p (rtx x)
+{
+  return (GET_CODE (x) == ASHIFT
+	  || GET_CODE (x) == ASHIFTRT
+	  || GET_CODE (x) == LSHIFTRT
+	  || GET_CODE (x) == ROTATE
+	  || GET_CODE (x) == ROTATERT);
+}

ROTATE and ROTATERT aren't really shifting operations though, so are they really needed here? According to the gcc internals documentation, ROTATE and ROTATERT are similar to the shifting operations, but to be more accurate maybe we can rename shift_code_p to shift_and_rotate_code_p. Rotations are used in ARM address calculation, and thus need to be handled in must_be_index_p and set_address_index. Thanks, Yvan
Re: New GCC options for loop vectorization
On Mon, Sep 23, 2013 at 10:37 PM, Xinliang David Li davi...@google.com wrote: Thanks. I modified the patch so that the max allowed peel iterations can be specified via a parameter. Testing ongoing. Ok for trunk ?

+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+	 "vect-max-peeling-for-alignment",
+	 "Max number of loop peels to enhance alignment of data references in a loop.",
+	 -1, 0, 0)

I believe this doesn't allow --param vect-max-peeling-for-alignment=-1 as you specify 0 as the minimum. So, ok with changing minimum and maximum to -1. (double check that this works) Thanks, Richard. David On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li davi...@google.com wrote: On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li davi...@google.com wrote: On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li davi...@google.com wrote: On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li davi...@google.com wrote: Currently -ftree-vectorize turns on both loop and slp vectorizations, but there is no simple way to turn on loop vectorization alone. The logic for the default O3 setting is also complicated. In this patch, two new options are introduced: 1) -ftree-loop-vectorize This option is used to turn on loop vectorization only. Option -ftree-slp-vectorize also becomes a first-class citizen, and no funny business of Init(2) is needed. With this change, -ftree-vectorize becomes a simple alias for -ftree-loop-vectorize + -ftree-slp-vectorize.
For instance, to turn on only slp vectorization at O3, the old way is: -O3 -fno-tree-vectorize -ftree-slp-vectorize With the new change it becomes: -O3 -fno-tree-loop-vectorize To turn on only loop vectorization at O2, the old way is -O2 -ftree-vectorize -fno-tree-slp-vectorize The new way is -O2 -ftree-loop-vectorize 2) -ftree-vect-loop-peeling This option is used to turn on/off loop peeling for alignment. In the long run, this should be folded into the cheap cost model proposed by Richard. This option is also useful in scenarios where peeling can introduce runtime problems: http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happen to be common in practice. Patch attached. Compiler bootstrapped. Ok after testing? I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2). Ok. Can you also comment on 2) ? I think we want to decide how granular we want to control the vectorizer and using which mechanism. My cost-model re-org makes -ftree-vect-loop-version a no-op (basically removes it), so 2) looks like a step backwards in this context. Using the cost model for coarse-grain control/configuration is certainly something we want, but having fine-grain control is still useful. So, can you summarize what pieces (including versioning) of the vectorizer you'd want to be able to disable separately? Loop peeling seems to be the main one. There is also a related correctness issue. For instance, the following code is common in practice, but loop peeling wrongly assumes initial base alignment and generates an aligned mov instruction after peeling, leading to a SEGV. Peeling is not something we can blindly turn on -- even when it is on, there should be a way to turn it off explicitly:

char a[1];
void foo(int n)
{
  int* b = (int*)(a+n);
  int i = 0;
  for (; i < 1000; ++i)
    b[i] = 1;
}
int main(int argn, char** argv)
{
  foo(argn);
}

But that's just a bug that should be fixed (looking into it).
Just disabling peeling for alignment may get you into the versioning-for-alignment path (and thus an unvectorized loop at runtime). This is not true for targets supporting mis-aligned access. I have not seen a case where alignment-driven loop versioning happens on x86. Also it's known that the alignment peeling code needs some serious TLC (its outcome depends on the order of DRs, and the cost model it uses leaves something to be desired, as we cannot distinguish between unaligned load and store costs). Yet another reason to turn it off as it is not effective anyway? As said, I'll disable all remains of -ftree-vect-loop-version with the cost model patch because it wasn't guarding versioning for aliasing but only versioning for alignment. We have to be consistent here - if we add a way to disable peeling for alignment then we certainly don't want to remove the ability to disable versioning for alignment, no? We already have the ability to turn off versioning -- via --param. It is a more natural way to fine-tune a pass instead of introducing a -f option. For this reason, your planned deprecation of the option is a good
Re: [PATCH, PR 57748] Check for out of bounds access
On Tue, Sep 17, 2013 at 2:08 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: On Tue, 17 Sep 2013 12:45:40, Richard Biener wrote: On Tue, Sep 17, 2013 at 12:00 PM, Richard Biener richard.guent...@gmail.com wrote: On Sun, Sep 15, 2013 at 6:55 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hello Richard, attached is my second attempt at fixing PR 57748. This time the movmisalign path is completely removed and a similar bug in the read handling of misaligned structures with a non-BLKmode is fixed too. There are several new test cases for the different possible failure modes. This patch was boot-strapped and regression-tested on x86_64-unknown-linux-gnu and i686-pc-linux-gnu. Additionally I generated eCos and an eCos application (on ARMv5 using packed structures) with an arm-eabi cross compiler, and looked for differences in the disassembled code with and without this patch, but there were none. OK for trunk? I agree that the existing movmisalign path that you remove is simply bogus, so removing it looks fine to me. Can you give a rationale for

@@ -4773,6 +4738,8 @@ expand_assignment (tree to, tree from, b
       if (MEM_P (to_rtx)
	   && GET_MODE (to_rtx) == BLKmode
	   && GET_MODE (XEXP (to_rtx, 0)) != VOIDmode
+	   && bitregion_start == 0
+	   && bitregion_end == 0
	   && bitsize > 0
	   && (bitpos % bitsize) == 0
	   && (bitsize % GET_MODE_ALIGNMENT (mode1)) == 0

OK, as already said, I think it could be dangerous to set bitpos=0 without considering bitregion_start/end, but I think it may be possible that this can not happen, because if bitsize is a multiple of ALIGNMENT, and bitpos is a multiple of bitsize, we probably do not have a bit-field at all. And of course I have no test case that fails without this hunk. Maybe it would be better to add an assertion here like:

	{
	  gcc_assert (bitregion_start == 0 && bitregion_end == 0);
	  to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
	  bitpos = 0;
	}

and especially to

@@ -9905,7 +9861,7 @@ expand_expr_real_1 (tree exp, rtx target
			 modifier != EXPAND_STACK_PARM ?
			 target : NULL_RTX), VOIDmode,
-			 modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
+			 EXPAND_MEMORY);

  /* If the bitfield is volatile, we want to access it in the
     field's mode, not the computed mode. */

which AFAIK makes memory expansion of loads/stores from/to registers change (fail? go through stack memory?) - see handling of non-MEM return values from that expand_expr call. I wanted to make the expansion of MEM_REF and TARGET_MEM_REF not go thru the final misalign handling, which is guarded by

  if (modifier != EXPAND_WRITE
      && modifier != EXPAND_MEMORY
      ...

What we want here is most likely EXPAND_MEMORY, which returns a memory context if possible. Could you specify more explicitly what you mean with handling of non-MEM return values from that expand_expr call, then I could try finding test cases for that. In particular this seems to disable all movmisalign handling for MEM_REFs wrapped in component references which looks wrong. I was playing with

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

struct S { long long a[11]; V v; } __attribute__((aligned(8),packed));

struct S a, *b = &a;
V v, w;

int main()
{
  v = b->v;
  b->v = w;
  return 0;
}

(use -fno-common) and I see that we use unaligned stores too often (even with a properly aligned MEM). The above at least shows movmisalign opportunities wrapped in component-refs. hmm, interesting. This does not compile differently with or without this patch. I have another observation, regarding the testcase pr50444.c:

method:
.LFB4:
	.cfi_startproc
	movq	32(%rdi), %rax
	testq	%rax, %rax
	jne	.L7
	addl	$1, 16(%rdi)
	movl	$3, %eax
	movq	%rax, 32(%rdi)
	movdqu	16(%rdi), %xmm0
	pxor	(%rdi), %xmm0
	movdqu	%xmm0, 40(%rdi)

here the first movdqu could as well be movdqa, because 16+rdi is 128-bit aligned. In the ctor method a movdqa is used, but the SRA is very pessimistic and generates an unaligned MEM_REF. Also this example does not compile any differently with this patch.
That is, do you see anything break with just removing the movmisalign path? I'd rather install that (with the new testcases that then pass) separately as this is a somewhat fragile area and being able to more selectively bisect/backport would be nice. No, I think that is a good idea. Attached the first part of the patch, that does only remove the movmisalign path. Should I apply this one after regression testing? It seems you already have...? Richard. Bernd. Thanks, Richard. Regards Bernd.
Re: gimple build interface
On 09/24/2013 05:50 AM, Richard Biener wrote: Did you forget to attach the patch? Of course not! :-P Oh how I hate mondays. Old patch attached? Richard. Errr. no. that last one has gimple-builder.[ch] and the ssa_replace_lhs renamed to gimple_replace_ssa_lhs.. I'll also wait for Tromey's auto dependency patch, and then just add the .o to the list to be built in Makefile.in. Andrew
Re: gimple build interface
On Tue, Sep 24, 2013 at 1:31 PM, Andrew MacLeod amacl...@redhat.com wrote: On 09/24/2013 05:50 AM, Richard Biener wrote: Did you forget to attach the patch? Of course not! :-P Oh how I hate mondays. Old patch attached? Richard. Errr. no. that last one has gimple-builder.[ch] and the ssa_replace_lhs renamed to gimple_replace_ssa_lhs.. I'll also wait for Tromey's auto dependency patch, and then just add the .o to the list to be built in Makefile.in. Ah, stupid firefox saving as gimp(1).patch. This looks ok. Thanks, Richard. Andrew
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check index, then addr if index is failed? By any kind of addressing modes, I mean modes supported by function get_address_cost, i.e., in form of [base] + [off] + [var] + (reg|reg*scale). I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Apparently it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences? Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing mode and account the cost of the multiplier accordingly. For the current code, there are two scenarios here: 1) If the target supports reg*scale+reg, but not reg*scale; in this case, IVOPT considers the multiplier not allowed in any addressing mode and accounts the multiplier with a high cost. This is the problem arm is having. 2) If the target supports reg*scale, but not some kind of addressing mode (say reg*scale+reg); in this case, IVOPT still constructs various scaled addressing modes in get_address_cost and depends on address_cost to compute the correct cost for that addressing expression. I think this happens to work even though IVOPT doesn't know reg*scale+reg is actually not supported. The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid.
I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Also from the newer comments: Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? Thanks. bin -- Best Regards.
[PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
Eric, the current way component_uses_parent_alias_set is called from get_alias_set causes the reference tree chain to be walked O(n^2) in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT reference in the tree chain. Also the comment above component_uses_parent_alias_set doesn't seem to match the behavior in get_alias_set: get_alias_set ends up using exactly the type of the parent of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference instead of the alias set provided by the object at the heart of T. I also fail to see why component_uses_parent_alias_set invokes get_alias_set (is that to make 'a.c' in struct { char c; } a; use the alias set of 'a' instead of the alias set of 'char'? Or is it for correctness reasons in case there is a ref-all indirect ref at the base? For the former I believe it doesn't work because it only checks the non-outermost type). So, the following patch removes the quadraticness by returning the parent of the innermost non-addressable (or alias-set zero) reference component instead of just a flag. That changes behavior for the alias-set zero case but still doesn't match the overall function comment. The only other use besides get_alias_set is to set MEM_KEEP_ALIAS_SET_P - whatever that is exactly for; I can see a single non-obvious use in store_constructor_field (didn't bother to look up how callers specify the alias_set argument). In

  if (MEM_P (to_rtx)
      && !MEM_KEEP_ALIAS_SET_P (to_rtx)
      && DECL_NONADDRESSABLE_P (field))
    {
      to_rtx = copy_rtx (to_rtx);
      MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
    }

we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case MEM_KEEP_ALIAS_SET_P is dropped). And for the following call

  store_constructor_field (to_rtx, bitsize, bitpos, mode,
			   value, cleared,
			   get_alias_set (TREE_TYPE (field)));

pass MEM_ALIAS_SET instead of get_alias_set (TREE_TYPE (field)) for DECL_NONADDRESSABLE_P (field); similar for the array handling in the other use.
Btw, checks like

  if (!MEM_KEEP_ALIAS_SET_P (to_rtx) && MEM_ALIAS_SET (to_rtx) != 0)
    set_mem_alias_set (to_rtx, alias_set);

seem to miss MEM_ALIAS_SET (to_rtx) being ALIAS_SET_MEMORY_BARRIER? Or alias_set being zero / ALIAS_SET_MEMORY_BARRIER? Anyway, the patch nearly preserves behavior for the emit-rtl.c use. Comments? Thanks, Richard.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c	(revision 202865)
+++ gcc/alias.c	(working copy)
@@ -510,41 +510,49 @@ objects_must_conflict_p (tree t1, tree t
    alias set used by the component, but we don't have per-FIELD_DECL
    assignable alias sets.  */
 
-bool
+tree
 component_uses_parent_alias_set (const_tree t)
 {
-  while (1)
+  const_tree non_addressable = NULL_TREE;
+  while (handled_component_p (t))
     {
-      /* If we're at the end, it vacuously uses its own alias set.  */
-      if (!handled_component_p (t))
-	return false;
-
       switch (TREE_CODE (t))
	{
	case COMPONENT_REF:
	  if (DECL_NONADDRESSABLE_P (TREE_OPERAND (t, 1)))
-	    return true;
+	    non_addressable = t;
	  break;
 
	case ARRAY_REF:
	case ARRAY_RANGE_REF:
	  if (TYPE_NONALIASED_COMPONENT (TREE_TYPE (TREE_OPERAND (t, 0))))
-	    return true;
+	    non_addressable = t;
	  break;
 
	case REALPART_EXPR:
	case IMAGPART_EXPR:
	  break;
 
-	default:
+	case BIT_FIELD_REF:
+	case VIEW_CONVERT_EXPR:
	  /* Bitfields and casts are never addressable.  */
-	  return true;
+	  non_addressable = t;
+	  break;
+
+	default:
+	  gcc_unreachable ();
	}
 
-      t = TREE_OPERAND (t, 0);
       if (get_alias_set (TREE_TYPE (t)) == 0)
-	return true;
+	non_addressable = t;
+
+      t = TREE_OPERAND (t, 0);
     }
+
+  if (non_addressable)
+    return TREE_OPERAND (non_addressable, 0);
+
+  return NULL_TREE;
 }
 
@@ -645,14 +653,11 @@ reference_alias_ptr_type_1 (tree *t)
	  (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1)))))
     return TREE_TYPE (TREE_OPERAND (inner, 1));
 
-  /* Otherwise, pick up the outermost object that we could have a pointer
-     to, processing conversions as above.  */
-  /* ??? Ick, this is worse than quadratic!
     */
-  while (component_uses_parent_alias_set (*t))
-    {
-      *t = TREE_OPERAND (*t, 0);
-      STRIP_NOPS (*t);
-    }
+  /* Otherwise, pick up the outermost object that we could have
+     a pointer to.  */
+  tree tem = component_uses_parent_alias_set (*t);
+  if (tem)
+    *t = tem;
 
   return NULL_TREE;
 }

Index: gcc/alias.h
===================================================================
--- gcc/alias.h	(revision 202865)
+++ gcc/alias.h	(working copy)
@@ -33,7 +33,7 @@
 extern alias_set_type get_alias_set
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check index, then addr if index is failed? By any kind of addressing modes, I mean modes supported by function get_address_cost, i.e., in form of [base] + [off] + [var] + (reg|reg*scale). I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Appearantly it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences? Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing modes and account cost of multiplier accordingly. For current code, there are two scenarios here 1) If target supports reg*scale+reg, but not reg*scale, in this case, IVOPT considers multiplier is not allowed in any addressing mode and account multiplier with high cost. This is the problem arm having. 2) If target supports reg*scale, but not some kind of addressing mode (saying reg*scale+reg), in this case, IVOPT still constructs various scaled addressing mode in get_address_cost and depends on address_cost to compute correct cost for that addressing expression. I think this happens to work even IVOPT doesn't know reg*scale+reg is actually not supported. The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. 
I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Also from the newer comments: Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? Yes, you can share a single RTL expression for all this and I think querying reg + reg * scale first makes sense (then fallback to reg * scale for compatibility). Richard. Thanks. bin -- Best Regards.
Re: [Patch] match_results::format and regex_replace
On 09/23/2013 10:09 PM, Tim Shen wrote: On Sun, Sep 22, 2013 at 4:20 PM, Paolo Carlini paolo.carl...@oracle.com wrote: If testing goes well patch is Ok to commit. Tested under -m32 and -m64 and committed :) I'll learn how locale in glibc works. Thank you all! Thank *you*! regex has been dogging us for years. I, for one, hope you hang around after GSOC. There are lots of library components coming up for C++2014 and other TS. ;-) Ed
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Hi Honza, I am finally getting back to working on this after a few weeks of working on some other priorities. I am also trying to return to this, so good timing ;) Martin has got smaller C++ programs (Inkscape) to not touch the cold segment during startup with FDO (w/o partitioning). Firefox still does; I think the problem is lost samples due to different linker decisions even with LTO (i.e. the linker picks an object from a .a library at profile-generate time that it never passes later). I plan to look into that today. Did you mean to commit the above change? I see that it went in as part of r202258 but doesn't show up in the ChangeLog entry for that revision. Yes, I meant to check it in, but did not mean to do so w/o a ChangeLog. I will fix that. In other cases it was mostly loop unrolling in combination with jump threading. So I modified my script to separately report when failure happens for a test trained once and a test trained a hundred times. Thanks for the linker script. I reproduced your results. I looked at a couple of cases. The first was one that failed after 1 training run only (2910-2.c). It was due to jump threading, which you noted was a problem. For this one I think we can handle it in the partitioning, since there is an FDO insanity that we could probably treat more conservatively when splitting. We should fix the roundoff issues - when I was introducing the frequency/probability/count system I made an assumption that parts of programs with very low counts do not matter, since they are not part of the hot spot (and I cared only about the hot spot). Now we care about identifying unlikely executed spots and we need to fix this. I looked at one that failed after 100 as well (20031204-1.c).
In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts:

  if (octets == 4 && (*cp == ':' || *cp == '\0')) {

The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion:

  [20031204-1.c : 31:33] _29 = _28 == 58;
  [20031204-1.c : 31:33] _30 = _28 == 0;
  [20031204-1.c : 31:33] _31 = _29 | _30;
  [20031204-1.c : 31:18] if (_31 != 0) goto <bb 16>; else goto <bb 19>;

where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall-through, assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat it as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting.
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2422-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2910-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20020413-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20030903-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060420-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060905-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120808-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920726-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
FAIL1
Re: [PATCH] Fix typo in check for null
On Mon, Sep 23, 2013 at 02:03:02PM +, Paulo Matos wrote: This is a patch for master, where in number_of_loops, we want to check if the loops (returned by loops_for_fn) is non-null but instead we are using fn, which is the function argument. I haven't opened a bug report, please let me know if I need to do that before submitting patches. 2013-09-23 Paulo Matos pma...@broadcom.com * gcc/cfgloop.h (number_of_loops): Fix typo in check for null The CL should have two spaces (twice) between the date and the name and the e-mail, plus please drop the gcc/ prefix and add a full stop at the end of the sentence. Marek
Re: [PATCH][ARM][testsuite] Add effective target check for arm conditional execution
On 13/09/13 16:25, Kyrill Tkachov wrote: Hi all, gcc.target/arm/minmax_minus.c is really only valid when we have conditional execution available, that is, on targets that are not Thumb1-only. I've added an effective-target check for that and used it in the test so that it does not get run and give a false negative when testing Thumb1 targets. Ok for trunk? Thanks, Kyrill 2013-09-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * lib/target-supports.exp (check_effective_target_arm_cond_exec): New procedure. * gcc.target/arm/minmax_minus.c: Check for cond_exec target. Ping.
Re: patch to canonize small wide-ints.
On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. In particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicists generally consider: it is ugliness. There is a lot of truth and beauty in the patch, but in truth there are a lot of places where the ugliness is just moved someplace else. In the pushing-the-ugly-around dept, trees and wide-ints are not canonized the same way. I spent several days going down the road where I tried to have them be the same, but it got very ugly having 32-bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are more complex. I think that I am in favor of this patch, especially in conjunction with richard's cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard.
RE: [PATCH] Fix typo in check for null
-Original Message- From: Marek Polacek [mailto:pola...@redhat.com] Sent: 24 September 2013 13:57 To: Paulo Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix typo in check for null On Mon, Sep 23, 2013 at 02:03:02PM +, Paulo Matos wrote: This is a patch for master, where in number_of_loops, we want to check if the loops (returned by loops_for_fn) is non-null but instead we are using fn, which is the function argument. I haven't opened a bug report, please let me know if I need to do that before submitting patches. 2013-09-23 Paulo Matos pma...@broadcom.com * gcc/cfgloop.h (number_of_loops): Fix typo in check for null The CL should have two spaces (twice): between the date and the name, and between the name and the e-mail. Plus please drop the gcc/ prefix and add a full stop at the end of the sentence. Marek Thanks for the comments Marek, will fix it. 2013-09-24 Paulo Matos pma...@broadcom.com * cfgloop.h (number_of_loops): Fix typo in check for null.
RE: [PATCH] Fix typo in check for null
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 24 September 2013 10:03 To: Paulo Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix typo in check for null On Mon, Sep 23, 2013 at 4:03 PM, Paulo Matos pma...@broadcom.com wrote: This is a patch for master, where in number_of_loops, we want to check if the loops (returned by loops_for_fn) is non-null but instead we are using fn, which is the function argument. I haven't opened a bug report, please let me know if I need to do that before submitting patches. Patch is ok. Thanks, Richard. Will submit once I get a reply from the write access request. Cheers, Paulo
Re: patch to canonize small wide-ints.
On Tue, 24 Sep 2013, Kenneth Zadeck wrote: On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. in particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicist's generally consider: it is uglyness. There is a lot of truth and beauty in the patch but in truth there is a lot of places where the uglyness is just moved someplace else. in the pushing the ugly around dept, trees and wide-ints are not canonized the same way.I spent several days going down the road where it tried to have them be the same, but it got very ugly having 32 bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are now more complex. i think that i am in favor of this patch, especially in conjunction with richards cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. as i said on irc, after i check in a cleaned up version of this (for ppc) i will change the rep for unsiged tree-csts so that they have an extra block of zeros if their top bit is set. this will clean that up somewhat. Yes, this will give wide-ints a sign (as I originally proposed). For RTL constants we're going to lie and make all constants negative that have their MSB set. Richard.
Re: [PATCH] Fix typo in check for null
On Tue, Sep 24, 2013 at 01:44:30PM +, Paulo Matos wrote: Thanks for the comments Marek, will fix it. 2013-09-24 Paulo Matos pma...@broadcom.com ^^ Two spaces between name and e-mail. Sorry for nitpicking like this. Thanks! Marek
Re: patch to canonize small wide-ints.
On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. in particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicist's generally consider: it is uglyness. There is a lot of truth and beauty in the patch but in truth there is a lot of places where the uglyness is just moved someplace else. in the pushing the ugly around dept, trees and wide-ints are not canonized the same way.I spent several days going down the road where it tried to have them be the same, but it got very ugly having 32 bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are now more complex. i think that i am in favor of this patch, especially in conjunction with richards cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. as i said on irc, after i check in a cleaned up version of this (for ppc) i will change the rep for unsiged tree-csts so that they have an extra block of zeros if their top bit is set. this will clean that up somewhat. thanks kenny
Re: patch to canonize small wide-ints.
On 09/24/2013 09:51 AM, Richard Biener wrote: On Tue, 24 Sep 2013, Kenneth Zadeck wrote: On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. in particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicist's generally consider: it is uglyness. There is a lot of truth and beauty in the patch but in truth there is a lot of places where the uglyness is just moved someplace else. in the pushing the ugly around dept, trees and wide-ints are not canonized the same way.I spent several days going down the road where it tried to have them be the same, but it got very ugly having 32 bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are now more complex. i think that i am in favor of this patch, especially in conjunction with richards cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. as i said on irc, after i check in a cleaned up version of this (for ppc) i will change the rep for unsiged tree-csts so that they have an extra block of zeros if their top bit is set. this will clean that up somewhat. Yes, this will give wide-ints a sign (as I originally proposed). For RTL constants we're going to lie and make all constants negative that have their MSB set. Richard. it only sort of does. 
anyway it seems to be mostly consistent. As i said on irc, i am pleasantly surprised at the lack of damage to the ports by assuming that the values are always canonized. Two large public ports and my private port required no changes.
RE: [PATCH] Fix typo in check for null
-Original Message- From: Marek Polacek [mailto:pola...@redhat.com] Sent: 24 September 2013 14:52 To: Paulo Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix typo in check for null On Tue, Sep 24, 2013 at 01:44:30PM +, Paulo Matos wrote: Thanks for the comments Marek, will fix it. 2013-09-24 Paulo Matos pma...@broadcom.com ^^ Two spaces between name and e-mail. Sorry for nitpicking like this. Thanks! Marek It's fine. :) 2013-09-24 Paulo Matos pma...@broadcom.com * cfgloop.h (number_of_loops): Fix typo in check for null.
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Index: gcc/expr.c
===================================================================
--- gcc/expr.c (revision 202778)
+++ gcc/expr.c (working copy)
@@ -9878,7 +9878,7 @@
 		     modifier != EXPAND_STACK_PARM ? target : NULL_RTX),
 		    VOIDmode,
-		    modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
+		    EXPAND_MEMORY);
 /* If the bitfield is volatile, we want to access it in the field's mode, not the computed mode.
context suggests that we may arrive with EXPAND_STACK_PARM here which is a correctness modifier (see its docs). But I'm not too familiar with the details of the various expand modifiers, Eric may be though. Yes, this change looks far too bold and is presumably papering over the underlying issue... That said, I still believe that fixing the misalign path in expand_assignment would be better than trying to avoid it. For this testcase the issue is again that expand_assignment passes the wrong mode/target to the movmisalign optab. ...then let's just fix the movmisalign stuff. -- Eric Botcazou
Re: [patch, libgfortran, configure] Cross-compile support for libgfortran
On 23/09/13 18:43, Steve Ellcey wrote: On Mon, 2013-09-23 at 16:26 +0100, Marcus Shawcroft wrote: On 4 June 2013 20:49, Steve Ellcey sell...@mips.com wrote: This patch allows me to build libgfortran for a cross-compiling toolchain using newlib. Currently the checks done by AC_CHECK_FUNCS_ONCE fail with my toolchain because the compile/link fails due to the configure script not using the needed linker script in the link command. The check for with_newlib is how libjava deals with the problem and it fixes my build problems. My only concern is defining HAVE_STRTOLD, strtold exists in my newlib but I am not sure if that is true for all newlib builds. I didn't see any flags that I could easily use to check for long double support in the libgfortran configure.ac, but it seems to assume that the type exists. OK to checkin? This patch breaks systems where long double is wider than double. The newlib implementation provides strtold for systems where double is as wide as long double but not on systems where long double is wider. I don't see any issues with AC_CHECK_FUNC_ONCE when cross configuring aarch64 toolchains but I would have thought that fixing the link test issue you encountered would be preferable to hard coding assumptions in the configure script? Cheers /Marcus AC_CHECK_FUNC_ONCE may work for aarch64 but I don't think there is any way to make it work generally for all platforms. In the libjava configure.ac file are the comments: I think it would be preferable to check whether link tests do something sane and only fall back to hard coding choices when that is not the case. As you've currently written things we do it the other way around. Historically that's caused problems as newlib has evolved.
[PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
Hi, This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes, as it is what the function is supposed to do (see the comments) and as keeping these notes produces some failures, at least on ARM. Thanks, Yvan 2013-09-24 Yvan Roux yvan.r...@linaro.org * lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes. lra-reg-notes.diff Description: Binary data
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Indeed, I think it has to be the mode of loc, but I just wonder if it is not always the same, as in the doc it is written that mode m is the same as the mode that would be used for loc if it were a register. So can we assert that we have a REG here and use GET_MODE (XEXP (x, 0))? Or else return false if we don't have a REG. According to gcc internals, ROTATE and ROTATERT are similar to the shifting operations, but to be more accurate maybe we can rename shift_code_p to shift_and_rotate_code_p. Rotations are used in arm address calculation, and thus need to be handled in must_be_index_p and set_address_index. Egad. I guess I just wanted to see it written down. :-) Btw, are you sure that you don't need to modify must_be_index_p instead? /* Return true if X must be an index rather than a base. */ static bool must_be_index_p (rtx x) { return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT; } and call it from set_address_index? -- Eric Botcazou
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes, as it is what the function is supposed to do (see the comments) and as keeping these notes produce some failures, at least on ARM. The description is too terse. In the RTL middle-end, you shouldn't have to manually deal with the REG_DEAD and REG_UNUSED notes (unlike REG_EQUAL and REG_EQUIV notes), as the DF framework is supposed to do it for you. -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
So can we assert that we have a REG here and use GET_MODE (XEXP (x, 0))? Or else return false if we don't have a REG. I'm currently testing the patch with the modification below

+static bool
+lsb_bitfield_op_p (rtx x)
+{
+  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
+    {
+      enum machine_mode mode = GET_MODE (XEXP (x, 0));
+      unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+      HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
+
+      return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
+    }
+  return false;
+}

According to gcc internals, ROTATE and ROTATERT are similar to the shifting operations, but to be more accurate maybe we can rename shift_code_p to shift_and_rotate_code_p. Rotations are used in arm address calculation, and thus need to be handled in must_be_index_p and set_address_index. Egad. I guess I just wanted to see it written down. :-) Sorry, I'm not sure I understand well, it means that you prefer the shift_and_rotate_code_p notation, right ? Btw, are you sure that you don't need to modify must_be_index_p instead? /* Return true if X must be an index rather than a base. */ static bool must_be_index_p (rtx x) { return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT; } and call it from set_address_index? Yes, if we don't modify must_be_index_p we'll have failures when enabling LRA on ARM. You can find a history of the patch in the thread named RFC: patch to build GCC for arm with LRA which started more or less here : http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00526.html Thanks, Yvan -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Sorry, I'm not sure I understand well, it means that you prefer the shift_and_rotate_code_p notation, right ? Let's do the following in addition to the lsb_bitfield_op_p thing: 1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p, 2. Replace the MULT || ASHIFT test in set_address_index by a call to must_be_index_p, 3. Add the new cases to must_be_index_p directly, with a comment saying that they are e.g. for the ARM. -- Eric Botcazou
Re: [patch] Cleanup tree-ssa-ter.c exports
On 09/16/2013 10:42 AM, Andrew MacLeod wrote: On 09/16/2013 04:55 AM, Richard Biener wrote: On Fri, Sep 13, 2013 at 9:15 PM, Andrew MacLeod amacl...@redhat.com wrote: OK, a slightly different take.. I realized that I should be adding tree-outof-ssa.h to handle the 3 exports from tree-outof-ssa.c that are in ssaexpand.h... In fact, by far the most sensible thing to do is to simply rename tree-outof-ssa.c to ssaexpand.c. This actually resolves a number of warts... And is_replaceable_p() very naturally fits in ssaexpand.c now... what do you think of this option? :-) and svn rename preserves all the tree-outof-ssa.c history... I don't like the new name for tree-outof-ssa.c, it matches less to its contents. I'd say either keep ssaexpand.h and tree-outof-ssa.c as-is or rename ssaexpand.h to tree-outof-ssa.h. I prefer the latter. I as well. ssaexpand.h -> tree-outof-ssa.h it is. The rest of the changes look ok to me, but watch out for odd whitespace changes: +static inline bool +ter_is_replaceable_p (gimple stmt) +{ + + if (ssa_is_replaceable_p (stmt)) spurious vertical space. bah, where'd that come from :-P. I'll check this approach in after running it through the gauntlet again. Andrew Huh, still in my queue here. For the record, here's the patch I was going to apply (changes again, without Makefile.in dependencies), and once I re-verify with a new bootstrap and testsuite runs I'll check it in. Andrew * tree-ssa-live.h (find_replaceable_exprs, dump_replaceable_exprs): Move prototypes to... * tree-ssa-ter.h: New File. Move prototypes here. * tree-flow.h (stmt_is_replaceable_p): Remove prototype. * tree-outof-ssa.h: New. Rename ssaexpand.h, include tree-ssa-ter.h. * tree-outof-ssa.c (ssa_is_replaceable_p): New. Refactor common bits from is_replaceable_p. * tree-ssa-ter.c (is_replaceable_p, stmt_is_replaceable_p): Delete. (ter_is_replaceable_p): New. Use new refactored ssa_is_replaceable_p. (process_replaceable): Use ter_is_replaceable_p.
(find_replaceable_in_bb): Use ter_is_replaceable_p. * expr.c (stmt_is_replaceable_p): Relocate from tree-ssa-ter.c. Use newly refactored ssa_is_replaceable_p. * cfgexpand.c: Include tree-outof-ssa.h. * ssaexpand.h: Delete. Index: tree-ssa-live.h === *** tree-ssa-live.h (revision 202699) --- tree-ssa-live.h (working copy) *** make_live_on_entry (tree_live_info_p liv *** 325,334 /* From tree-ssa-coalesce.c */ extern var_map coalesce_ssa_name (void); - - /* From tree-ssa-ter.c */ - extern bitmap find_replaceable_exprs (var_map); - extern void dump_replaceable_exprs (FILE *, bitmap); - - #endif /* _TREE_SSA_LIVE_H */ --- 325,328 Index: tree-ssa-ter.h === *** tree-ssa-ter.h (revision 0) --- tree-ssa-ter.h (revision 0) *** *** 0 --- 1,26 + /* Header file for tree-ssa-ter.c exports. +Copyright (C) 2013 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 3, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + http://www.gnu.org/licenses/. 
*/ + + #ifndef GCC_TREE_SSA_TER_H + #define GCC_TREE_SSA_TER_H + + extern bitmap find_replaceable_exprs (var_map); + extern void dump_replaceable_exprs (FILE *, bitmap); + + #endif /* GCC_TREE_SSA_TER_H */ Index: tree-flow.h === *** tree-flow.h (revision 202699) --- tree-flow.h (working copy) *** bool fixup_noreturn_call (gimple stmt); *** 683,691 /* In ipa-pure-const.c */ void warn_function_noreturn (tree); - /* In tree-ssa-ter.c */ - bool stmt_is_replaceable_p (gimple); - /* In tree-parloops.c */ bool parallelized_function_p (tree); --- 683,688 Index: tree-outof-ssa.h === *** tree-outof-ssa.h (revision 202699) --- tree-outof-ssa.h (working copy) *** along with GCC; see the file COPYING3. *** 22,27 --- 22,28 #define _SSAEXPAND_H 1 #include tree-ssa-live.h + #include tree-ssa-ter.h /* This structure (of which only a singleton SA exists) is used to pass around information between the outof-SSA functions, cfgexpand *** get_gimple_for_ssa_name (tree exp) *** 71,79 return NULL; } ! /* In tree-outof-ssa.c. */ ! void finish_out_of_ssa
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On 23 September 2013 22:34, Eric Botcazou ebotca...@adacore.com wrote: Nice patch, but the formatting is totally wrong wrt spaces, please reformat using 2-space indentation and 8-space TABs, as already used in the files. Sorry for missing this problem when committing Kugan's patch. I have just committed the attached patch, which I hope fixes all the spaces/indentation issues introduced. Christophe. 2013-09-24 Christophe Lyon christophe.l...@linaro.org * gimple-pretty-print.c: Various whitespace tweaks. * tree-core.h: Likewise. * tree-pretty-print.c: Likewise. * tree-ssa-alias.c: Likewise. * tree-ssa-copy.c: Likewise. * tree-ssanames.c: Likewise. * tree-ssanames.h: Likewise. * tree-vrp.c: Likewise. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 202868) +++ gcc/tree-vrp.c (revision 202869) @@ -9451,48 +9451,48 @@ vrp_finalize (void) /* Set value range to non pointer SSA_NAMEs. */ for (i = 0; i < num_vr_values; i++) - if (vr_value[i]) -{ - tree name = ssa_name (i); +if (vr_value[i]) + { + tree name = ssa_name (i); if (!name || POINTER_TYPE_P (TREE_TYPE (name)) - || (vr_value[i]->type == VR_VARYING) - || (vr_value[i]->type == VR_UNDEFINED)) -continue; - - if ((TREE_CODE (vr_value[i]->min) == INTEGER_CST) - && (TREE_CODE (vr_value[i]->max) == INTEGER_CST)) -{ - if (vr_value[i]->type == VR_RANGE) -set_range_info (name, -tree_to_double_int (vr_value[i]->min), -tree_to_double_int (vr_value[i]->max)); - else if (vr_value[i]->type == VR_ANTI_RANGE) -{ - /* VR_ANTI_RANGE ~[min, max] is encoded compactly as - [max + 1, min - 1] without additional attributes. - When min value > max value, we know that it is - VR_ANTI_RANGE; it is VR_RANGE otherwise. */ - - /* ~[0,0] anti-range is represented as - range. */ - if (TYPE_UNSIGNED (TREE_TYPE (name)) - && integer_zerop (vr_value[i]->min) - && integer_zerop (vr_value[i]->max)) -set_range_info (name, -double_int_one, -double_int::max_value -(TYPE_PRECISION (TREE_TYPE (name)), true)); - else -set_range_info (name, -tree_to_double_int (vr_value[i]->max) -+ double_int_one, -tree_to_double_int (vr_value[i]->min) -- double_int_one); -} -} -} + || (vr_value[i]->type == VR_VARYING) + || (vr_value[i]->type == VR_UNDEFINED)) + continue; + + if ((TREE_CODE (vr_value[i]->min) == INTEGER_CST) + && (TREE_CODE (vr_value[i]->max) == INTEGER_CST)) + { + if (vr_value[i]->type == VR_RANGE) + set_range_info (name, + tree_to_double_int (vr_value[i]->min), + tree_to_double_int (vr_value[i]->max)); + else if (vr_value[i]->type == VR_ANTI_RANGE) + { + /* VR_ANTI_RANGE ~[min, max] is encoded compactly as + [max + 1, min - 1] without additional attributes. + When min value > max value, we know that it is + VR_ANTI_RANGE; it is VR_RANGE otherwise. */ + + /* ~[0,0] anti-range is represented as + range. */ + if (TYPE_UNSIGNED (TREE_TYPE (name)) + && integer_zerop (vr_value[i]->min) + && integer_zerop (vr_value[i]->max)) + set_range_info (name, + double_int_one, + double_int::max_value + (TYPE_PRECISION (TREE_TYPE (name)), true)); + else + set_range_info (name, + tree_to_double_int (vr_value[i]->max) + + double_int_one, + tree_to_double_int (vr_value[i]->min) + - double_int_one); + } + } + } /* Free allocated memory. */ for (i = 0; i < num_vr_values; i++) Index: gcc/tree-pretty-print.c === --- gcc/tree-pretty-print.c (revision 202868) +++ gcc/tree-pretty-print.c (revision 202869) @@ -1063,8 +1063,8 @@ dump_generic_node (pretty_printer *buffe pp_string (buffer, "B"); /* pseudo-unit */ } else -pp_double_int (buffer, tree_to_double_int (node), - TYPE_UNSIGNED (TREE_TYPE (node))); +
Re: patch to canonize small wide-ints.
Richard Biener rguent...@suse.de writes: On Tue, 24 Sep 2013, Kenneth Zadeck wrote: On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. in particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicist's generally consider: it is uglyness. There is a lot of truth and beauty in the patch but in truth there is a lot of places where the uglyness is just moved someplace else. in the pushing the ugly around dept, trees and wide-ints are not canonized the same way.I spent several days going down the road where it tried to have them be the same, but it got very ugly having 32 bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are now more complex. i think that i am in favor of this patch, especially in conjunction with richards cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. as i said on irc, after i check in a cleaned up version of this (for ppc) i will change the rep for unsiged tree-csts so that they have an extra block of zeros if their top bit is set. this will clean that up somewhat. Yes, this will give wide-ints a sign (as I originally proposed). No, nothing changes on that front. An N-bit wide_int has no sign. 
It looks the same whether it came from a signed tree constant, an unsigned tree constant, or an rtx (which also has no sign). All the posted patch does is change the internal representation of excess bits, which are operationally don't-cares. The semantics are just the same as before. And the point of the follow-up change that Kenny mentioned is to avoid a copy when treating an N-bit INTEGER_CST as a wider-than-N wide_int in cases where we're extending according to TYPE_SIGN. I.e. it's purely an optimisation. The resulting wide_int storage is exactly the same as before, we just do less work to get it. Thanks, Richard
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Thanks Eric, here is the new patch, validation is ongoing for ARM. Yvan 2013-09-24 Yvan Roux yvan.r...@linaro.org Vladimir Makarov vmaka...@redhat.com * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations from the least significant bit. (strip_address_mutations): Add bitfield operations handling. (must_be_index_p): Add shifting and rotate operations handling. (set_address_base): Use must_be_base_p predicate. (set_address_index): Use must_be_index_p predicate. On 24 September 2013 17:26, Eric Botcazou ebotca...@adacore.com wrote: Sorry, I'm not sure I understand well, it means that you prefer the shift_and_rotate_code_p notation, right ? Let's do the following in addition to the lsb_bitfield_op_p thing: 1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p, 2. Replace the MULT || ASHIFT test in set_address_index by a call to must_be_index_p, 3. Add the new cases to must_be_index_p directly, with a comment saying that they are e.g. for the ARM. -- Eric Botcazou rtlanal.patch Description: Binary data
Re: [v3] More noexcept -- 6th
Hi, On 9/24/13 11:13 AM, Marc Glisse wrote: Hello, bootstrap+testsuite ok. I think all container iterators are done, but not the containers themselves. Ok, thanks. Paolo.
[PATCH] Set expr loc safely (PR c++/58516)
I admit I haven't spent much time on this, but it seems we should just check whether we can set the expr location before actually setting it... Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-09-24 Marek Polacek pola...@redhat.com PR c++/58516 cp/ * semantics.c (finish_transaction_stmt): Set location only when the expression can have location. testsuite/ * g++.dg/tm/pr58516.C: New test. --- gcc/cp/semantics.c.mp 2013-09-24 17:24:59.907548551 +0200 +++ gcc/cp/semantics.c 2013-09-24 17:25:04.251564960 +0200 @@ -5199,7 +5199,9 @@ finish_transaction_stmt (tree stmt, tree { tree body = build_must_not_throw_expr (TRANSACTION_EXPR_BODY (stmt), noex); - SET_EXPR_LOCATION (body, EXPR_LOCATION (TRANSACTION_EXPR_BODY (stmt))); + /* This may occur when the STATEMENT_LIST is empty. */ + if (CAN_HAVE_LOCATION_P (body)) + SET_EXPR_LOCATION (body, EXPR_LOCATION (TRANSACTION_EXPR_BODY (stmt))); TREE_SIDE_EFFECTS (body) = 1; TRANSACTION_EXPR_BODY (stmt) = body; } --- gcc/testsuite/g++.dg/tm/pr58516.C.mp 2013-09-24 17:27:08.859055542 +0200 +++ gcc/testsuite/g++.dg/tm/pr58516.C 2013-09-24 17:28:29.829354635 +0200 @@ -0,0 +1,7 @@ +// { dg-do compile } +// { dg-options "-std=c++0x -fgnu-tm" } + +void foo() +{ + __transaction_atomic noexcept(false) {} +} Marek
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Eric Botcazou ebotca...@adacore.com writes: Sorry, I'm not sure I understand well, it means that you prefer the shift_and _rotate_code_p notation, right ? Let's do the following in addition to the lsb_bitfield_op_p thing: 1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p, 2. Replace the MULT || ASHIFT test in set_address_index by a call to must_be_index_p, 3. Add the new cases to must_be_index_p directly, with a comment saying that there are e.g. for the ARM. FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and must_be_index_p (x) don't imply that we should look specifically at XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.). I think it's better to keep the code tests and the associated XEXPs together. Thanks, Richard
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
The description is too terse. In the RTL middle-end, you shouldn't have to manually deal with the REG_DEAD and REG_UNUSED notes (unlike REG_EQUAL and REG_EQUIV notes), as the DF framework is supposed to do it for you. Sorry for that. The description of the LRA function update_inc_notes explicitly says that it removes all the REG_DEAD and REG_UNUSED notes, but the code didn't do it, and after the LRA pass we indeed still have that kind of notes, which wasn't the case with reload, and for instance with the code below: int f(int x) { return (x >> (sizeof (x) * __CHAR_BIT__ - 1)) ? -1 : 1; } we used to have that kind of insns on ARM: (insn 8 3 27 2 (parallel [ (set (reg:CC 100 cc) (compare:CC (reg:SI 0 r0 [ x ]) (const_int 0 [0]))) (set (reg/v:SI 112 [ x ]) (reg:SI 0 r0 [ x ])) ]) cmp-lra.c:4 205 {*movsi_compare0} (expr_list:REG_DEAD (reg:SI 0 r0 [ x ]) (expr_list:REG_UNUSED (reg:CC 100 cc) (nil)))) (insn 27 8 28 2 (set (reg:CC 100 cc) (compare:CC (reg/v:SI 112 [ x ]) (const_int 0 [0]))) cmp-lra.c:4 228 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg/v:SI 112 [ x ]) (nil))) (note 28 27 17 2 NOTE_INSN_DELETED) (insn 17 28 20 2 (set (reg/i:SI 0 r0) (if_then_else:SI (ge (reg:CC 100 cc) (const_int 0 [0])) (const_int 1 [0x1]) (const_int -1 [0x]))) cmp-lra.c:5 249 {*movsicc_insn} (expr_list:REG_DEAD (reg:CC 100 cc) (nil))) (insn 20 17 0 2 (use (reg/i:SI 0 r0)) cmp-lra.c:5 -1 (nil)) with the notes still present after LRA, and the post-reload pass just drops insns 8 and 27 and the generated code has no more comparison in it... Hope it makes more sense now. Thanks, Yvan
Re: [PATCH] Set expr loc safely (PR c++/58516)
Hi, On 9/24/13 11:35 AM, Marek Polacek wrote: I admit I haven't spent much time on this, but it seems we should just check whether we can set the expr location before actually setting it... Regtested/bootstrapped on x86_64-linux, ok for trunk? Two minor nits (assuming the general idea makes sense, first blush it does). First, solicited by the ChangeLog entry too and given that non-null expression trees can always have locations, I wonder if we shouldn't check EXPR_P instead, it seems more direct, unless we really fear that body could be NULL_TREE. I would also phrase in a negative form the comment before the check, like This may not be true when... Second, I'm pretty sure about this one, I think we are standardizing on c++11 for testcases, by now c++0x is essentially legacy. Paolo.
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
On 08/27/2013 11:37 AM, Kirill Yukhin wrote: Hello, This patch is still far too large. I think you should split it up based on every single mode iterator that you need to add or change. Problem is that some iterators are depend on each other, so patches are not going to be tiny. Here is 1st one. It extends VF iterator - biggest impact I believe Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 2006 run shows no stability regressions without -mavx512f option. Ok. r~
Re: [PATCH][ARM][testsuite] Add effective target check for arm conditional execution
On Sep 13, 2013, at 8:25 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: gcc.target/arm/minmax_minus.c is really only valid when we have conditional execution available, that is, on non-Thumb1-only targets. I've added an effective target check for that and used it in the test so that it does not get run and give a false negative when testing Thumb1 targets. Ok for trunk? Ok. I was hoping the arm folks would review it, usually they are very responsive. I'm fine with target maintainers reviewing patches like this.
wide-int: Re: patch to canonize small wide-ints.
I just committed the patch to do this as revision 202871. There are two changes from the earlier patch: 1) the addition of a frag in loop-doloop.c. This fixes an rtl canonization bug, but it is unique to the branch. It did not cause a problem on x86 but did on ppc. 2) the code in rtl.h for converting rtl to wide-ints is now much simpler. I now rely on richard's patch, which has been merged into this branch, to canonize the rtl constants. I have an assert to make sure that canonization is never violated. I did not change the tree-cst constructor as richard and i had talked about on irc. My next patch will be to change the rep of tree-cst so that unsigned types that have precisions of some multiple of HBPWI and have their top bit set will just carry an extra block of zeros. That will force a rewrite of this constructor. Kenny On 09/24/2013 12:06 PM, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Tue, 24 Sep 2013, Kenneth Zadeck wrote: On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: Richi, This patch canonizes the bits above the precision for wide ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away. In particular, when we decide that we can depend on richard's patch to clean up rtl constants, then the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test. I do believe that there is one more conserved force in the universe than what physicists generally consider: it is ugliness. There is a lot of truth and beauty in the patch, but in truth there is a lot of places where the ugliness is just moved someplace else. In the pushing-the-ugly-around dept, trees and wide-ints are not canonized the same way. I spent several days going down the road where it tried to have them be the same, but it got very ugly having 32 bit unsigned int csts have the upper 32 bits set.
So wide_int_to_tree and the wide-int constructors from tree-cst are now more complex. I think that i am in favor of this patch, especially in conjunction with richards cleanup, but only mildly. There is also some cleanup where richard wanted the long lines addressed. Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. As i said on irc, after i check in a cleaned up version of this (for ppc) i will change the rep for unsigned tree-csts so that they have an extra block of zeros if their top bit is set. This will clean that up somewhat. Yes, this will give wide-ints a sign (as I originally proposed). No, nothing changes on that front. An N-bit wide_int has no sign. It looks the same whether it came from a signed tree constant, an unsigned tree constant, or an rtx (which also has no sign). All the posted patch does is change the internal representation of excess bits, which are operationally don't-cares. The semantics are just the same as before. And the point of the follow-up change that Kenny mentioned is to avoid a copy when treating an N-bit INTEGER_CST as a wider-than-N wide_int in cases where we're extending according to TYPE_SIGN. I.e. it's purely an optimisation. The resulting wide_int storage is exactly the same as before, we just do less work to get it. Thanks, Richard

Index: gcc/tree.c
===
--- gcc/tree.c (revision 202811)
+++ gcc/tree.c (working copy)
@@ -1112,7 +1112,6 @@ force_fit_type (tree type, const wide_in
       || (overflowable > 0 && sign == SIGNED))
     {
       wide_int tmp = wide_int::from (cst, TYPE_PRECISION (type), sign);
-      wi::clear_undef (tmp, sign);
       int l = tmp.get_len ();
       tree t = make_int_cst (l);
       if (l > 1)
@@ -1205,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
   }
   wide_int cst = wide_int::from (pcst, prec, sgn);
-  /* The following call makes sure that all tree-cst's are canonical.
-     i.e.
it really does sign or zero extend the top block of the
-     value if the precision of the type is not an even multiple of the
-     size of an HWI. */
-  wi::clear_undef (cst, sgn);
+  int len = int (cst.get_len ());
+  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  bool recanonize = sgn == UNSIGNED
+    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
+    && small_prec;

   switch (TREE_CODE (type))
@@ -1291,18 +1290,31 @@ wide_int_to_tree (tree type, const wide_
       t = TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix);
       if (t)
        {
-         /* Make sure no one is clobbering the shared constant. */
+         /* Make sure no one is clobbering the shared constant.  We
+            must be careful here because tree-csts and wide-ints are
+            not canonicalized in the same way.  */
          gcc_assert (TREE_TYPE (t) == type);
-         gcc_assert
Re: RFA: Store the REG_BR_PROB probability directly as an int
Andreas Schwab sch...@suse.de writes: Richard Sandiford rdsandif...@googlemail.com writes: REG_BR_PROB notes are stored as: (expr_list:REG_BR_PROB (const_int prob) chain) but a full const_int rtx seems a bit heavyweight when all we want is a plain int. This patch uses: (int_list:REG_BR_PROB prob chain) instead. I think you left out the handling of INT_LIST in eliminate_regs_1. This lets me finish the build: Sorry for the breakage. I think we need to handle INT_LIST in the same way as INSN_LIST though, and eliminate in XEXP (x, 1). How about the attached? Testing in progress... Thanks, Richard

gcc/
* cse.c (count_reg_usage): Handle INT_LIST.
* lra-eliminations.c (lra_eliminate_regs_1): Likewise.
* reginfo.c (reg_scan_mark_refs): Likewise.
* reload1.c (eliminate_regs_1): Likewise.

Index: gcc/cse.c
===
--- gcc/cse.c 2013-09-24 18:29:49.308378918 +0100
+++ gcc/cse.c 2013-09-24 18:29:49.460380403 +0100
@@ -6739,6 +6739,7 @@ count_reg_usage (rtx x, int *counts, rtx
       return;
     case INSN_LIST:
+    case INT_LIST:
       gcc_unreachable ();
     default:

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c 2013-09-24 18:29:49.308378918 +0100
+++ gcc/lra-eliminations.c 2013-09-24 18:29:49.461380412 +0100
@@ -471,6 +471,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
       /* ... fall through ... */
     case INSN_LIST:
+    case INT_LIST:
       /* Now do eliminations in the rest of the chain. If this was an
         EXPR_LIST, this might result in allocating more memory than is
         strictly needed, but it simplifies the code. */

Index: gcc/reginfo.c
===
--- gcc/reginfo.c 2013-09-24 18:29:49.309378928 +0100
+++ gcc/reginfo.c 2013-09-24 18:29:49.462380422 +0100
@@ -1075,6 +1075,7 @@ reg_scan_mark_refs (rtx x, rtx insn)
       break;
     case INSN_LIST:
+    case INT_LIST:
       if (XEXP (x, 1))
        reg_scan_mark_refs (XEXP (x, 1), insn);
       break;

Index: gcc/reload1.c
===
--- gcc/reload1.c 2013-09-24 18:29:49.311378947 +0100
+++ gcc/reload1.c 2013-09-24 18:29:49.463380432 +0100
@@ -2776,6 +2776,7 @@ eliminate_regs_1 (rtx x, enum machine_mo
       /* ...
fall through ... */ case INSN_LIST: +case INT_LIST: /* Now do eliminations in the rest of the chain. If this was an EXPR_LIST, this might result in allocating more memory than is strictly needed, but it simplifies the code. */
Re: [PATCH]Fix computation of offset in ivopt
On Tue, 2013-09-24 at 12:31 +0200, Richard Biener wrote: On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch fixes two minor bugs when computing offset in IVOPT. 1) Considering below example:

#define MAX 100
struct tag { int i; int j; };
struct tag arr[MAX];
int foo (int len)
{
  int i = 0;
  for (; i < len; i++)
    {
      access arr[i].j;
    }
}

Without this patch, the offset computed by strip_offset_1 for address arr[i].j is ZERO, which is apparently wrong. 2) Considering below example:

//...
<bb 15>: KeyIndex_66 = KeyIndex_194 + 4294967295; if (KeyIndex_66 != 0) goto <bb 16>; else goto <bb 18>;
<bb 16>:
<bb 17>: # KeyIndex_194 = PHI <KeyIndex_66(16), KeyIndex_180(73)> _62 = KeyIndex_194 + 1073741823; _63 = _62 * 4; _64 = pretmp_184 + _63; _65 = *_64; if (_65 == 0) goto <bb 15>; else goto <bb 71>;
//...

There are iv use and candidate like: use 1 address in statement _65 = *_64; at position *_64 type handletype * base pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 step 4294967292 base object (void *) pretmp_184 related candidates candidate 6 var_before ivtmp.16 var_after ivtmp.16 incremented before use 1 type unsigned int base (unsigned int) (pretmp_184 + (sizetype) KeyIndex_180 * 4) step 4294967292 base object (void *) pretmp_184 Candidate 6 is related to use 1 In function get_computation_cost_at for use 1 using cand 6, ubase and cbase are: pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 pretmp_184 + (sizetype) KeyIndex_180 * 4 The cstepi computed in HOST_WIDE_INT is 0xfffffffffffffffc, while the offset computed in TYPE(utype) is 0xfffffffc. Though they both stand for the value -4 in different precisions, the statement offset -= ratio * cstepi returns 0x100000000, which is wrong. Tested on x86_64 and arm. Is it OK?
+ field = TREE_OPERAND (expr, 1);
+ if (DECL_FIELD_BIT_OFFSET (field)
+     && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+   boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+
+ tmp = component_ref_field_offset (expr);
+ if (top_compref
+     && cst_and_fits_in_hwi (tmp))
+   {
+     /* Strip the component reference completely. */
+     op0 = TREE_OPERAND (expr, 0);
+     op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+     *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+     return op0;
+   }

the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to

if (!inside_addr)
  return orig_expr;

tmp = component_ref_field_offset (expr);
field = TREE_OPERAND (expr, 1);
if (top_compref
    && cst_and_fits_in_hwi (tmp)
    && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
  {
    ...
  }

note that this doesn't really handle overflows correctly as

+ *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;

may still overflow.

@@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
       bitmap_clear (*depends_on);
     }
+  /* Sign-extend offset if utype has lower precision than HOST_WIDE_INT. */
+  offset = sext_hwi (offset, TYPE_PRECISION (utype));
+
offset is computed elsewhere in difference_cost and the bug to me seems that it is unsigned. sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype). After reading overflow and ivopt, I was wondering whether http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190 is somehow related. Cheers, Oleg
[PATCH v2 1/4] Ignore access-control keywords when parsing fields.
Classes containing access-control keywords such as public: confuse struct_field_seq, leading it to call consume_until_eos, i.e. ignore text until after the next semicolon. This leads to the first field after an access-control keyword being ignored by gengtype. This can be seen in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01532.html where the autogenerated marking function erroneously omitted the traversal of the callees field of cgraph_node *. This patch fixes up struct_field_seq to gracefully ignore such keywords, thus fixing gengtype so that it does not erroneously omit fields of such a class.

* gengtype-parse.c (struct_field_seq): Ignore access-control
keywords (public: etc).
---
gcc/gengtype-parse.c | 11 +++
1 file changed, 11 insertions(+)

diff --git a/gcc/gengtype-parse.c b/gcc/gengtype-parse.c
index 68d372e..e5204c1 100644
--- a/gcc/gengtype-parse.c
+++ b/gcc/gengtype-parse.c
@@ -733,6 +733,17 @@ struct_field_seq (void)
     {
       ty = type (opts, true);
+      /* Ignore access-control keywords (public: etc).  */
+      while (!ty && token () == IGNORABLE_CXX_KEYWORD)
+       {
+         const char *keyword = advance ();
+         if (strcmp (keyword, "public:") != 0
+             && strcmp (keyword, "private:") != 0
+             && strcmp (keyword, "protected:") != 0)
+           break;
+         ty = type (opts, true);
+       }
+
       if (!ty || token () == ':')
        {
         consume_until_eos ();
--
1.7.11.7
[PATCH v2 0/4] Support some cases of inheritance in gengtype
Here's an updated version of the patch series. Changes since v1 of the patch series: * Patch 1/4: this one is new: I noticed that the public: specifier within class cgraph_node confused gengtype, leading it to erroneously omit the first field within the class. This patch fixes this bug. * Patch 2/4: this is the updated parser support for base classes. In the previous version of the patch, it only attempted to parse the base classes of GTY()-marked types. In this version of the patch it only attempts to parse the bases of GTY()-marked types that are *not* marked with GTY((user)), i.e. it ignores GTY((user)) types. * Patch 3/4: this version incorporates suggestions by Michael Matz: set_gc_used_type now visits base classes, and all inherited fields, not just those in the class itself, and fixes the stylistic nits he noted. walk_subclasses now only writes out cases for concrete subclasses, i.e. those with a tag GTY option. Similarly, I updated USED_BY_TYPED_GC_P so that it emits allocators etc. for subclasses with a tag GTY option. I didn't fix the O(N^2) in walk_subclasses (it doesn't seem to affect the speed of the build). * Patch 4/4: this is new: it adds documentation to gty.texi about how desc and tag can be used for class hierarchies. Successfully bootstrapped and regtested together with the symtab patches posted earlier, which port symtab/cgraph/varpool to use this machinery. OK for trunk? David Malcolm (4): Ignore access-control keywords when parsing fields. Parse base classes for GTY-marked types Handle simple inheritance in gengtype. Add documentation about gengtype and inheritance gcc/doc/gty.texi | 52 +++ gcc/gengtype-parse.c | 63 --- gcc/gengtype-state.c | 2 + gcc/gengtype.c | 117 +-- gcc/gengtype.h | 12 +- 5 files changed, 226 insertions(+), 20 deletions(-) -- 1.7.11.7
[PATCH v2 4/4] Add documentation about gengtype and inheritance
gcc/ * doc/gty.texi (GTY Options): Add note about inheritance to description of desc and tag. (Inheritance and GTY): New. --- gcc/doc/gty.texi | 52 1 file changed, 52 insertions(+) diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi index 82e8e4f..090f6a6 100644 --- a/gcc/doc/gty.texi +++ b/gcc/doc/gty.texi @@ -87,6 +87,7 @@ data members. @menu * GTY Options:: What goes inside a @code{GTY(())}. +* Inheritance and GTY:: Adding GTY to a class hierarchy. * User GC::Adding user-provided GC marking routines. * GGC Roots:: Making global variables GGC roots. * Files:: How the generated files work. @@ -234,6 +235,10 @@ In this example, the value of BINDING_HAS_LEVEL_P when applied to a mechanism will treat the field @code{level} as being present and if 0, will treat the field @code{scope} as being present. +The @code{desc} and @code{tag} options can also be used for inheritance +to denote which subclass an instance is. See @ref{Inheritance and GTY} +for more information. + @findex param_is @findex use_param @item param_is (@var{type}) @@ -470,6 +475,53 @@ fields is completely handled by user-provided routines. See section @ref{User GC} for details on what functions need to be provided. @end table +@node Inheritance and GTY +@section Support for inheritance +gengtype has some support for simple class hierarchies. You can use +this to have gengtype autogenerate marking routines, provided: + +@itemize @bullet +@item +There must be a concrete base class, with a discriminator expression +that can be used to identify which subclass an instance is. +@item +Only single inheritance is used. +@item +None of the classes within the hierarchy are templates. +@end itemize + +If your class hierarchy does not fit in this pattern, you must use +@ref{User GC} instead. + +The base class and its discriminator must be identified using the ``desc'' +option. Each concrete subclass must use the ``tag'' option to identify +which value of the discriminator it corresponds to. 
+
+@smallexample
+class GTY((desc("%h.kind"), tag("0"))) example_base
+@{
+public:
+  int kind;
+  tree a;
+@};
+
+class GTY((tag("1"))) some_subclass : public example_base
+@{
+public:
+  tree b;
+@};
+
+class GTY((tag("2"))) some_other_subclass : public example_base
+@{
+public:
+  tree c;
+@};
+@end smallexample
+
+The generated marking routines for the above will contain a ``switch''
+on ``kind'', visiting all appropriate fields.  For example, if kind is
+2, it will cast to ``some_other_subclass'' and visit fields a and c.
+
 @node User GC
 @section Support for user-provided GC marking routines
 @cindex user gc
--
1.7.11.7
[PATCH v2 2/4] Parse base classes for GTY-marked types
Extend gengtype (and gtype.state reading/writing) so that it is able to parse base classes in simple cases, and only attempt to do it for GTY-marked types.

* gengtype-parse.c (require_without_advance): New.
(type): For GTY-marked types that are not GTY((user)), parse any
base classes, requiring them to be single-inheritance, and not be
templates. For non-GTY-marked types and GTY((user)), continue to
skip over any C++ inheritance specification.
* gengtype-state.c (state_writer::write_state_struct_type):
Write base class of type (if any).
(read_state_struct_type): Read base class of type (if any).
* gengtype.c (new_structure): Add a base_class parameter.
(create_optional_field_): Update for new parameter to new_structure.
(adjust_field_rtx_def): Likewise.
(adjust_field_tree_exp): Likewise.
* gengtype.h (struct type): Add base_class field to the s union
field.
(new_structure): Add base parameter.
---
gcc/gengtype-parse.c | 50 +-
gcc/gengtype-state.c | 2 ++
gcc/gengtype.c | 15 ---
gcc/gengtype.h | 4 +++-
4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/gcc/gengtype-parse.c b/gcc/gengtype-parse.c
index e5204c1..31d493a 100644
--- a/gcc/gengtype-parse.c
+++ b/gcc/gengtype-parse.c
@@ -165,6 +165,21 @@ require (int t)
   return v;
 }

+/* As per require, but do not advance.  */
+static const char *
+require_without_advance (int t)
+{
+  int u = token ();
+  const char *v = T.value;
+  if (u != t)
+    {
+      parse_error ("expected %s, have %s",
+                  print_token (t, 0), print_token (u, v));
+      return 0;
+    }
+  return v;
+}
+
 /* If the next token does not have one of the codes T1 or T2, report a parse error; otherwise return the token's value.
*/
static const char *
@@ -829,6 +844,7 @@ type (options_p *optsp, bool nested)
     case STRUCT:
     case UNION:
       {
+       type_p base_class = NULL;
        options_p opts = 0;
        /* GTY annotations follow attribute syntax
           GTY_BEFORE_ID is for union/struct declarations
@@ -868,16 +884,39 @@ type (options_p *optsp, bool nested)
            opts = gtymarker_opt ();
          }

+       bool is_user_gty = opts_have (opts, "user");
+
        if (token () == ':')
          {
-           /* Skip over C++ inheritance specification.  */
-           while (token () != '{')
-             advance ();
+           if (is_gty && !is_user_gty)
+             {
+               /* For GTY-marked types that are not "user", parse some C++
+                  inheritance specifications.
+                  We require single-inheritance from a non-template type.  */
+               advance ();
+               const char *basename = require (ID);
+               /* This may be either an access specifier, or the base name.  */
+               if (0 == strcmp (basename, "public")
+                   || 0 == strcmp (basename, "protected")
+                   || 0 == strcmp (basename, "private"))
+                 basename = require (ID);
+               base_class = find_structure (basename, TYPE_STRUCT);
+               if (!base_class)
+                 parse_error ("unrecognized base class: %s", basename);
+               require_without_advance ('{');
+             }
+           else
+             {
+               /* For types lacking GTY-markings, skip over C++ inheritance
+                  specification (and thus avoid having to parse e.g. template
+                  types).
*/
+               while (token () != '{')
+                 advance ();
+             }
          }

        if (is_gty)
          {
-           bool is_user_gty = opts_have (opts, "user");
            if (token () == '{')
              {
                pair_p fields;
@@ -900,7 +939,8 @@ type (options_p *optsp, bool nested)
                    return create_user_defined_type (s, lexer_line);
                  }

-               return new_structure (s, kind, lexer_line, fields, opts);
+               return new_structure (s, kind, lexer_line, fields, opts,
+                                     base_class);
              }
          }
        else if (token () == '{')

diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c
index ba7948a..54e4287 100644
--- a/gcc/gengtype-state.c
+++ b/gcc/gengtype-state.c
@@ -957,6 +957,7 @@ state_writer::write_state_struct_type (type_p current)
 {
   write_state_struct_union_type (current, "struct");
   write_state_type (current->u.s.lang_struct);
+  write_state_type (current->u.s.base_class);
 }

 /* Write a GTY user-defined struct type. */
@@ -1613,6 +1614,7 @@ read_state_struct_type (type_p type)
       read_state_options (&(type->u.s.opt));
       read_state_lang_bitmap (&(type->u.s.bitmap));
       read_state_type (&(type->u.s.lang_struct));
+      read_state_type (&(type->u.s.base_class));
     }
   else
     {
diff --git
[PATCH v2 3/4] Handle simple inheritance in gengtype.
Treat GTY structs that have a desc as being the root of an inheritance hierarchy. Generate a switch on desc within the marking function with cases for each subclass, visiting all fields of the type (including inherited ones). Don't create marking functions for subclasses, instead using the base class marking functions. Use walk_type on them within walk_subclasses to generate the case within the switch for handling the tag, directly walking all fields of the type. * gengtype-parse.c (opts_have): Drop static so that we can use this from gengtype.c. * gengtype.c (set_gc_used_type): Mark any base class as used; update field traversal to visit inherited fields. (output_mangled_typename): Convert references to classes within an inheritance hierarchy to reference the ultimate base class, since only it will have gt_ functions. (get_string_option): New. (walk_subclasses): New. (walk_type): Treat GTY structs that have a desc as being the root of an inheritance hierarchy. Generate a switch on it within the marking function which walks all subclasses, adding cases for them via walk_subclasses. For subclasses, visit all fields of the type (including inherited ones). (write_func_for_structure): Don't write fns for subclasses, only for the ultimate base class within an inheritance hierarchy. Subclasses-marking will be handled by the base class marking functions. (write_types): Likewise. (write_local_func_for_structure): Likewise. (USED_BY_TYPED_GC_P): Emit allocators for subclasses that have a tag option (and are thus concrete subclasses). (write_root): Use the marker function for the ultimate base class. * gengtype.h (FOR_ALL_INHERITED_FIELDS): New. (opts_have): Add declaration. 
--- gcc/gengtype-parse.c | 2 +- gcc/gengtype.c | 102 --- gcc/gengtype.h | 8 3 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/gcc/gengtype-parse.c b/gcc/gengtype-parse.c
index 31d493a..f480503 100644
--- a/gcc/gengtype-parse.c
+++ b/gcc/gengtype-parse.c
@@ -793,7 +793,7 @@ struct_field_seq (void)

 /* Return true if OPTS contain the option named STR. */
-static bool
+bool
 opts_have (options_p opts, const char *str)
 {
   for (options_p opt = opts; opt; opt = opt->next)

diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index a6dc221..4224a5e 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -1532,7 +1532,11 @@ set_gc_used_type (type_p t, enum gc_used_enum level, type_p param[NUM_PARAM],
       process_gc_options (t->u.s.opt, level, &dummy, &dummy, &dummy,
                          &dummy, &dummy2);

-      for (f = t->u.s.fields; f; f = f->next)
+      if (t->u.s.base_class)
+       set_gc_used_type (t->u.s.base_class, level, param,
+                         allow_undefined_types);
+
+      FOR_ALL_INHERITED_FIELDS (t, f)
        {
          int maybe_undef = 0;
          int pass_param = 0;
@@ -2534,6 +2538,11 @@ output_mangled_typename (outf_p of, const_type_p t)
     case TYPE_LANG_STRUCT:
     case TYPE_USER_STRUCT:
       {
+       /* For references to classes within an inheritance hierarchy,
+          only ever reference the ultimate base class, since only
+          it will have gt_ functions.  */
+       while (t->u.s.base_class)
+         t = t->u.s.base_class;
        const char *id_for_tag = filter_type_name (t->u.s.tag);
        oprintf (of, "%lu%s", (unsigned long) strlen (id_for_tag),
                 id_for_tag);
@@ -2596,6 +2605,44 @@ output_escaped_param (struct walk_type_data *d, const char *param,
     }
 }

+const char *
+get_string_option (options_p opt, const char *key)
+{
+  for (; opt; opt = opt->next)
+    if (strcmp (opt->name, key) == 0)
+      return opt->info.string;
+  return NULL;
+}
+
+static void
+walk_subclasses (type_p base, struct walk_type_data *d)
+{
+  for (type_p sub = structures; sub != NULL; sub = sub->next)
+    {
+      if (sub->u.s.base_class == base)
+       {
+         const char *type_tag = get_string_option (sub->u.s.opt, "tag");
+         if (type_tag)
+           {
+             oprintf (d->of, "%*scase %s:\n", d->indent, "", type_tag);
+             d->indent += 2;
+             oprintf (d->of, "%*s{\n", d->indent, "");
+             d->indent += 2;
+             oprintf (d->of, "%*s%s *sub = static_cast <%s *> (x);\n",
+                      d->indent, "", sub->u.s.tag, sub->u.s.tag);
+             const char *old_val = d->val;
+             d->val = "(*sub)";
+             walk_type (sub, d);
+             d->val = old_val;
+             d->indent -= 2;
+             oprintf (d->of, "%*s}\n", d->indent, "");
+             oprintf (d->of, "%*sbreak;\n", d->indent, "");
+             d->indent -= 2;
+           }
+         walk_subclasses (sub, d);
+       }
+    }
+}

/* Call D->PROCESS_FIELD for every field (or subfield)
Re: [PATCH] Set expr loc safely (PR c++/58516)
On Tue, Sep 24, 2013 at 12:00:41PM -0500, Paolo Carlini wrote: Hi, On 9/24/13 11:35 AM, Marek Polacek wrote: I admit I haven't spent much time on this, but it seems we should just check whether we can set the expr location before actually setting it... Regtested/bootstrapped on x86_64-linux, ok for trunk? Two minor nits (assuming the general idea makes sense, first blush it does). First, solicited by the ChangeLog entry too and given that non-null expression trees can always have locations, I wonder if we shouldn't check EXPR_P instead, it seems more direct, unless we really fear that body could be NULL_TREE. Well, seems build_must_not_throw_expr won't return NULL_TREE, and given:

#define CAN_HAVE_LOCATION_P(NODE) ((NODE) && EXPR_P (NODE))

I think EXPR_P should do ;). I would also phrase in a negative form the comment before the check, like This may not be true when... Fixed. Second, I'm pretty sure about this one, I think we are standardizing on c++11 for testcases, by now c++0x is essentially legacy. Sure, adjusted. Thanks. Regtested on x86_64-linux.

2013-09-24  Marek Polacek  pola...@redhat.com

PR c++/58516
cp/
* semantics.c (finish_transaction_stmt): Check for EXPR_P before
setting the expr location.
testsuite/
* g++.dg/tm/pr58516.C: New test.

--- gcc/cp/semantics.c.mp 2013-09-24 17:24:59.907548551 +0200
+++ gcc/cp/semantics.c 2013-09-24 19:14:18.590639066 +0200
@@ -5199,7 +5199,9 @@ finish_transaction_stmt (tree stmt, tree
 {
   tree body = build_must_not_throw_expr (TRANSACTION_EXPR_BODY (stmt),
                                         noex);
-  SET_EXPR_LOCATION (body, EXPR_LOCATION (TRANSACTION_EXPR_BODY (stmt)));
+  /* This may not be true when the STATEMENT_LIST is empty.
*/
+  if (EXPR_P (body))
+    SET_EXPR_LOCATION (body, EXPR_LOCATION (TRANSACTION_EXPR_BODY (stmt)));
   TREE_SIDE_EFFECTS (body) = 1;
   TRANSACTION_EXPR_BODY (stmt) = body;
 }

--- gcc/testsuite/g++.dg/tm/pr58516.C.mp 2013-09-24 17:27:08.859055542 +0200
+++ gcc/testsuite/g++.dg/tm/pr58516.C 2013-09-24 19:13:36.958487511 +0200
@@ -0,0 +1,7 @@
+// { dg-do compile }
+// { dg-options "-std=c++11 -fgnu-tm" }
+
+void foo()
+{
+  __transaction_atomic noexcept(false) {}
+}

Marek
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
I looked at one that failed after 100 as well (20031204-1.c). In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts: if (octets == 4 (*cp == ':' || *cp == '\0')) { The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion: [20031204-1.c : 31:33] _29 = _28 == 58; [20031204-1.c : 31:33] _30 = _28 == 0; [20031204-1.c : 31:33] _31 = _29 | _30; [20031204-1.c : 31:18] if (_31 != 0) goto bb 16; else goto bb 19; where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. 
I see, we produce: ;; if (_26 != 0) (insn 94 93 95 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 107 [ D.2184 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) (eq:QI (reg:CCZ 17 flags) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 96 95 97 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 122 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 97 96 98 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 100) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) (nil))) (insn 98 97 99 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 108 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 99 98 100 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 0) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) (nil))) (code_label 100 99 0 14 [0 uses]) That is because we TER together _26 = _25 | _24 and if (_26 != 0) First I think the logic of do_jump should really be moved to trees. It is not doing things that can not be adequately represented by gimple. I am not that certain we want to move it before profiling though. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the Yep, but it is still better than what we have now - if the test above was in hot part of program (i.e. not executed once), we will end up optimizing the second conditional for size. So I think it is do_jump bug to not distribute probabilities across the two conditoinals introduced. bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting. 
Yep, we can probably do that, but we ought to fix the individual cases above at least for resonable number of runs. Will you look into logic of do_jump or shall I try to dive in? Honza
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi, On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote: On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, with the attached patch the read-side of the out of bounds accesses is fixed. There is a single new test case pr57748-3.c that is derived from Martin's test case. The test case only tests the read access and does not depend on part 1 of the patch. This patch was boot-strapped and regression tested on x86_64-unknown-linux-gnu. Additionally I generated eCos and an eCos-application (on ARMv5 using packed structures) with an arm-eabi cross compiler, and looked for differences in the disassembled code with and without this patch, but there were none. OK for trunk?

Index: gcc/expr.c
===
--- gcc/expr.c (revision 202778)
+++ gcc/expr.c (working copy)
@@ -9878,7 +9878,7 @@
                         modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
                         VOIDmode,
-                        modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
+                        EXPAND_MEMORY);
       /* If the bitfield is volatile, we want to access it in the field's mode, not the computed mode.

context suggests that we may arrive with EXPAND_STACK_PARM here which is a correctness modifier (see its docs). But I'm not too familiar with the details of the various expand modifiers, Eric may be though. That said, I still believe that fixing the misalign path in expand_assignment would be better than trying to avoid it. For this testcase the issue is again that expand_assignment passes the wrong mode/target to the movmisalign optab. Perhaps I'm stating the obvious, but this is supposed to address a separate bug in the expansion of the RHS (as opposed to the first bug which was in the expansion of the LHS), and so we would have to make expand_assignment also examine potential misalignments in the RHS, which it so far does not do, despite its complexity.
Having said that, I also dislike the patch, but I am quite convinced that if we allow non-BLK structures with zero sized arrays, the fix will be ugly - but I'll be glad to be shown I've been wrong. Martin
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka hubi...@ucw.cz wrote: I looked at one that failed after 100 as well (20031204-1.c). In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts:

  if (octets == 4 && (*cp == ':' || *cp == '\0')) {

The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion:

  [20031204-1.c : 31:33] _29 = _28 == 58;
  [20031204-1.c : 31:33] _30 = _28 == 0;
  [20031204-1.c : 31:33] _31 = _29 | _30;
  [20031204-1.c : 31:18] if (_31 != 0) goto <bb 16>; else goto <bb 19>;

where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed.
I see, we produce: ;; if (_26 != 0) (insn 94 93 95 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 107 [ D.2184 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) (eq:QI (reg:CCZ 17 flags) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 96 95 97 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 122 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 97 96 98 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 100) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) (nil))) (insn 98 97 99 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 108 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 99 98 100 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 0) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) (nil))) (code_label 100 99 0 14 [0 uses]) That is because we TER together _26 = _25 | _24 and if (_26 != 0) First I think the logic of do_jump should really be moved to trees. It is not doing things that can not be adequately represented by gimple. I am not that certain we want to move it before profiling though. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the Yep, but it is still better than what we have now - if the test above was in hot part of program (i.e. not executed once), we will end up optimizing the second conditional for size. So I think it is do_jump bug to not distribute probabilities across the two conditoinals introduced. bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting. 
Yep, we can probably do that, but we ought to fix the individual cases above at least for a reasonable number of runs. I made this change and it removed a few of the failures. I looked at another case that still failed with 1 train run but passed with 100. It turned out to be another truncation issue exposed by RTL expansion, where we created some control flow for a memset builtin which was in a block with an execution count of 1. Some of the blocks got frequencies less than half the original block, so the count was rounded down or truncated to 0. I noticed that in this case (as well as the jump threading case I fixed by looking for non-zero incoming edges in partitioning) the bb frequency was non-zero. Why not just have probably_never_executed_bb_p simply return false when bb->frequency is non-zero (right now it does the opposite - returns true when bb->frequency is 0)? Making this change removed a bunch of other failures. With this change as well, there are only 3 cases that still fail with 1 train run that pass with 100. Need to look at those. Will you look into the logic of do_jump or shall I try to dive in? I can take a look, but probably won't have a chance until late this week. If you don't get to it before then I will.
[wwwdocs] svnwrite.html -- extend documentation of gcc.gnu.org accounts
Caroline made me realize that we should improve our documentation around gcc.gnu.org accounts. This is a first step in that direction. (As a side effect, wrap the e-mail address in <div></div> to avoid undesired line breaks.) Applied. Gerald

Index: svnwrite.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svnwrite.html,v
retrieving revision 1.27
diff -u -3 -p -r1.27 svnwrite.html
--- svnwrite.html	18 Aug 2013 20:59:25 -	1.27
+++ svnwrite.html	24 Sep 2013 18:20:48 -
@@ -411,9 +411,10 @@ like Merge from mainline or similar.
 <hr />
 <h2><a name="account">Tips&amp;Tricks around your account</a></h2>
 
-<p>If you ever need to change the address e-mail to
-<i>username</i>@gcc.gnu.org is forwarded to, you can easily do so
-using</p>
+<p>Your gcc.gnu.org account also receives e-mail (and is what you
+use for Bugzilla).  If you ever need to change the address e-mail to
+<i>username</i>@gcc.gnu.org is forwarded to, you can easily
+do so using</p>
 <blockquote><pre>
 ssh <i>username</i>@gcc.gnu.org email mynewaddr...@example.com
 </pre></blockquote>
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Rong - can you answer the questions below on the comdat patch? On Tue, Sep 24, 2013 at 5:31 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi Honza, I am finally getting back to working on this after a few weeks of working on some other priorities. I am also trying to return to this, so good timing ;) Martin has got smaller C++ programs (Inkscape) to not touch the cold segment during startup with FDO (w/o partitioning). Firefox still does; I think the problem is lost samples due to different linker decisions even with LTO (i.e. the linker picks an object from a .a library at profile-generate time that it never passes later). I plan to look into that today. Did you mean to commit the above change? I see that it went in as part of r202258 but doesn't show up in the ChangeLog entry for that revision. Yes, I meant to check it in, but did not mean to do so w/o a ChangeLog. I will fix that. Should the same fix be applied to probably_never_executed_edge_p? In other cases it was mostly loop unrolling in combination with jump threading. So I modified my script to separately report when failure happens for a test trained once and a test trained a hundred times. Thanks for the linker script. I reproduced your results. I looked at a couple cases. The first was one that failed after 1 training run only (2910-2.c). It was due to jump threading, which you noted was a problem. For this one I think we can handle it in the partitioning, since there is an FDO insanity that we could probably treat more conservatively when splitting. We should fix the roundoff issues - when I was introducing the frequency/probability/count system I made an assumption that parts of programs with very low counts do not matter, since they are not part of the hot spot (and I cared only about the hot spot). Now we care about identifying unlikely executed spots and we need to fix this. I looked at one that failed after 100 as well (20031204-1.c).
In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts:

  if (octets == 4 && (*cp == ':' || *cp == '\0')) {

The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion:

  [20031204-1.c : 31:33] _29 = _28 == 58;
  [20031204-1.c : 31:33] _30 = _28 == 0;
  [20031204-1.c : 31:33] _31 = _29 | _30;
  [20031204-1.c : 31:18] if (_31 != 0) goto <bb 16>; else goto <bb 19>;

where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat it as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting.
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2422-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2910-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20020413-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20030903-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060420-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060905-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120808-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL1
Re: [PATCH]: Fix use of __builtin_eh_pointer in EH_ELSE
On 09/03/2013 07:08 AM, Tristan Gingold wrote: Hi, The field state->ehp_region wasn't updated before lowering constructs in the EH path of EH_ELSE. As a consequence, __builtin_eh_pointer is lowered to 0 (or possibly to a wrong region number) in this path. The only user of EH_ELSE looks to be trans-mem.c:lower_transaction, and the consequence of that is a memory leak. Furthermore, according to calls.c:flags_from_decl_or_type, the transaction_pure attribute must be set on the function type, not on the function declaration. Hence the change to declare __builtin_eh_pointer. (I don't understand the guard condition to set the attribute for it in tree.c. Why is 'builtin_decl_explicit_p (BUILT_IN_TM_LOAD_1)' needed in addition to flag_tm?) Clearly these are totally unrelated and should not be in the same patch. The BUILT_IN_TM_LOAD_1 thing looks like it might have something to do with non-C front ends, which don't all create the builtins.

diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 6ffbd26..86ee77e 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -1093,8 +1093,12 @@ lower_try_finally_nofallthru (struct leh_state *state,
   if (tf->may_throw)
     {
+      eh_region prev_ehp_region = state->ehp_region;
+
       finally = gimple_eh_else_e_body (eh_else);
+      state->ehp_region = tf->region;
       lower_eh_constructs_1 (state, finally);
+      state->ehp_region = prev_ehp_region;

This doesn't look right. Does it work to save and restore ehp_region in the two places that we currently set it instead? r~
Re: [PATCH 0/3] Support some cases of inheritance in gengtype; use it for symtab
On Mon, 2013-09-23 at 13:19 +0200, Richard Biener wrote: On Fri, Sep 20, 2013 at 4:05 PM, David Malcolm dmalc...@redhat.com wrote: There have been various discussions about how to handle inheritance within ggc and PCH: whether to extend gengtype to support C++ syntax such as templates, or to require people to use GTY((user)) and manually write field-traversal. I've attempted to introduce class hierarchies in a few places recently, for example for the symtab types: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00668.html and the gimple types: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01788.html In each case the GTY((user)) is always the most painful part: it requires ugly hand-written code which must be kept up-to-date with changes to the underlying types, lest we run into difficult-to-track-down crashes inside the GC and PCH. I got sick of writing and debugging these GTY((user)) traversal functions, for IMHO quite modest use of C++ (no templates etc), so I had a go at implementing support for inheritance in gengtype, as seen in the following patch series. The key compromise I'm introducing, as I see it, is to require the user to opt in to the support for the inheritance, on a class-by-class basis, and not attempt to support all of C++, but only single inheritance, without templates. Please make sure to update doc/gty.texi with this feature and its restrictions. Thanks; I've added documentation in the latest version of the patches: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01803.html The idea is that if you add a GTY marker to a subclass, you are opting in to gengtype, and on any restrictions on the base class that gengtype needs to impose to avoid having to deal with all of C++ syntax: specifically, no multiple inheritance, and the base class can't be a template. If those restrictions are too much, e.g. 
you have something like: struct alloc_pool_hasher : typed_noop_remove alloc_pool_descriptor then don't GTY-mark the class, and gengtype won't attempt to parse the base class (as per the current parser). Such restrictions are bad - does gengtype at least diagnose the case where a bad base class is used? I'm sorry if my initial email was unclear. The point I was trying to make is that we already have restrictions: we currently can't use any kind of inheritance with GTY, unless we use GTY((user)) and do things by hand. This patch makes things less restrictive, in that some forms of inheritance can now be used without needing to use GTY((user)). The example I picked is one that is already in the source tree, and which doesn't use GTY. I chose that particular example as an example of inheritance within the source tree, to illustrate that adding this parser support to gengtype doesn't break this existing code: it doesn't introduce a restriction to the use of C++ in the code as a whole. The restriction merely applies to the feature itself, and can easily be avoided (e.g. by not using GTY, or by using GTY((user))). To answer your question, in the example above, if one adds a GTY marker to alloc_pool_hasher: struct GTY(()) alloc_pool_hasher : typed_noop_remove alloc_pool_descriptor and adds alloc-pool.c to GTFILES so that it's present in gtyp-input.list and thus gets parsed by gengtype, then one gets this error: ../../src/gcc/alloc-pool.c:86: parse error: expected '{', have '' So yes: gengtype will fail immediately with an error message. Note that you can still either omit GTY, or use GTY((user)) for such cases, fixing the error message. (however, fwiw, it's meaningless to try to GTY-mark this particular class, given that it's a traits type, not holding any data itself). My initial worry about C++-ifying gengtype was exactly because of such arbitrary restrictions. Note that gengtype already has an arbitrary restriction: no inheritance, unless you use GTY((user)). 
This patch simply loosens the restriction somewhat, by allowing some inheritance. If code still falls outside the looser restrictions, that code can continue to use GTY((user)), as before. Isn't it possible to remove the restriction by requiring a GTY annotation on the subclassing itself? Like struct alloc_pool_hasher : typed_noop_remove<alloc_pool_descriptor> GTY((...)) with whatever information is required to parse the class given explicitly as arguments of the GTY? Like for the single-inheritance case with a descriptor, you'd give out a tag to identify all types participating in the inheritance group. FWIW it's meaningless for this particular class to be GTY-marked: as mentioned above, it's a bundle of traits for use elsewhere, and doesn't actually hold any data. But speaking to the general point about templates: I don't have a need to support automated generation of traversal functions for templates, we can use GTY((user)) for that if we need to. In contrast, I'm regularly running into a need for doing it for non-template base
Re: [PATCH] Refactor type handling in get_alias_set, fix PR58513
On Tue, Sep 24, 2013 at 4:00 AM, Richard Biener rguent...@suse.de wrote: As noted in PR58513 the alias type comparison in operand_equal_p for MEM_REF and TARGET_MEM_REF is overly restrictive. The following adds a new alias_ptr_types_compatible_p helper for a less conservative comparison and refactors get_alias_set and reference_alias_ptr_type to share code and behave in a more similar manner. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-09-24 Richard Biener rguent...@suse.de PR middle-end/58513 * tree.c (reference_alias_ptr_type): Move ... * alias.c (reference_alias_ptr_type): ... here and implement in terms of the new reference_alias_ptr_type_1. (ref_all_alias_ptr_type_p): New helper. (get_deref_alias_set_1): Drop flag_strict_aliasing here, use ref_all_alias_ptr_type_p. (get_deref_alias_set): Add flag_strict_aliasing check here. (reference_alias_ptr_type_1): New function, split out from ... (get_alias_set): ... here. (alias_ptr_types_compatible_p): New function. * alias.h (reference_alias_ptr_type): Declare. (alias_ptr_types_compatible_p): Likewise. * tree.h (reference_alias_ptr_type): Remove. * fold-const.c (operand_equal_p): Use alias_ptr_types_compatible_p to compare MEM_REF alias types. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58521 -- H.J.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
This is the updated patch2. Changed: 1. For cmp/test with a rip-relative addressing mem operand, don't group insns. Bulldozer also doesn't support fusion for cmp/test with both displacement MEM and immediate operand, while m_CORE_ALL doesn't support fusion for cmp/test with MEM and immediate operand. I simply chose to use the more stringent constraint here (m_CORE_ALL's constraint). 2. Add Bulldozer back and merge TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32. Bootstrap and regression pass. OK for trunk? 2013-09-24 Wei Mi w...@google.com * gcc/config/i386/i386.c (rip_relative_addr_p): New function. (ix86_macro_fusion_p): Ditto. (ix86_macro_fusion_pair_p): Ditto. * gcc/config/i386/i386.h: Add new tune features about macro-fusion. * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto. * gcc/doc/tm.texi: Generated. * gcc/doc/tm.texi.in: Ditto. * gcc/haifa-sched.c (try_group_insn): New function. (group_insns_for_macro_fusion): Ditto. (sched_init): Call group_insns_for_macro_fusion. * gcc/sched-rgn.c (add_branch_dependences): Keep insns in a SCHED_GROUP at the end of BB to retain their location. * gcc/target.def: Add two hooks: macro_fusion_p and macro_fusion_pair_p. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1fd3f60..4a04778 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24856,6 +24856,167 @@ ia32_multipass_dfa_lookahead (void) } } +/* Extracted from ix86_print_operand_address. Check whether ADDR is a + rip-relative address. 
*/ + +static bool +rip_relative_addr_p (rtx addr) +{ + struct ix86_address parts; + rtx base, index, disp; + int ok; + + if (GET_CODE (addr) == UNSPEC XINT (addr, 1) == UNSPEC_VSIBADDR) +{ + ok = ix86_decompose_address (XVECEXP (addr, 0, 0), parts); + parts.index = XVECEXP (addr, 0, 1); +} + else if (GET_CODE (addr) == UNSPEC XINT (addr, 1) == UNSPEC_LEA_ADDR) +ok = ix86_decompose_address (XVECEXP (addr, 0, 0), parts); + else +ok = ix86_decompose_address (addr, parts); + + gcc_assert (ok); + base = parts.base; + index = parts.index; + disp = parts.disp; + + if (TARGET_64BIT !base !index) +{ + rtx symbol = disp; + + if (GET_CODE (disp) == CONST + GET_CODE (XEXP (disp, 0)) == PLUS + CONST_INT_P (XEXP (XEXP (disp, 0), 1))) + symbol = XEXP (XEXP (disp, 0), 0); + + if (GET_CODE (symbol) == LABEL_REF + || (GET_CODE (symbol) == SYMBOL_REF + SYMBOL_REF_TLS_MODEL (symbol) == 0)) + return true; +} + if (flag_pic !base !index) +{ + if (GET_CODE (disp) == CONST + GET_CODE (XEXP (disp, 0)) == UNSPEC + (XINT (XEXP (disp, 0), 1) == UNSPEC_PCREL + || XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL + || (TARGET_64BIT + XINT (XEXP (disp, 0), 1) == UNSPEC_GOTNTPOFF))) + return true; +} + return false; +} + +/* Return true if target platform supports macro-fusion. */ + +static bool +ix86_macro_fusion_p () +{ + if (TARGET_FUSE_CMP_AND_BRANCH) +return true; + else +return false; +} + +/* Check whether current microarchitecture support macro fusion + for insn pair CONDGEN + CONDJMP. Refer to + Intel Architectures Optimization Reference Manual. 
*/ + +static bool +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp) +{ + rtx src, dest; + rtx single_set = single_set (condgen); + enum rtx_code ccode; + rtx compare_set = NULL_RTX, test_if, cond; + rtx alu_set = NULL_RTX, addr = NULL_RTX; + + if (get_attr_type (condgen) != TYPE_TEST + get_attr_type (condgen) != TYPE_ICMP + get_attr_type (condgen) != TYPE_INCDEC + get_attr_type (condgen) != TYPE_ALU) +return false; + + if (single_set == NULL_RTX + !TARGET_FUSE_ALU_AND_BRANCH) +return false; + + if (single_set != NULL_RTX) +compare_set = single_set; + else +{ + int i; + rtx pat = PATTERN (condgen); + for (i = 0; i XVECLEN (pat, 0); i++) + if (GET_CODE (XVECEXP (pat, 0, i)) == SET) + { + rtx set_src = SET_SRC (XVECEXP (pat, 0, i)); + if (GET_CODE (set_src) == COMPARE) + compare_set = XVECEXP (pat, 0, i); + else + alu_set = XVECEXP (pat, 0, i); + } +} + if (compare_set == NULL_RTX) +return false; + src = SET_SRC (compare_set); + if (GET_CODE (src) != COMPARE) +return false; + + /* Macro-fusion for cmp/test MEM-IMM + conditional jmp is not + supported. */ + if ((MEM_P (XEXP (src, 0)) +CONST_INT_P (XEXP (src, 1))) + || (MEM_P (XEXP (src, 1)) + CONST_INT_P (XEXP (src, 0 +return false; + + /* No fusion for RIP-relative address. */ + if (MEM_P (XEXP (src, 0))) +addr = XEXP (XEXP (src, 0), 0); + else if (MEM_P (XEXP (src, 1))) +addr = XEXP (XEXP (src, 1), 0); + if (addr
Re: RFA: Store the REG_BR_PROB probability directly as an int
Richard Sandiford rdsandif...@googlemail.com writes: Sorry for the breakage. I think we need to handle INT_LIST in the same way as INSN_LIST though, and eliminate in XEXP (x, 1). How about the attached? Testing in progress... Works for me as well. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
[gomp4] Taskgroup and cancellation compiler fixes
Hi! This patch: 1) defers expansion of taskgroup into GOMP_taskgroup_start and GOMP_taskgroup_end until omplower/ompexp, mainly so that e.g. invalid nesting can be diagnosed (e.g. #pragma omp cancel * inside of #pragma omp taskgroup nested in some other construct) 2) diagnoses structured block restrictions also for #pragma omp target{,data} and #pragma omp teams and taskgroup 3) fixes a bug, where cancellation of sections, for and parallel/task jumped to after the dtors rather than before them, because then some variables won't be properly destructed Will commit tomorrow to gomp-4_0-branch unless somebody finds issues with this. 2013-09-24 Jakub Jelinek ja...@redhat.com * omp-low.c (lower_omp_sections, lower_omp_for, lower_omp_taskreg): Emit ctx-cancel_label before destructors. * gimple-pretty-print.c (dump_gimple_omp_block, pp_gimple_stmt_1): Handle GIMPLE_OMP_TASKGROUP. * tree-nested.c (convert_nonlocal_reference_stmt, convert_local_reference_stmt, convert_gimple_call): Likewise. * tree-cfg.c (make_edges): Likewise. * gimple.h (gimple_build_omp_taskgroup): New prototype. (gimple_has_substatement): Handle GIMPLE_OMP_TASKGROUP. (CASE_GIMPLE_OMP): Likewise. * gimplify.c (is_gimple_stmt, gimplify_expr): Handle OMP_TASKGROUP. * omp-low.c (check_omp_nesting_restrictions): Warn if #pragma omp cancel is used in nowait loop or sections construct. (scan_omp_1_stmt, expand_omp_synch, expand_omp, lower_omp_1): Handle GIMPLE_OMP_TASKGROUP. (diagnose_sb_1, diagnose_sb_2): Likewise. Handle GIMPLE_OMP_TARGET and GIMPLE_OMP_TEAMS. (lower_omp_taskgroup): New function. * tree-inline.c (remap_gimple_stmt, estimate_num_insns): Handle GIMPLE_OMP_TASKGROUP. * gimple-low.c (lower_stmt): Likewise. * tree.h (OMP_TASKGROUP_BODY): Define. * tree.def (OMP_TASKGROUP): New tree. * tree-pretty-print.c (dump_generic_node): Handle OMP_TASKGROUP. * gimple.c (gimple_build_omp_taskgroup): New function. (walk_gimple_stmt, gimple_copy): Handle GIMPLE_OMP_TASKGROUP. 
* gimple.def (GIMPLE_OMP_TASKGROUP): New GIMPLE code. c-family/ * c-common.h (c_finish_omp_taskgroup): New prototype. * c-omp.c (c_finish_omp_taskgroup): New function. c/ * c-parser.c (c_parser_omp_taskgroup): Return tree. Don't call c_begin_omp_taskgroup. (c_parser_omp_construct): Adjust caller. * c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup): Remove. * c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup): Remove. cp/ * parser.c (cp_parser_omp_taskgroup): Return tree. Use c_finish_omp_taskgroup. (cp_parser_omp_construct): Adjust caller. * cp-array-notation.c (expand_array_notation_exprs): Handle OMP_TASKGROUP. * pt.c (tsubst_expr): Handle OMP_TASKGROUP. * semantics.c (finish_omp_taskgroup): Remove. * cp-tree.h (finish_omp_taskgroup): Remove. testsuite/ * g++.dg/gomp/target-1.C: New test. * g++.dg/gomp/target-2.C: New test. * g++.dg/gomp/teams-1.C: New test. * g++.dg/gomp/taskgroup-1.C: New test. * gcc.dg/gomp/teams-1.c: New test. * gcc.dg/gomp/taskgroup-1.c: New test. * gcc.dg/gomp/target-1.c: New test. * gcc.dg/gomp/target-2.c: New test. * c-c++-common/gomp/cancel-1.c: New test. --- gcc/gimple-pretty-print.c.jj2013-09-13 16:48:21.0 +0200 +++ gcc/gimple-pretty-print.c 2013-09-23 15:06:23.857832390 +0200 @@ -1360,8 +1360,8 @@ dump_gimple_omp_sections (pretty_printer } } -/* Dump a GIMPLE_OMP_{MASTER,ORDERED,SECTION} tuple on the pretty_printer - BUFFER. */ +/* Dump a GIMPLE_OMP_{MASTER,TASKGROUP,ORDERED,SECTION} tuple on the + pretty_printer BUFFER. 
*/ static void dump_gimple_omp_block (pretty_printer *buffer, gimple gs, int spc, int flags) @@ -1376,6 +1376,9 @@ dump_gimple_omp_block (pretty_printer *b case GIMPLE_OMP_MASTER: pp_string (buffer, #pragma omp master); break; + case GIMPLE_OMP_TASKGROUP: + pp_string (buffer, #pragma omp taskgroup); + break; case GIMPLE_OMP_ORDERED: pp_string (buffer, #pragma omp ordered); break; @@ -2131,6 +2134,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer break; case GIMPLE_OMP_MASTER: +case GIMPLE_OMP_TASKGROUP: case GIMPLE_OMP_ORDERED: case GIMPLE_OMP_SECTION: dump_gimple_omp_block (buffer, gs, spc, flags); --- gcc/tree-nested.c.jj2013-09-13 16:49:05.0 +0200 +++ gcc/tree-nested.c 2013-09-23 15:06:23.857832390 +0200 @@ -1309,6 +1309,7 @@ convert_nonlocal_reference_stmt (gimple_ case GIMPLE_OMP_SECTION: case GIMPLE_OMP_MASTER: +case GIMPLE_OMP_TASKGROUP: case GIMPLE_OMP_ORDERED: walk_body
[gomp4] Taskgroup library support
Hi! This implements taskgroups in the library and their cancellation. The implementation has been pretty straightforward, though I had to consolidate some operations from {gomp_barrier_handle_tasks, GOMP_taskwait} and the new GOMP_taskgroup_end to new inlines, because it became non-maintainable. In addition to this, the patch disallows plain discarding of tasks for which we've already run the copy constructors, those will be executed and will be cancelled only if they encounter a cancellation point. There are omp-lang discussions about whether the standard shouldn't be changed, so that the copy ctors would be run only in task outlined body, not earlier. And, lastly, this patch adds various extra cancellation testcase that revealed the omp-low.c issue fixed in the previous patch. Will commit tomorrow unless somebody complains. 2013-09-24 Jakub Jelinek ja...@redhat.com * parallel.c (GOMP_cancellation_point, GOMP_cancel): Handle GIMPLE_CANCEL_TASKGROUP cancellation. * libgomp.h (struct gomp_task): Add next_taskgroup, prev_taskgroup, taskgroup and copy_ctors_done fields. (struct gomp_taskgroup): New type. * task.c (gomp_init_task): Initialize copy_ctors_done and taskgroup fields. (GOMP_task): Don't start a new thread also if it's taskgroup has been cancelled. Set copy_ctors_done field if needed. Initialize taskgroup field. If copy_ctors_done and already cancelled, don't discard the task. If taskgroup is non-NULL, enqueue the task into taskgroup queue. (gomp_task_run_pre, gomp_task_run_post_remove_parent, gomp_task_run_post_remove_taskgroup): New inline functions. (gomp_barrier_handle_tasks, GOMP_taskwait): Use them. (GOMP_taskgroup_start, GOMP_taskgroup_end): Implement taskgroup support. * testsuite/libgomp.c++/cancel-parallel-1.C: New test. * testsuite/libgomp.c++/cancel-parallel-2.C: New test. * testsuite/libgomp.c++/cancel-parallel-3.C: New test. * testsuite/libgomp.c++/cancel-for-1.C: New test. * testsuite/libgomp.c++/cancel-for-1.C: New test. 
* testsuite/libgomp.c++/cancel-taskgroup-1.C: New test. * testsuite/libgomp.c++/cancel-taskgroup-2.C: New test. * testsuite/libgomp.c++/cancel-taskgroup-3.C: New test. * testsuite/libgomp.c++/cancel-test.h: New file. * testsuite/libgomp.c++/cancel-sections-1.C: New test. * testsuite/libgomp.c++/taskgroup-1.C: New test. * testsuite/libgomp.c/cancel-taskgroup-1.c: New test. * testsuite/libgomp.c/cancel-taskgroup-2.c: New test. * testsuite/libgomp.c/taskgroup-1.c: New test. * testsuite/libgomp.c/cancel-parallel-3.c (do_some_work): Use void return type. --- libgomp/parallel.c.jj 2013-09-24 12:52:53.271887599 +0200 +++ libgomp/parallel.c 2013-09-24 13:10:29.345564211 +0200 @@ -147,7 +147,8 @@ GOMP_cancellation_point (int which) if (!gomp_cancel_var) return false; - struct gomp_team *team = gomp_thread ()-ts.team; + struct gomp_thread *thr = gomp_thread (); + struct gomp_team *team = thr-ts.team; if (which (GOMP_CANCEL_LOOP | GOMP_CANCEL_SECTIONS)) { if (team == NULL) @@ -156,10 +157,11 @@ GOMP_cancellation_point (int which) } else if (which GOMP_CANCEL_TASKGROUP) { - /* FIXME: Check if current taskgroup has been cancelled, -then fallthru into the GOMP_CANCEL_PARALLEL case, -because if the current parallel has been cancelled, -all tasks should be cancelled too. */ + if (thr-task-taskgroup thr-task-taskgroup-cancelled) + return true; + /* FALLTHRU into the GOMP_CANCEL_PARALLEL case, +as #pragma omp cancel parallel also cancels all explicit +tasks. 
*/ } if (team) return gomp_team_barrier_cancelled (team-barrier); @@ -176,7 +178,8 @@ GOMP_cancel (int which, bool do_cancel) if (!do_cancel) return ialias_call (GOMP_cancellation_point) (which); - struct gomp_team *team = gomp_thread ()-ts.team; + struct gomp_thread *thr = gomp_thread (); + struct gomp_team *team = thr-ts.team; if (which (GOMP_CANCEL_LOOP | GOMP_CANCEL_SECTIONS)) { /* In orphaned worksharing region, all we want to cancel @@ -187,7 +190,12 @@ GOMP_cancel (int which, bool do_cancel) } else if (which GOMP_CANCEL_TASKGROUP) { - /* FIXME: Handle taskgroup cancellation. */ + if (thr-task-taskgroup !thr-task-taskgroup-cancelled) + { + gomp_mutex_lock (team-task_lock); + thr-task-taskgroup-cancelled = true; + gomp_mutex_unlock (team-task_lock); + } return true; } team-team_cancelled = 1; --- libgomp/libgomp.h.jj2013-09-24 12:52:53.274887599 +0200 +++ libgomp/libgomp.h 2013-09-24 13:10:29.344564253 +0200 @@ -253,6 +253,8 @@ enum gomp_task_kind GOMP_TASK_TIED }; +struct gomp_taskgroup; + /* This
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote: This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes, as that is what the function is supposed to do (see the comments) and as keeping these notes produces some failures, at least on ARM. The description is too terse. In the RTL middle-end, you shouldn't have to deal manually with REG_DEAD and REG_UNUSED notes (unlike REG_EQUAL and REG_EQUIV notes), as the DF framework is supposed to maintain them for you. Unfortunately LRA uses its own DF framework. Ciao! Steven
[PATCH, PR 58441] Fix location where libvtv puts its headers
The following patch updates where libvtv installs its header files (fixing PR 58441). This is a trivial change, and affects only libvtv, so I am going to go ahead and commit it. -- Caroline Tice cmt...@google.com

2013-09-24  Caroline Tice  cmt...@google.com

	* Makefile.am: Change libvtv_includedir to the directory used by
	the other libraries rather than the top include directory.
	* Makefile.in: Regenerated.

Attachment: libvtv-headers.patch (binary data)
Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.
On Sep 24, 2013, at 12:23 PM, Steven Bosscher stevenb@gmail.com wrote: On Tue, Sep 24, 2013 at 5:03 PM, Eric Botcazou wrote: This patch removes REG_DEAD and REG_UNUSED notes [...] the DF framework is supposed to do it for you. Unfortunately LRA uses its own DF framework. Is that a bug, or a feature?
Re: [PATCH] Bug fix: *var and MEM[(const int *)var] (var has int* type) are not treated as the same data ref.
Nice fix! I noticed that this patch has already been committed to trunk. Thank you very much, Richard! Cong

On Tue, Sep 24, 2013 at 1:49 AM, Richard Biener rguent...@suse.de wrote: On Tue, 24 Sep 2013, Richard Biener wrote: On Tue, 24 Sep 2013, Jakub Jelinek wrote: Hi! On Mon, Sep 23, 2013 at 05:26:13PM -0700, Cong Hou wrote:

Missing ChangeLog entry.

--- gcc/testsuite/gcc.dg/alias-14.c	(revision 0)
+++ gcc/testsuite/gcc.dg/alias-14.c	(revision 0)

Vectorizer tests should go into gcc.dg/vect/ instead, or, if they are for a single target (but there is no reason why this should be a single target), into gcc.target/<target>/.

--- gcc/fold-const.c	(revision 202662)
+++ gcc/fold-const.c	(working copy)
@@ -2693,8 +2693,9 @@ operand_equal_p (const_tree arg0, const_
		      && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
					  TYPE_SIZE (TREE_TYPE (arg1)), flags)))
		  && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
-		  && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
-		      == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
+		  && types_compatible_p (
+		       TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))),
+		       TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
		  && OP_SAME (0) && OP_SAME (1));

This looks wrong. types_compatible_p will happily return true, say, for unsigned long and unsigned long long types on x86_64, because they are both integral types with the same precision, but the second argument of a MEM_REF carries aliasing information, where the distinction between the two is important. So, while == comparison of the main variants is too strict, types_compatible_p is too weak, so I guess you need to write a new predicate that will either handle the == case plus a few special cases that are safe to handle, or look at what exactly we use the type of the second MEM_REF argument for and check those properties. We certainly need get_deref_alias_set_1 and get_deref_alias_set to return the same values for both types, but whether that is the only information we are using, I am not sure. CCing Richard.
Using TYPE_MAIN_VARIANT is exactly correct - this is the best we can do that will work with all frontends. TYPE_MAIN_VARIANT guarantees that the alias-sets stay the same:

  /* If the innermost reference is a MEM_REF that has a
     conversion embedded treat it like a VIEW_CONVERT_EXPR above,
     using the memory access type for determining the alias-set.  */
  if (TREE_CODE (inner) == MEM_REF
      && TYPE_MAIN_VARIANT (TREE_TYPE (inner))
	 != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1)))))
    return get_deref_alias_set (TREE_OPERAND (inner, 1));

so we cannot change the compatibility checks without touching the alias-set deriving code. For the testcase in question we have MEM[(const int *)_7] vs. MEM[(int *)_7] and unfortunately pointer and reference types are not variant types. We also cannot easily resort to pointed-to types as our all-beloved ref-all qualification is on the pointer type rather than on the pointed-to type. But yes, we could implement a more complicated predicate

bool
alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  t1 = TYPE_MAIN_VARIANT (t1);
  t2 = TYPE_MAIN_VARIANT (t2);
  if (t1 == t2)
    return true;
  if (TYPE_REF_CAN_ALIAS_ALL (t1) || TYPE_REF_CAN_ALIAS_ALL (t2))
    return false;
  return (TYPE_MAIN_VARIANT (TREE_TYPE (t1))
	  == TYPE_MAIN_VARIANT (TREE_TYPE (t2)));
}

Note that the fold-const code in question is

	 return ((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
		  || (TYPE_SIZE (TREE_TYPE (arg0))
		      && TYPE_SIZE (TREE_TYPE (arg1))
		      && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
					  TYPE_SIZE (TREE_TYPE (arg1)), flags)))
		 && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
		 && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
		     == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1))))
		 && OP_SAME (0) && OP_SAME (1));

which you may notice uses types_compatible_p on the reference types, which is at least suspicious as that can let through reference trees that will end up using different alias sets due to the stricter check in get_alias_set.
So in addition to alias_ptr_types_compatible_p we may want to have

bool
reference_alias_ptr_types_compatible_p (const_tree t1, const_tree t2)
{
  ...
}

abstracting this away for the actual reference trees (also noting that reference_alias_ptr_type isn't 1:1 following what get_alias_set does either). I will give this a try. The following is an untested (well, the testcase from PR58513 is now vectorized) patch doing that refactoring. Richard.

Index: gcc/tree.c
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
This patch adds the initial support for putting DImode, DFmode, and SFmode values in the upper VSX registers (traditional Altivec registers) using the -mupper-regs-df and -mupper-regs-sf switches. Those switches will not be enabled by default until the rest of the changes are made. This patch passes the bootstrap and make check tests. I tested all of the targets I tested previously (power4-8, G4/G5, SPE, cell, e5500/e5600, and paired floating point), and all machines generate the same code. Is it ok to install this patch?

[gcc] 2013-09-24  Michael Meissner  meiss...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Allow DFmode,
	DImode, and SFmode in the upper VSX registers based on the
	-mupper-regs-{df,sf} flags.  Fix wu constraint to be ALTIVEC_REGS
	if -mpower8-vector.  Combine -mvsx-timode handling with the rest
	of the VSX register handling.
	* config/rs6000/rs6000.md (f32_lv): Use %x0 for VSX registers.
	(f32_sv): Likewise.
	(zero_extendsidi2_lfiwzx): Add support for loading into the
	Altivec registers with -mpower8-vector.  Use wu/wv constraints to
	only do VSX memory options on Altivec registers.
	(extendsidi2_lfiwax): Likewise.
	(extendsfdf2_fpr): Likewise.
	(mov<mode>_hardfloat, SF/SD modes): Likewise.
	(mov<mode>_hardfloat32, DF/DD modes): Likewise.
	(mov<mode>_hardfloat64, DF/DD modes): Likewise.
	(movdi_internal64): Likewise.

[gcc/testsuite] 2013-09-24  Michael Meissner  meiss...@linux.vnet.ibm.com

	* gcc.target/powerpc/p8vector-ldst.c: New test for -mupper-regs-sf
	and -mupper-regs-df.

-- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 202855)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1626,19 +1626,28 @@ rs6000_hard_regno_mode_ok (int regno, en
   /* VSX registers that overlap the FPR registers are larger than for
      non-VSX implementations.
      Don't allow an item to be split between a FP register
-     and an Altivec register.  */
-  if (VECTOR_MEM_VSX_P (mode))
+     and an Altivec register.  Allow TImode in all VSX registers if the user
+     asked for it.  */
+  if (TARGET_VSX && VSX_REGNO_P (regno)
+      && (VECTOR_MEM_VSX_P (mode)
+	  || (TARGET_VSX_SCALAR_FLOAT && mode == SFmode)
+	  || (TARGET_VSX_SCALAR_DOUBLE && (mode == DFmode || mode == DImode))
+	  || (TARGET_VSX_TIMODE && mode == TImode)))
     {
       if (FP_REGNO_P (regno))
	 return FP_REGNO_P (last_regno);
       if (ALTIVEC_REGNO_P (regno))
-	 return ALTIVEC_REGNO_P (last_regno);
-    }
+	{
+	  if (mode == SFmode && !TARGET_UPPER_REGS_SF)
+	    return 0;

-  /* Allow TImode in all VSX registers if the user asked for it.  */
-  if (mode == TImode && TARGET_VSX_TIMODE && VSX_REGNO_P (regno))
-    return 1;
+	  if ((mode == DFmode || mode == DImode) && !TARGET_UPPER_REGS_DF)
+	    return 0;
+
+	  return ALTIVEC_REGNO_P (last_regno);
+	}
+    }

   /* The GPRs can hold any mode, but values bigger than one register
      cannot go past R31.  */
@@ -2413,7 +2422,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_P8_VECTOR)
     {
-      rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;
+      rs6000_constraints[RS6000_CONSTRAINT_wu] = ALTIVEC_REGS;
       rs6000_constraints[RS6000_CONSTRAINT_wy]
	 = rs6000_constraints[RS6000_CONSTRAINT_ww]
	 = (TARGET_UPPER_REGS_SF) ?
VSX_REGS : FLOAT_REGS;

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 202846)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -314,13 +314,13 @@ (define_mode_attr real_value_to_target [
 (define_mode_attr f32_lr [(SF "f") (SD "wz")])
 (define_mode_attr f32_lm [(SF "m") (SD "Z")])
 (define_mode_attr f32_li [(SF "lfs%U1%X1 %0,%1") (SD "lfiwzx %0,%y1")])
-(define_mode_attr f32_lv [(SF "lxsspx %0,%y1") (SD "lxsiwzx %0,%y1")])
+(define_mode_attr f32_lv [(SF "lxsspx %x0,%y1") (SD "lxsiwzx %x0,%y1")])

 ; Definitions for store from 32-bit fpr register
 (define_mode_attr f32_sr [(SF "f") (SD "wx")])
 (define_mode_attr f32_sm [(SF "m") (SD "Z")])
 (define_mode_attr f32_si [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")])
-(define_mode_attr f32_sv [(SF "stxsspx %1,%y0") (SD "stxsiwzx %1,%y0")])
+(define_mode_attr f32_sv [(SF "stxsspx %x1,%y0") (SD "stxsiwzx %x1,%y0")])

 ; Definitions for 32-bit fpr direct move
 (define_mode_attr f32_dm [(SF "wn") (SD "wm")])
@@ -541,7 +541,7 @@ (define_split
 )
 (define_insn "*zero_extendsidi2_lfiwzx"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,??wm,!wz,!wm")
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,??wm,!wz,!wu")
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Tue, Sep 24, 2013 at 12:06 PM, Wei Mi w...@google.com wrote: This is the updated patch2. Changed: 1. For cmp/test with a rip-relative addressing memory operand, don't group the insns. Bulldozer also doesn't support fusion for cmp/test with both a displacement MEM and an immediate operand, while m_CORE_ALL doesn't support fusion for cmp/test with a MEM and an immediate operand. I simply chose to use the more stringent constraint here (m_CORE_ALL's constraint). 2. Add Bulldozer back and merge TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32. Bootstrap and regression tests pass. Ok for trunk?

2013-09-24  Wei Mi  w...@google.com

	* gcc/config/i386/i386.c (rip_relative_addr_p): New function.
	(ix86_macro_fusion_p): Ditto.
	(ix86_macro_fusion_pair_p): Ditto.
	* gcc/config/i386/i386.h: Add new tune features about macro-fusion.
	* gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
	* gcc/doc/tm.texi: Generated.
	* gcc/doc/tm.texi.in: Ditto.
	* gcc/haifa-sched.c (try_group_insn): New function.
	(group_insns_for_macro_fusion): Ditto.
	(sched_init): Call group_insns_for_macro_fusion.
	* gcc/sched-rgn.c (add_branch_dependences): Keep insns in a
	SCHED_GROUP at the end of BB to remain their location.
	* gcc/target.def: Add two hooks: macro_fusion_p and
	macro_fusion_pair_p.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1fd3f60..4a04778 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24856,6 +24856,167 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Extracted from ix86_print_operand_address.  Check whether ADDR is a
+   rip-relative address.
+ */
+
+static bool
+rip_relative_addr_p (rtx addr)
+{
+  struct ix86_address parts;
+  rtx base, index, disp;
+  int ok;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_VSIBADDR)
+    {
+      ok = ix86_decompose_address (XVECEXP (addr, 0, 0), &parts);
+      parts.index = XVECEXP (addr, 0, 1);
+    }
+  else if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LEA_ADDR)
+    ok = ix86_decompose_address (XVECEXP (addr, 0, 0), &parts);
+  else
+    ok = ix86_decompose_address (addr, &parts);
+
+  gcc_assert (ok);
+  base = parts.base;
+  index = parts.index;
+  disp = parts.disp;
+
+  if (TARGET_64BIT && !base && !index)
+    {
+      rtx symbol = disp;
+
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == PLUS
+	  && CONST_INT_P (XEXP (XEXP (disp, 0), 1)))
+	symbol = XEXP (XEXP (disp, 0), 0);
+
+      if (GET_CODE (symbol) == LABEL_REF
+	  || (GET_CODE (symbol) == SYMBOL_REF
+	      && SYMBOL_REF_TLS_MODEL (symbol) == 0))
+	return true;
+    }
+  if (flag_pic && !base && !index)
+    {
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == UNSPEC
+	  && (XINT (XEXP (disp, 0), 1) == UNSPEC_PCREL
+	      || XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+	      || (TARGET_64BIT
+		  && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTNTPOFF)))
+	return true;
+    }
+  return false;
+}
+

It doesn't look right. RIP-relative addressing is only possible with TARGET_64BIT and 1. base == pc.  Or 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and UNSPEC_GOTNTPOFF. -- H.J.
Re: [PATCH] Set expr loc safely (PR c++/58516)
OK. Jason
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
+  gcc_assert (ok);
+  base = parts.base;
+  index = parts.index;
+  disp = parts.disp;
+
+  if (TARGET_64BIT && !base && !index)
+    {
+      rtx symbol = disp;
+
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == PLUS
+	  && CONST_INT_P (XEXP (XEXP (disp, 0), 1)))
+	symbol = XEXP (XEXP (disp, 0), 0);
+
+      if (GET_CODE (symbol) == LABEL_REF
+	  || (GET_CODE (symbol) == SYMBOL_REF
+	      && SYMBOL_REF_TLS_MODEL (symbol) == 0))
+	return true;
+    }
+  if (flag_pic && !base && !index)
+    {
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == UNSPEC
+	  && (XINT (XEXP (disp, 0), 1) == UNSPEC_PCREL
+	      || XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+	      || (TARGET_64BIT
+		  && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTNTPOFF)))
+	return true;
+    }
+  return false;
+}
+

It doesn't look right. RIP-relative addressing is only possible with TARGET_64BIT and 1. base == pc.  Or 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and UNSPEC_GOTNTPOFF.

TARGET_64BIT should be tested above. We however output RIP addresses also for plain symbol references, i.e. when the base is a symbol address, such as in:

int a;
int t() { return a; }

memory_address_length already contains logic to figure out if there is IP-relative addressing going on (I am not sure it is completely accurate either). Better to break it out into a common predicate and perhaps unify it with what ix86_print_operand_address is doing. Honza -- H.J.