Re: [cxx-conversion] Support garbage-collected C++ templates
Diego -

It's all good changes and your plan for future improvements sounds good, including the part where gengtype is killed with fire.

- Functions should be emitted in files that have access to the structure where they were defined. I'm not convinced that the current multiplicity of gt-*.[ch] files is even necessary. However, I would like the guidance of a gengtype maintainer. I don't think I fully understand all of it.

Yes, I remember looking into splitting the output a few years ago. It should be possible to split gtype-desc.h into header files to be included in the source header files that define the relevant types, i.e. tree.h includes a generated gt-tree.h that provides allocator definitions for the tree.h types. gtype-desc.h would then be left with the master enum of all GTY-handled types. It should also be possible to split gtype-desc.c into the already-existing gt-foo.h files, although I think the benefit of doing that is not as big.

I've tested the patch on x86_64 with the page and zone collectors and with --enable-checking=gc,gcac (boy was that a slow mistake).

It might also be interesting to try valgrind. Good to hear the zone collector hasn't bitrotten once again.

* doc/gty.texi: Document support for C++ templates and user-provided markers.

The first node in this doc file needs s/C/C++/g and perhaps some more explanation with an eye on C++.

-- Laurynas
Re: Value type of map need not be default copyable
On Wed, 8 Aug 2012, François Dumont wrote:
On 08/08/2012 03:39 PM, Paolo Carlini wrote:
On 08/08/2012 03:15 PM, François Dumont wrote:

I have also introduced a special std::pair constructor for container usage so that we do not have to include the whole tuple stuff just for associative container implementations.

To be clear: sorry, this is not an option. Paolo.

Then I can only imagine the attached patch, which requires including tuple when including unordered_map or unordered_set. The std::pair(piecewise_construct_t, tuple, tuple) constructor is the only one that allows building a pair using the default constructor for the second member.

I agree that the extra constructor would be convenient (I probably would have gone with pair(T,__default_construct_t), the symmetric version, and enough extra constructors to resolve all ambiguities). Maybe LWG would consider doing something.

+ __p = __h->_M_allocate_node(std::piecewise_construct,
+                              std::make_tuple(__k),
+                              std::make_tuple());

Don't you want cref(__k)? It might save a move at some point.

-- Marc Glisse
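To make the point about the piecewise constructor concrete, here is a minimal sketch using only the public standard library interface, not the libstdc++ internals from the patch. The `Counter` type and the `get_or_create` helper are hypothetical names invented for this illustration. The mapped type is default constructible but neither copyable nor movable, so the only way to create an entry from a key alone is to default-construct the second member in place.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical mapped type: default constructible, but not copyable
// (and therefore not movable either, since no move constructor is declared).
struct Counter {
    int value = 0;
    Counter() = default;
    Counter(const Counter&) = delete;
    Counter& operator=(const Counter&) = delete;
};

// Hypothetical helper: return the entry for `key`, creating it if needed.
// The pair inside the map node is built with the piecewise constructor:
// the first tuple carries the key, the empty second tuple means
// "default-construct the mapped value in place".
Counter& get_or_create(std::map<std::string, Counter>& m, const std::string& key) {
    auto result = m.emplace(std::piecewise_construct,
                            std::forward_as_tuple(key),
                            std::forward_as_tuple());
    return result.first->second;
}
```

A call such as `get_or_create(m, "a").value += 2;` never copies a `Counter`, which is the property the thread title asks for.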
Re: [PATCH] Strength reduction part 3 of 4: candidates with unknown strides
On Wed, 8 Aug 2012, H.J. Lu wrote:
On Wed, Aug 1, 2012 at 10:36 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote:

Greetings,

Thanks for the review of part 2! Here's another chunk of the SLSR code (I feel I owe you a few beers at this point). This performs analysis and replacement on groups of related candidates having an SSA name (rather than a constant) for a stride. This leaves only the conditional increment (CAND_PHI) case, which will be handled in the last patch of the series.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Ok for trunk?

Thanks, Bill

gcc:

2012-08-01 Bill Schmidt wschm...@linux.ibm.com

* gimple-ssa-strength-reduction.c (struct incr_info_d): New struct.
(incr_vec): New static var.
(incr_vec_len): Likewise.
(address_arithmetic_p): Likewise.
(stmt_cost): Remove dead assignment.
(dump_incr_vec): New function.
(cand_abs_increment): Likewise.
(lazy_create_slsr_reg): Likewise.
(incr_vec_index): Likewise.
(count_candidates): Likewise.
(record_increment): Likewise.
(record_increments): Likewise.
(unreplaced_cand_in_tree): Likewise.
(optimize_cands_for_speed_p): Likewise.
(lowest_cost_path): Likewise.
(total_savings): Likewise.
(analyze_increments): Likewise.
(ncd_for_two_cands): Likewise.
(nearest_common_dominator_for_cands): Likewise.
(profitable_increment_p): Likewise.
(insert_initializers): Likewise.
(introduce_cast_before_cand): Likewise.
(replace_rhs_if_not_dup): Likewise.
(replace_one_candidate): Likewise.
(replace_profitable_candidates): Likewise.
(analyze_candidates_and_replace): Handle candidates with SSA-name strides.

gcc/testsuite:

2012-08-01 Bill Schmidt wschm...@linux.ibm.com

* gcc.dg/tree-ssa/slsr-5.c: New.
* gcc.dg/tree-ssa/slsr-6.c: New.
* gcc.dg/tree-ssa/slsr-7.c: New.
* gcc.dg/tree-ssa/slsr-8.c: New.
* gcc.dg/tree-ssa/slsr-9.c: New.
* gcc.dg/tree-ssa/slsr-10.c: New.
* gcc.dg/tree-ssa/slsr-11.c: New.
* gcc.dg/tree-ssa/slsr-12.c: New.
* gcc.dg/tree-ssa/slsr-13.c: New.
* gcc.dg/tree-ssa/slsr-14.c: New.
* gcc.dg/tree-ssa/slsr-15.c: New.
* gcc.dg/tree-ssa/slsr-16.c: New.
* gcc.dg/tree-ssa/slsr-17.c: New.
* gcc.dg/tree-ssa/slsr-18.c: New.
* gcc.dg/tree-ssa/slsr-19.c: New.
* gcc.dg/tree-ssa/slsr-20.c: New.
* gcc.dg/tree-ssa/slsr-21.c: New.
* gcc.dg/tree-ssa/slsr-22.c: New.
* gcc.dg/tree-ssa/slsr-23.c: New.
* gcc.dg/tree-ssa/slsr-24.c: New.
* gcc.dg/tree-ssa/slsr-25.c: New.
* gcc.dg/tree-ssa/slsr-26.c: New.
* gcc.dg/tree-ssa/slsr-30.c: New.
* gcc.dg/tree-ssa/slsr-31.c: New.

==
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (revision 0)
@@ -0,0 +1,25 @@
+/* Verify straight-line strength reduction fails for simple integer addition
+   with casts thrown in when -fwrapv is used.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-dom2 -fwrapv" } */
+/* { dg-skip-if "" { ilp32 } { "-m32" } { "" } } */
+

This doesn't work on x32 nor Linux/ia32 since -m32 may not be needed for ILP32. This patch works for me. OK to install?

Ok.

Thanks, Richard.
Re: Commit: RL78: Include tree-pass.h
On Wed, Aug 8, 2012 at 5:29 PM, Richard Henderson r...@redhat.com wrote:
On 08/08/2012 07:19 AM, Ian Lance Taylor wrote:

I was suggesting to, for example, register a 2nd mdreorg-like pass and add a 2nd target hook. regstack should get the same treatment.

If the mechanism is a proliferation of mdreorg passes in every place we want a target-specific pass, that is fine with me.

I think it makes much more sense to edit the pass ordering from the backend, rather than hooks upon hooks upon hooks. Since the plugin interface exists, we might as well use it.

The issue is that using the plugin interface makes breakage only detectable when you are able to test a target, not by merely building it. That's bad (of course only for those weirdo targets). We should _at least_ provide an interface to internals that, for example, use the address of the pass structure for pass placement instead of just the dump file name.

Richard.

r~
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Thu, Aug 9, 2012 at 12:25 AM, Lawrence Crowl cr...@google.com wrote:
On 8/8/12, Richard Guenther richard.guent...@gmail.com wrote:
On Aug 7, 2012 Lawrence Crowl cr...@google.com wrote:

We should probably think about naming conventions for mutating operations, as I expect we will want them eventually.

Right. In the end I would prefer explicit constructors.

I don't think we're thinking about the same thing. I'm talking about member functions like mystring.append (foo). The += operator is mutating as well. Constructors do not mutate, they create.

Ah. For simple objects like double_int I prefer to have either all ops mutating or all ops non-mutating.

Richard.

-- Lawrence Crowl
Re: [PATCH,i386] fma,fma4 and xop flags
On Thu, Aug 9, 2012 at 7:55 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:

Otherwise, what does -mno-fma4 -mxop do? (it should enable both xop and fma4!) what should -mfma4 -mno-xop do (it should disable both xop and fma4!).

Yes! That's what GCC does now. Some flags are coupled (at least for now). For example, -mno-sse4.2 -mavx enables both sse4.2 and avx, whereas -mavx -mno-sse4.2 disables both.

The following settings are clubbed together:
1) 3DNow sets MMX
2) SSE2 sets SSE
3) SSE3 sets SSE2
4) SSE4_1 sets SSE3
5) SSE4_2 sets SSE4_1
6) FMA sets AVX
7) AVX2 sets AVX
8) SSE4_A sets SSE3
9) FMA4 sets SSE4_A and AVX
10) XOP sets FMA4
11) AES sets SSE2
12) PCLMUL sets SSE2
13) ABM sets POPCNT

Resetting is done in reverse (MMX resets 3DNOW).

IMO, if we have different cpuid flags, enabling/disabling the compiler flags depends on these cpuid flags directly. Adding subsets to them or tangling them together may give wrong results.

Uh, ok ... it's messier than I anticipated ;)

Please let me know your opinion.

Well, your patch looks reasonable then. I'll defer to x86 maintainers for approval though.

Thanks, Richard.

Regards
Ganesh

-----Original Message-----
From: Richard Guenther [mailto:richard.guent...@gmail.com]
Sent: Wednesday, August 08, 2012 5:12 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; ubiz...@gmail.com
Subject: Re: [PATCH,i386] fma,fma4 and xop flags

On Wed, Aug 8, 2012 at 1:31 PM, ganesh.gopalasubraman...@amd.com wrote:

Hello, Bdver2 cpu supports both fma and fma4 instructions. Prior to the patch, option -mno-xop removes -mfma4. Similarly, option -mno-fma4 removes -mxop.

Eh? Why's that? I think we should disentangle -mxop and -mfma4 instead. Otherwise, what does -mno-fma4 -mxop do? (it should enable both xop and fma4!) what should -mfma4 -mno-xop do (it should disable both xop and fma4!). All this is just confusing to the user, even if in AMD documents XOP includes FMA4.

Richard.
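The coupling rule described in this message ("X sets Y" when enabling, and the reverse when resetting) can be sketched as a toy model. This is only an illustration of the stated rule, processed left to right over the command line; it is not GCC's i386 option handling, it covers only a subset of the listed pairs, and the function names `implies`, `enable`, `disable` and `apply` are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy table: a subset of the "X sets Y" pairs listed in the message.
static const std::map<std::string, std::vector<std::string>>& implies() {
    static const std::map<std::string, std::vector<std::string>> table = {
        {"xop",   {"fma4"}},
        {"fma4",  {"sse4a", "avx"}},
        {"sse4a", {"sse3"}},
        {"sse3",  {"sse2"}},
        {"sse2",  {"sse"}},
    };
    return table;
}

// -mFOO: enable FOO and, transitively, everything FOO sets.
void enable(std::set<std::string>& on, const std::string& isa) {
    on.insert(isa);
    auto it = implies().find(isa);
    if (it == implies().end()) return;
    for (const auto& dep : it->second) enable(on, dep);
}

// -mno-FOO: disable FOO and, transitively, everything that sets FOO.
void disable(std::set<std::string>& on, const std::string& isa) {
    on.erase(isa);
    for (const auto& entry : implies())
        for (const auto& dep : entry.second)
            if (dep == isa) disable(on, entry.first);
}

// Apply options in command-line order, e.g. {"-mno-fma4", "-mxop"}.
std::set<std::string> apply(const std::vector<std::string>& options) {
    std::set<std::string> on;
    for (const auto& opt : options) {
        if (opt.compare(0, 5, "-mno-") == 0) disable(on, opt.substr(5));
        else if (opt.compare(0, 2, "-m") == 0) enable(on, opt.substr(2));
    }
    return on;
}
```

Under this model the order of options matters: `apply({"-mno-fma4", "-mxop"})` ends with both xop and fma4 enabled, while `apply({"-mxop", "-mno-fma4"})` ends with both disabled, mirroring the -mno-sse4.2 -mavx versus -mavx -mno-sse4.2 example in the message.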
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Thu, Aug 9, 2012 at 3:22 AM, Richard Guenther richard.guent...@gmail.com wrote:
On Thu, Aug 9, 2012 at 12:25 AM, Lawrence Crowl cr...@google.com wrote:
On 8/8/12, Richard Guenther richard.guent...@gmail.com wrote:
On Aug 7, 2012 Lawrence Crowl cr...@google.com wrote:

We should probably think about naming conventions for mutating operations, as I expect we will want them eventually.

Right. In the end I would prefer explicit constructors.

I don't think we're thinking about the same thing. I'm talking about member functions like mystring.append (foo). The += operator is mutating as well. Constructors do not mutate, they create.

Ah. For simple objects like double_int I prefer to have either all ops mutating or all ops non-mutating.

Hmm, isn't that a bit extreme? I mean, that does not hold for simple types like int or double, etc.

-- Gaby
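The distinction under discussion can be shown with a small sketch. The `wide_uint` type below is a hypothetical two-word unsigned integer invented for this example; it is not GCC's double_int and makes no claim about that class's interface. It offers a mutating `+=` and a non-mutating `+` side by side, which is how built-in types such as int behave.

```cpp
#include <cassert>

// Hypothetical two-word unsigned integer (assumes a 64-bit unsigned long long).
struct wide_uint {
    unsigned long long low;
    unsigned long long high;

    // Mutating operation: changes *this, like mystring.append (foo).
    wide_uint& operator+=(const wide_uint& rhs) {
        unsigned long long old_low = low;
        low += rhs.low;                              // wraps modulo 2^64
        high += rhs.high + (low < old_low ? 1 : 0);  // propagate the carry
        return *this;
    }
};

// Non-mutating operation: works on a copy and returns a new value.
inline wide_uint operator+(wide_uint lhs, const wide_uint& rhs) {
    lhs += rhs;
    return lhs;
}
```

With `wide_uint a{~0ULL, 0}` and `wide_uint b{1, 0}`, the expression `a + b` leaves `a` untouched, whereas `a += b` overwrites it.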
Re: Value type of map need not be default copyable
Hi,

On 08/09/2012 09:14 AM, Marc Glisse wrote:
On Wed, 8 Aug 2012, François Dumont wrote:
On 08/08/2012 03:39 PM, Paolo Carlini wrote:
On 08/08/2012 03:15 PM, François Dumont wrote:

I have also introduced a special std::pair constructor for container usage so that we do not have to include the whole tuple stuff just for associative container implementations.

To be clear: sorry, this is not an option. Paolo.

Then I can only imagine the attached patch, which requires including tuple when including unordered_map or unordered_set. The std::pair(piecewise_construct_t, tuple, tuple) constructor is the only one that allows building a pair using the default constructor for the second member.

I agree that the extra constructor would be convenient (I probably would have gone with pair(T,__default_construct_t), the symmetric version, and enough extra constructors to resolve all ambiguities). Maybe LWG would consider doing something.

When it does, and the corresponding PR is *ready*, we'll reconsider the issue. After all the *months and months and months* spent by the LWG adding and removing members from pair and tweaking everything wrt the containers, with issues *still* popping up (like that with the defaulted copy constructor vs insert constraining), and with the support for scoped allocators still missing from our implementation, we are not adding members to std::pair so easily. Sorry, but personally I'm not available now to further discuss this specific point.

I was still hoping that for something as simple as mapped_type() we wouldn't need the full tuple machinery, and I encourage everybody to have another look (while making sure anything we figure out adapts smoothly and consistently to std::map); then in a few days we'll take a final decision. We'll still have chances to further improve the code in time for 4.8.0.

+ __p = __h->_M_allocate_node(std::piecewise_construct,
+                              std::make_tuple(__k),
+                              std::make_tuple());

Don't you want cref(__k)? It might save a move at some point.

Are we already doing that elsewhere? I think we should aim for something simple first, then carefully evaluate if the additional complexity is worth the cost and, if so, deploy the superior solution consistently everywhere it may apply.

Thanks! Paolo.
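The cref(__k) question can be explored with a small experiment, sketched here with the public std::pair interface rather than the hashtable internals. The `Key` type with its counters and the two `build_with_*` functions are hypothetical, written only for this illustration. `std::make_tuple(k)` stores a `Key` by value, so the key is copied into the tuple and then moved into the pair; `std::make_tuple(std::cref(k))` stores a reference, so the key is copied once, directly into the pair. The exact number of moves in the first case may depend on the compiler and language mode, so the code only records the counts.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <utility>

// Hypothetical key type that counts its own copies and moves.
struct Key {
    static int copies;
    static int moves;
    int id;
    explicit Key(int i) : id(i) {}
    Key(const Key& o) : id(o.id) { ++copies; }
    Key(Key&& o) noexcept : id(o.id) { ++moves; }
};
int Key::copies = 0;
int Key::moves = 0;

struct Counts { int copies; int moves; };

// The tuple holds a Key by value: one copy into the tuple,
// then at least one move from the tuple into the pair.
Counts build_with_make_tuple(const Key& k) {
    Key::copies = Key::moves = 0;
    std::pair<const Key, int> p(std::piecewise_construct,
                                std::make_tuple(k),
                                std::make_tuple());
    (void)p;
    return Counts{Key::copies, Key::moves};
}

// The tuple holds a const Key&: the only Key construction is the
// copy into the pair itself.
Counts build_with_cref(const Key& k) {
    Key::copies = Key::moves = 0;
    std::pair<const Key, int> p(std::piecewise_construct,
                                std::make_tuple(std::cref(k)),
                                std::make_tuple());
    (void)p;
    return Counts{Key::copies, Key::moves};
}
```

`std::forward_as_tuple(k)` would behave like the cref variant here, since it also builds a tuple of references.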
Re: [cxx-conversion] Support garbage-collected C++ templates
On Wed, Aug 8, 2012 at 11:27 PM, Diego Novillo dnovi...@google.com wrote:
On 12-08-08 17:25 , Gabriel Dos Reis wrote:

Aha, so it is an ordering issue, e.g. declarations being generated after they have been seen used in an instantiation. We might want to consider including the header file (that contains only the declarations of the marking functions) in the header files that contain the GTY-marked type definition. In this case, it would be included near the end of tree.h

Right. And that's the part of my plan that requires killing gengtype with fire first. When I started down that path, it became a very messy re-write, so I decided it was better to do it in stages.

But now with doing it in stages you end up with (this) first stage that complicates gengtype to support a very small subset of C++ types (namely the one special case you need for vec.h). Exactly what I did _not_ want! I understood that you had the complete killing of gengtype with fire ready (or almost ready). Please finish it instead.

Thanks, Richard.

Diego.
[Patch, Fortran] PR54199 improve warning is also the name of an intrinsic for internal procedures
This patch makes the warning for internal procedures whose name is the same as that of an intrinsic clearer. Initially, I thought that one shouldn't warn for internal procedures, but others disagree. In any case, the warning text is better than the original one.

Built and regtested on x86-64-linux. OK for the trunk?

Tobias

2012-08-09 Tobias Burnus bur...@net-b.de

PR fortran/54199
* intrinsic.c (gfc_warn_intrinsic_shadow): Better warning for internal procedures.

2012-08-09 Tobias Burnus bur...@net-b.de

PR fortran/54199
* gfortran.dg/intrinsic_shadow_4.f90: New.

diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 60c68fe..72b149f 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -4503,7 +4511,7 @@ gfc_warn_intrinsic_shadow (const gfc_symbol* sym, bool in_module, bool func)
     return;
 
   /* Emit the warning.  */
-  if (in_module)
+  if (in_module || sym->ns->proc_name)
     gfc_warning ("'%s' declared at %L may shadow the intrinsic of the same name. In order to call the intrinsic, explicit INTRINSIC declarations may be required.",
--- /dev/null 2012-08-08 07:41:43.631684108 +0200
+++ gcc/gcc/testsuite/gfortran.dg/intrinsic_shadow_4.f90 2012-08-09 10:28:55.0 +0200
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-Wall" }
+!
+! PR fortran/54199
+!
+subroutine test()
+contains
+  real function fraction(x) ! { dg-warning "'fraction' declared at .1. may shadow the intrinsic of the same name. In order to call the intrinsic, explicit INTRINSIC declarations may be required." }
+    real :: x
+    fraction = x
+  end function fraction
+end subroutine test
[AArch64] Merge from upstream trunk r189905
Hi, I've just merged upstream trunk on the aarch64-branch up to r189905. Thanks Sofiane
[PATCH][5/n] Allow anonymous SSA names
Another set of small changes. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-08-09 Richard Guenther rguent...@suse.de * tree.h (SSA_VAR_P): Simplify. * tree-ssanames.c (make_ssa_name_fn): Strengthen assert. * fold-const.c (fold_comparison): Check for default def first before checking for PARM_DECL. * tree-complex.c (get_component_ssa_name): Likewise. * tree-inline.c (remap_ssa_name): Likewise. * tree-ssa-loop-ivopts.c (parm_decl_cost): Likewise. * tree-ssa-structalias.c (get_fi_for_callee): Likewise. (find_what_p_points_to): Likewise. * tree-ssa-operands.c (add_stmt_operand): Simplify. Index: trunk/gcc/fold-const.c === *** trunk.orig/gcc/fold-const.c 2012-08-08 16:49:38.0 +0200 --- trunk/gcc/fold-const.c 2012-08-09 11:08:52.273217092 +0200 *** fold_comparison (location_t loc, enum tr *** 8940,8955 auto_var_in_fn_p (base0, current_function_decl) !indirect_base1 TREE_CODE (base1) == SSA_NAME ! TREE_CODE (SSA_NAME_VAR (base1)) == PARM_DECL ! SSA_NAME_IS_DEFAULT_DEF (base1)) || (TREE_CODE (arg1) == ADDR_EXPR indirect_base1 TREE_CODE (base1) == VAR_DECL auto_var_in_fn_p (base1, current_function_decl) !indirect_base0 TREE_CODE (base0) == SSA_NAME !TREE_CODE (SSA_NAME_VAR (base0)) == PARM_DECL !SSA_NAME_IS_DEFAULT_DEF (base0))) { if (code == NE_EXPR) return constant_boolean_node (1, type); --- 8940,8955 auto_var_in_fn_p (base0, current_function_decl) !indirect_base1 TREE_CODE (base1) == SSA_NAME ! SSA_NAME_IS_DEFAULT_DEF (base1) ! TREE_CODE (SSA_NAME_VAR (base1)) == PARM_DECL) || (TREE_CODE (arg1) == ADDR_EXPR indirect_base1 TREE_CODE (base1) == VAR_DECL auto_var_in_fn_p (base1, current_function_decl) !indirect_base0 TREE_CODE (base0) == SSA_NAME !SSA_NAME_IS_DEFAULT_DEF (base0) ! 
TREE_CODE (SSA_NAME_VAR (base0)) == PARM_DECL)) { if (code == NE_EXPR) return constant_boolean_node (1, type); Index: trunk/gcc/tree-complex.c === *** trunk.orig/gcc/tree-complex.c 2012-08-08 16:49:38.0 +0200 --- trunk/gcc/tree-complex.c2012-08-09 11:19:15.799195507 +0200 *** get_component_ssa_name (tree ssa_name, b *** 495,502 is used in an abnormal phi, and whether it's uninitialized. */ SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ret) = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ssa_name); ! if (TREE_CODE (SSA_NAME_VAR (ssa_name)) == VAR_DECL ! SSA_NAME_IS_DEFAULT_DEF (ssa_name)) { SSA_NAME_DEF_STMT (ret) = SSA_NAME_DEF_STMT (ssa_name); set_ssa_default_def (cfun, SSA_NAME_VAR (ret), ret); --- 495,502 is used in an abnormal phi, and whether it's uninitialized. */ SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ret) = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ssa_name); ! if (SSA_NAME_IS_DEFAULT_DEF (ssa_name) ! TREE_CODE (SSA_NAME_VAR (ssa_name)) == VAR_DECL) { SSA_NAME_DEF_STMT (ret) = SSA_NAME_DEF_STMT (ssa_name); set_ssa_default_def (cfun, SSA_NAME_VAR (ret), ret); Index: trunk/gcc/tree-inline.c === *** trunk.orig/gcc/tree-inline.c2012-08-08 16:49:38.0 +0200 --- trunk/gcc/tree-inline.c 2012-08-09 11:19:15.800195507 +0200 *** remap_ssa_name (tree name, copy_body_dat *** 187,194 if (processing_debug_stmt) { ! if (TREE_CODE (SSA_NAME_VAR (name)) == PARM_DECL ! SSA_NAME_IS_DEFAULT_DEF (name) id-entry_bb == NULL single_succ_p (ENTRY_BLOCK_PTR)) { --- 187,194 if (processing_debug_stmt) { ! if (SSA_NAME_IS_DEFAULT_DEF (name) ! 
TREE_CODE (SSA_NAME_VAR (name)) == PARM_DECL id-entry_bb == NULL single_succ_p (ENTRY_BLOCK_PTR)) { Index: trunk/gcc/tree-ssa-loop-ivopts.c === *** trunk.orig/gcc/tree-ssa-loop-ivopts.c 2012-08-08 16:49:38.0 +0200 --- trunk/gcc/tree-ssa-loop-ivopts.c2012-08-09 11:19:15.801195507 +0200 *** parm_decl_cost (struct ivopts_data *data *** 4657,4664 STRIP_NOPS (sbound); if (TREE_CODE (sbound) == SSA_NAME TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL -gimple_nop_p (SSA_NAME_DEF_STMT (sbound)) data-body_includes_call)
Re: [Patch, Fortran] PR54199 improve warning is also the name of an intrinsic for internal procedures
On 09/08/2012 11:12, Tobias Burnus wrote:

This patch makes the warning for internal procedures whose name is the same as that of an intrinsic clearer. Initially, I thought that one shouldn't warn for internal procedures, but others disagree. In any case, the warning text is better than the original one. Built and regtested on x86-64-linux. OK for the trunk?

OK.
Fix PR 53701
Hello,

The problem in question is uncovered by the recent speculation patch; it is in the handling of expressions blocked by bookkeeping. Those are expressions that become unavailable due to the newly created bookkeeping copies. In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns, the situation can arise. We just filter out all such expressions from the final availability set for correctness.

The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes. The patch does exactly that; I also clarified the comments w.r.t. this situation.

Bootstrapped and tested on ia64 and x86-64; the PR testcase is minimized, too. OK for trunk? I also need to backport this to 4.7 together with PR 53975, say next week.

Yours, Andrey

gcc:
2012-08-09 Andrey Belevantsev a...@ispras.ru

PR rtl-optimization/53701
* sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment. Process not only expr's vinsns but all old vinsns from expr's history of changes.
(update_and_record_unavailable_insns): Clarify comment.

testsuite:
2012-08-09 Andrey Belevantsev a...@ispras.ru

PR rtl-optimization/53701
* gcc.dg/pr53701.c: New test.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 3099b92..f0c6eaf 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3564,29 +3564,41 @@ process_use_exprs (av_set_t *av_ptr)
   return NULL;
 }
 
-/* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
+/* Lookup EXPR in VINSN_VEC and return TRUE if found.  Also check patterns from
+   EXPR's history of changes.  */
 static bool
 vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
 {
-  vinsn_t vinsn;
+  vinsn_t vinsn, expr_vinsn;
   int n;
+  unsigned i;
 
-  FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
-    if (VINSN_SEPARABLE_P (vinsn))
-      {
-        if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
-          return true;
-      }
-    else
-      {
-        /* For non-separable instructions, the blocking insn can have
-           another pattern due to substitution, and we can't choose
-           different register as in the above case.  Check all registers
-           being written instead.  */
-        if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
-                                VINSN_REG_SETS (EXPR_VINSN (expr))))
-          return true;
-      }
+  /* Start with checking expr itself and then proceed with all the old forms
+     of expr taken from its history vector.  */
+  for (i = 0, expr_vinsn = EXPR_VINSN (expr);
+       expr_vinsn;
+       expr_vinsn = (i < VEC_length (expr_history_def,
+                                     EXPR_HISTORY_OF_CHANGES (expr))
+                     ? VEC_index (expr_history_def,
+                                  EXPR_HISTORY_OF_CHANGES (expr),
+                                  i++)->old_expr_vinsn
+                     : NULL))
+    FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
+      if (VINSN_SEPARABLE_P (vinsn))
+        {
+          if (vinsn_equal_p (vinsn, expr_vinsn))
+            return true;
+        }
+      else
+        {
+          /* For non-separable instructions, the blocking insn can have
+             another pattern due to substitution, and we can't choose
+             different register as in the above case.  Check all registers
+             being written instead.  */
+          if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
+                                  VINSN_REG_SETS (expr_vinsn)))
+            return true;
+        }
 
   return false;
 }
@@ -5694,8 +5706,8 @@ update_and_record_unavailable_insns (basic_block book_block)
               || EXPR_TARGET_AVAILABLE (new_expr)
                  != EXPR_TARGET_AVAILABLE (cur_expr))
             /* Unfortunately, the below code could be also fired up on
-               separable insns.
-               FIXME: add an example of how this could happen.  */
+               separable insns, e.g. when moving insns through the new
+               speculation check as in PR 53701.
*/ vinsn_vec_add (vec_bookkeeping_blocked_vinsns, cur_expr); } diff --git a/gcc/testsuite/gcc.dg/pr53701.c b/gcc/testsuite/gcc.dg/pr53701.c new file mode 100644 index 000..2c85223 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr53701.c @@ -0,0 +1,59 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options -O3 -fselective-scheduling2 -fsel-sched-pipelining } */ +typedef unsigned short int uint16_t; +typedef unsigned long int uintptr_t; +typedef struct GFX_VTABLE +{ + int color_depth; + unsigned char *line[]; +} +BITMAP; +extern int _drawing_mode; +extern BITMAP *_drawing_pattern; +extern int _drawing_y_anchor; +extern unsigned int _drawing_x_mask; +extern unsigned int _drawing_y_mask; +extern uintptr_t bmp_write_line (BITMAP *, int); + void +_linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color) +{ + int w; + if (_drawing_mode == 0) + { +int x, curw; +unsigned
Re: Value type of map need not be default copyable
On 9 August 2012 09:35, Paolo Carlini wrote:

When it does, and the corresponding PR is *ready*, we'll reconsider the issue. After all the *months and months and months* spent by the LWG adding and removing members from pair and tweaking everything wrt the containers, with issues *still* popping up (like that with the defaulted copy constructor vs insert constraining), and with the support for scoped allocators still missing from our implementation, we are not adding members to std::pair so easily. Sorry, but personally I'm not available now to further discuss this specific point.

I'm with Paolo on this. No additional (non-standard) constructors in std::pair please.

If it were possible to do so without changing the ABI, I'd include <tuple> in the unordered containers anyway when adding scoped allocator support, because std::tuple already knows how to avoid the EBO for 'final' allocators (PR 51365). I'd do the same in the other containers except that they need to work in C++03 mode without std::tuple.

I think we should consider std::tuple almost as fundamental as std::pair and shouldn't jump through hoops to avoid using it. It's already included by <memory>, for example, to implement std::unique_ptr, and I recently made changes to make it easier to use std::unique_ptr internally, so we shouldn't be afraid of std::tuple getting used more widely.
Re: [Patch, Fortran] PR40881 - Add two F95 obsolescence warnings
On 08/08/2012 19:12, Tobias Burnus wrote:

With this patch, I think the only unimplemented obsolescence warning is for (8) Fixed form source -- see B.2.7. For the latter, I would like to see a possibility to silence that warning, given that there is substantial code around which is in fixed form but otherwise completely valid and obsolescence-free.

We could silence it with explicit -ffixed-form.

The motivation for implementing this patch was that I did a small obsolescent cleanup of our fixed-form code (which uses some Fortran 2003 features) and I realized that ifort had the shared DO termination warning and gfortran didn't. Built and regtested on x86-64-gnu-linux. OK for the trunk?

More comments below. Regarding the general design, I'm not sure it makes sense to distinguish between ST_LABEL_DO_TARGET and ST_LABEL_ENDDO_TARGET. There are no ST_LABEL_GOTO_TARGET or ST_LABEL_WRITE_TARGET after all.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b6e2975..9670022 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -146,8 +146,8 @@ ar_type;
 /* Statement label types.  */
 typedef enum
-{ ST_LABEL_UNKNOWN = 1, ST_LABEL_TARGET,
-  ST_LABEL_BAD_TARGET, ST_LABEL_FORMAT
+{ ST_LABEL_UNKNOWN = 1, ST_LABEL_TARGET, ST_LABEL_DO_TARGET,
+  ST_LABEL_ENDDO_TARGET, ST_LABEL_BAD_TARGET, ST_LABEL_FORMAT
 }
 gfc_sl_type;

Please add a comment explaining the different types; something like: The labels referenced in DO statements and defined in END DO statements get the types ST_LABEL_DO_TARGET and ST_LABEL_ENDDO_TARGET respectively, instead of the generic ST_LABEL_TARGET, so that they can be distinguished to issue DO-specific diagnostics. The DO label is a label reference, so ST_LABEL_DO_TARGET is to be used in gfc_st_label::referenced only. The ST_LABEL_ENDDO_TARGET is the corresponding label definition, and is to be used in gfc_st_label::defined only.

@@ -3825,8 +3828,11 @@ parse_executable (gfc_statement st)
         case ST_NONE:
           unexpected_eof ();
 
-        case ST_FORMAT:
         case ST_DATA:
+          gfc_notify_std (GFC_STD_F95_OBS, "DATA statement at %C after the "
+                          "first executable statement");
+          /* Fall through.  */
+        case ST_FORMAT:
         case ST_ENTRY:
         case_executable:
           accept_statement (st);

This diagnostic is more appropriate in verify_st_order (which needs to be called then).

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 455e6c9..135c1e5 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -2213,12 +2214,19 @@ gfc_define_st_label (gfc_st_label *lp, gfc_sl_type type, locus *label_locus)
           break;
 
         case ST_LABEL_TARGET:
+        case ST_LABEL_ENDDO_TARGET:
           if (lp->referenced == ST_LABEL_FORMAT)
             gfc_error ("Label %d at %C already referenced as a format label",
                        labelno);
           else
             lp->defined = ST_LABEL_TARGET;

I think it should be `lp->defined = type;' here.

@@ -2254,14 +2262,16 @@ gfc_reference_st_label (gfc_st_label *lp, gfc_sl_type type)
       lp->where = gfc_current_locus;
     }
 
-  if (label_type == ST_LABEL_FORMAT && type == ST_LABEL_TARGET)
+  if (label_type == ST_LABEL_FORMAT
+      && (type == ST_LABEL_TARGET || type == ST_LABEL_DO_TARGET))
     {
       gfc_error ("Label %d at %C previously used as a FORMAT label", labelno);
       rc = FAILURE;
       goto done;
     }
 
-  if ((label_type == ST_LABEL_TARGET || label_type == ST_LABEL_BAD_TARGET)
+  if ((label_type == ST_LABEL_TARGET || label_type == ST_LABEL_DO_TARGET
+       || label_type == ST_LABEL_BAD_TARGET)
       && type == ST_LABEL_FORMAT)
     {
       gfc_error ("Label %d at %C previously used as branch target", labelno);

label_type is initialized using either lp->referenced or lp->defined. Thus both ST_LABEL_DO_TARGET and ST_LABEL_ENDDO_TARGET should be checked here, unless they are merged as suggested above.

Mikael
Re: [PATCH] Intrinsics for ADCX
Hi guys,

This patch generalizes the recently committed addcarryx intrinsic so that it can be generated via either the ADCX or the common ADC instruction. The ADX-* tests are ok and bootstrap passes. Is it ok for trunk?

Changelog entry:
2012-08-09 Michael Zolotukhin michael.v.zolotuk...@intel.com

* config/i386/adxintrin.h: Remove guarding __ADX__ check.
* config/i386/x86intrin.h: Likewise.
* config/i386/i386.c (ix86_init_mmx_sse_builtins): Remove OPTION_MASK_ISA_ADX from needed options for __builtin_ia32_addcarryx_u32 and __builtin_ia32_addcarryx_u64.
(ix86_expand_builtin): Use add<mode>3_carry in expanding of IX86_BUILTIN_ADDCARRYX32 and IX86_BUILTIN_ADDCARRYX64.

testsuite/Changelog entry:
2012-08-09 Michael Zolotukhin michael.v.zolotuk...@intel.com

* gcc.target/i386/adx-addxcarry32-3.c: New.
* gcc.target/i386/adx-addxcarry64-3.c: New.

Thanks, Michael

On 1 August 2012 20:37, Kirill Yukhin kirill.yuk...@gmail.com wrote:

Hi Richard,

Frankly I don't understand the point of these instructions being added to the ISA at all. I would have understood an add-with-carry that did *not* modify the flags at all, but two separate ones that modify C and O separately is just downright strange.

If there is only one carry in flight, they all are equivalent although ADOX is a little less useful in loops. If there are two carries in flight, that’s where the new instructions show their benefit, since they allow accumulation without destroying each other (see next comment). For any number of carries beyond two, you have to start saving and restoring carry bits and it degenerates to the first case for some of them.

But to the point: I don't understand the point of having this as a builtin. Is the code generated by this builtin any better than plain C?

I think this is just the usual practice of introducing new intrinsics for new insns. I doubt that we can generate such things automatically:

c1 = 0;
c2 = 0;
c1 = _adcx64( res[i], src[i], src2[i], c1);
c1 = _adcx64( res[i+1], src[i+1], src2[i+1], c1);
c2 = _adcx64( res[i], src[i], src2[i], c2);
c2 = _adcx64( res[i+1], src[i+1], src2[i+1], c2);

And if you're going to have the builtin, why is this restricted to adx anyway? You obviously can produce the same results with the good old fashioned adc instruction as well.

We have one intrinsic for both ADCX/ADOX. So, we just picked the first one to use when expanding the built-in.

Which begs the question of why you've got a separate pattern for the adx anyway. If the insn is so much better, it ought to be used in the same pattern we use for adc now.

I believe we may introduce a global variant of ADCX, which may be expanded into any of ADC/ADCX/ADOX on x86 and into analogs on the other ports.

K

-- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.

bdw-adx-5.gcc.patch Description: Binary data
Re: [PATCH] Strength reduction part 3 of 4: candidates with unknown strides
On Wed, 2012-08-08 at 19:22 -0700, Janis Johnson wrote: On 08/08/2012 06:41 PM, William J. Schmidt wrote: On Wed, 2012-08-08 at 15:35 -0700, Janis Johnson wrote: On 08/08/2012 03:27 PM, Andrew Pinski wrote: On Wed, Aug 8, 2012 at 3:25 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Aug 1, 2012 at 10:36 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: +/* { dg-do compile } */ +/* { dg-options -O3 -fdump-tree-dom2 -fwrapv } */ +/* { dg-skip-if { ilp32 } { -m32 } { } } */ + This doesn't work on x32 nor Linux/ia32 since -m32 may not be needed for ILP32. This patch works for me. OK to install? This also does not work for mips64 where the options are either -mabi=32 or -mabi=n32 for ILP32. HJL's patch looks correct. Thanks, Andrew There are GCC targets with 16-bit integers. What's the actual set of targets on which this test is meant to run? There's a list of effective-target names based on data type sizes in http://gcc.gnu.org/onlinedocs/gccint/Effective_002dTarget-Keywords.html#Effective_002dTarget-Keywords. Yes, sorry. The test really is only valid when int and long have different sizes. So according to that link we should skip ilp32 and llp64 at a minimum. It isn't clear what we should do for int16 since the size of long isn't specified, so I suppose we should skip that as well. So, perhaps modify HJ's patch to have /* { dg-do compile { target { ! { ilp32 llp64 int16 } } } } */ ? Thanks, Bill That's confusing. Perhaps what you really need is a new effective target for sizeof(int) != sizeof(long). Good idea. I'll work up a patch when I get a moment. Thanks, Bill Janis
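The condition Janis proposes at the end of this exchange, sizeof(int) != sizeof(long), can be expressed as a small C probe. The sketch below is only an illustration of the condition: the macro PROBE_INT_LONG_DIFFER and the function int_and_long_differ are hypothetical names, and nothing here is the effective-target implementation that was eventually written.

```c
/* Nonzero when int and long have different sizes on this target.  */
#define INT_LONG_SIZES_DIFFER (sizeof (int) != sizeof (long))

/* Compile-time form of the same check: the array size becomes -1, and
   compilation fails, when int and long have the same size.  It is guarded
   by a (hypothetical) macro so that this file still compiles everywhere.  */
#ifdef PROBE_INT_LONG_DIFFER
int dummy[sizeof (int) != sizeof (long) ? 1 : -1];
#endif

/* Run-time form, returning 1 or 0.  */
int
int_and_long_differ (void)
{
  return INT_LONG_SIZES_DIFFER ? 1 : 0;
}
```

On an LP64 target the function returns 1; on ilp32 or llp64 targets it returns 0, which are the cases the thread wants the test to skip.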
Re: [cxx-conversion] Support garbage-collected C++ templates
On Thu, Aug 9, 2012 at 5:03 AM, Richard Guenther richard.guent...@gmail.com wrote: But now with doing it in stages you end up with (this) first stage that complicates gengtype to support a very small subset of C++ types (namely the one special case you need for vec.h). Exactly what I did _not_ want! No. It supports all C++ types. All it needs is the user annotation. I understood that you had the complete killing of gengtype with fire ready (or almost ready). Please finish it instead. No. It was not even close. The full re-write will wait until the branch is merged in trunk. It will touch too many files and the branch is already hard to maintain as it is. Adding support for explicit user annotations is easy enough. Patch coming up. Diego.
[PATCH, alpha]: Prevent another case of linker issues with exception handler
Hello! This problem is similar to [1], but in this case the issue occurs when an exception handler immediately follows a sibcall function. This happens in libstdc++, ./include/ext/pb_ds/detail/binary_heap_/split_join_fn_imps.hpp: 130 This is the reason for the testcase failure in [2]: Running target unix FAIL: ext/pb_ds/regression/priority_queue_rand.cc execution test We have to prevent this situation in the same way as in [1]: pad the sibcall function call with a nop when a GP load immediately follows the call. The following patch fixes the failure. 2012-08-09 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.c (alpha_pad_noreturn): Rename to ... (alpha_pad_function_end): ... this. Also insert NOP between sibling call and GP load. (alpha_reorg): Update call to alpha_pad_function_end. Expand comment. Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu. OK for mainline and release branches? [1] http://gcc.gnu.org/ml/gcc-patches/2008-12/msg01097.html [2] http://gcc.gnu.org/ml/gcc-testresults/2012-08/msg00583.html Uros.
Re: [cxx-conversion] Support garbage-collected C++ templates
On Thu, Aug 9, 2012 at 2:44 PM, Diego Novillo dnovi...@google.com wrote: On Thu, Aug 9, 2012 at 5:03 AM, Richard Guenther richard.guent...@gmail.com wrote: But now with doing it in stages you end up with (this) first stage that complicates gengtype to support a very small subset of C++ types (namely the one special case you need for vec.h). Exactly what I did _not_ want! No. It supports all C++ types. All it needs is the user annotation. You said it only works for types in the template parameter list and there you only support types (and not integers). Which means I fail to see how it works for VEC(tree). As far as I understand you are not creating the gt_pch_nx overloads for all GTYed types (which includes 'tree'). If you do then I fail to see why it should be restricted at all? I understood that you had the complete killing of gengtype with fire ready (or almost ready). Please finish it instead. No. It was not even close. The full re-write will wait until the branch is merged in trunk. It will touch too many files and the branch is already hard to maintain as it is. Well. So what are exactly the limitations? If I can provide user-defined gc routines for all C++ types and gengtype will pick them up automagically when auto-generating gc routines for other types then fine. What I do not understand is why you need a GTY(()) annotation on C++ types with user-defined gc routines. gengtype should treat all types not marked with GTY(()) as having user-defined gc routines, no? Richard. Adding support for explicit user annotations is easy enough. Patch coming up. Diego.
Re: [PATCH, alpha]: Prevent another case of linker issues with exception handler
On Thu, Aug 9, 2012 at 3:04 PM, Uros Bizjak ubiz...@gmail.com wrote: 2012-08-09 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.c (alpha_pad_noreturn): Rename to ... (alpha_pad_function_end): ... this. Also insert NOP between sibling call and GP load. (alpha_reorg): Update call to alpha_pad_function_end. Expand comment. Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu. OK for mainline and release branches?

Now with the patch.

Uros.

Index: alpha.c
===================================================================
--- alpha.c (revision 190247)
+++ alpha.c (working copy)
@@ -9258,17 +9258,18 @@ alpha_align_insns (unsigned int max_align,
     }
 }

-/* Insert an unop between a noreturn function call and GP load.  */
+/* Insert an unop between sibcall or noreturn function call and GP load.  */

 static void
-alpha_pad_noreturn (void)
+alpha_pad_function_end (void)
 {
   rtx insn, next;

   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
     {
       if (! (CALL_P (insn)
-             && find_reg_note (insn, REG_NORETURN, NULL_RTX)))
+             && (SIBLING_CALL_P (insn)
+                 || find_reg_note (insn, REG_NORETURN, NULL_RTX))))
         continue;

       /* Make sure we do not split a call and its corresponding
@@ -9300,11 +9301,31 @@ static void
 static void
 alpha_reorg (void)
 {
-  /* Workaround for a linker error that triggers when an
-     exception handler immediatelly follows a noreturn function.
+  /* Workaround for a linker error that triggers when an exception
+     handler immediatelly follows a sibcall or a noreturn function.
+     In the sibcall case:
+     The instruction stream from an object file:
+      1d8:   00 00 fb 6b     jmp     (t12)
+      1dc:   00 00 ba 27     ldah    gp,0(ra)
+      1e0:   00 00 bd 23     lda     gp,0(gp)
+      1e4:   00 00 7d a7     ldq     t12,0(gp)
+      1e8:   00 40 5b 6b     jsr     ra,(t12),1ec <__funcZ+0x1ec>
+
+     was converted in the final link pass to:
+
+      12003aa88:   67 fa ff c3     br      120039428 <...>
+      12003aa8c:   00 00 fe 2f     unop
+      12003aa90:   00 00 fe 2f     unop
+      12003aa94:   48 83 7d a7     ldq     t12,-31928(gp)
+      12003aa98:   00 40 5b 6b     jsr     ra,(t12),12003aa9c <__func+0x1ec>
+
+     And in the noreturn case:
+
+     The instruction stream from an object file:
+
       54:   00 40 5b 6b     jsr     ra,(t12),58 <__func+0x58>
       58:   00 00 ba 27     ldah    gp,0(ra)
       5c:   00 00 bd 23     lda     gp,0(gp)
@@ -9321,11 +9342,11 @@ alpha_reorg (void)
      GP load instructions were wrongly cleared by the linker relaxation
      pass.  This workaround prevents removal of GP loads by inserting
-     an unop instruction between a noreturn function call and
+     an unop instruction between a sibcall or noreturn function call and
      exception handler prologue.  */

   if (current_function_has_exception_handlers ())
-    alpha_pad_noreturn ();
+    alpha_pad_function_end ();

   if (alpha_tp != ALPHA_TP_PROG || flag_exceptions)
     alpha_handle_trap_shadows ();
Re: Fix PR 53701
On Thu, 9 Aug 2012, Andrey Belevantsev wrote: Hello, The problem in question is uncovered by the recent speculation patch, it is in the handling of expressions blocked by bookkeeping. Those are expressions that become unavailable due to the newly created bookkeeping copies. In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns the situation can arise. We just filter out all such expressions from the final availability set for correctness. The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes. The patch does exactly that, I also clarified the comments w.r.t. this situation. Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too. OK for trunk? Also need to backport this to 4.7 with PR 53975, say on the next week. This is OK. Thanks. Alexander
Re: [SH] PR 50751
Oleg Endo oleg.e...@t-online.de wrote: This patch fixes a minor issue related to the displacement addressing patterns, which leads to useless movt exts.* sequences and one of the predicates wrongly accepting non-mem ops. Tested on rev 190151 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? OK. Regards, kaz
Re: [SH] PR 39423
Oleg Endo oleg.e...@t-online.de wrote: How about the attached patch? Is that way of dealing with the mems OK? What could be a possible test case for the alias info issue? Tested on rev 190151 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. This patch is OK. Regards, kaz
Re: [SH] PR 51244 - Improve store of floating-point comparison
Oleg Endo oleg.e...@t-online.de wrote: This patch mainly improves stores of negated/inverted floating point comparison results in regs and removes a useless zero-extension after storing the negated T bit in a reg. [snip] Tested on rev 190151 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? OK. Regards, kaz
Re: [PATCH, libjava] Use accessor functions to manipulate xmlOutputBuffer
Andrew Hughes ahug...@redhat.com writes: OK. As this is a GNU Classpath change, it should go in there first to avoid creating a divergence which will cause later problems in merging. Classpath is regularly merged into gcj as a whole. I found several patches during the last merge which had only been added to gcj (some without ChangeLog entries) and this slowed the process down considerably. Dodji, I can push this to Classpath on your behalf if you don't have commit access. Oops. I committed the patch before I saw your message. Sorry. If you agree, I can revert the commit so that you can commit it to classpath then. I don't think I have commit access to GNU classpath. Sorry for the inconvenience. -- Dodji
[PATCH] Fix PR54027
This fixes PR54027, VRP treating overflow in signed left-shifts undefined. Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2012-08-09 Richard Guenther rguent...@suse.de

  PR tree-optimization/54027
  * tree-vrp.c (extract_range_from_binary_expr_1): Merge RSHIFT_EXPR and LSHIFT_EXPR handling, force -fwrapv for the multiplication used to handle LSHIFT_EXPR with a constant.
  * gcc.dg/torture/pr54027.c: New testcase.

Index: gcc/tree-vrp.c
===================================================================
*** gcc/tree-vrp.c (revision 190252)
--- gcc/tree-vrp.c (working copy)
*************** extract_range_from_binary_expr_1 (value_
*** 2726,2782 ****
          extract_range_from_multiplicative_op_1 (vr, code, vr0, vr1);
          return;
        }
!       else if (code == RSHIFT_EXPR)
        {
          /* If we have a RSHIFT_EXPR with any shift values outside [0..prec-1],
             then drop to VR_VARYING.  Outside of this range we get undefined
             behavior from the shift operation.  We cannot even trust
             SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
             shifts, and the operation at the tree level may be widened.  */
!         if (vr1.type != VR_RANGE
!             || !value_range_nonnegative_p (vr1)
!             || TREE_CODE (vr1.max) != INTEGER_CST
!             || compare_tree_int (vr1.max, TYPE_PRECISION (expr_type) - 1) == 1)
            {
!             set_value_range_to_varying (vr);
!             return;
            }
-
-         extract_range_from_multiplicative_op_1 (vr, code, vr0, vr1);
-         return;
-       }
-       else if (code == LSHIFT_EXPR)
-       {
-         /* If we have a LSHIFT_EXPR with any shift values outside [0..prec-1],
-            then drop to VR_VARYING.  Outside of this range we get undefined
-            behavior from the shift operation.  We cannot even trust
-            SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
-            shifts, and the operation at the tree level may be widened.  */
-         if (vr1.type != VR_RANGE
-             || !value_range_nonnegative_p (vr1)
-             || TREE_CODE (vr1.max) != INTEGER_CST
-             || compare_tree_int (vr1.max, TYPE_PRECISION (expr_type) - 1) == 1)
-           {
-             set_value_range_to_varying (vr);
-             return;
-           }
-
-         /* We can map shifts by constants to MULT_EXPR handling.  */
-         if (range_int_cst_singleton_p (vr1))
-           {
-             value_range_t vr1p = VR_INITIALIZER;
-             vr1p.type = VR_RANGE;
-             vr1p.min
-               = double_int_to_tree (expr_type,
-                                     double_int_lshift (double_int_one,
-                                                        TREE_INT_CST_LOW (vr1.min),
-                                                        TYPE_PRECISION (expr_type),
-                                                        false));
-             vr1p.max = vr1p.min;
-             extract_range_from_multiplicative_op_1 (vr, MULT_EXPR, vr0, vr1p);
-             return;
-           }
-
          set_value_range_to_varying (vr);
          return;
        }
--- 2726,2773 ----
          extract_range_from_multiplicative_op_1 (vr, code, vr0, vr1);
          return;
        }
!       else if (code == RSHIFT_EXPR
!                || code == LSHIFT_EXPR)
        {
          /* If we have a RSHIFT_EXPR with any shift values outside [0..prec-1],
             then drop to VR_VARYING.  Outside of this range we get undefined
             behavior from the shift operation.  We cannot even trust
             SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
             shifts, and the operation at the tree level may be widened.  */
!         if (range_int_cst_p (vr1)
!             && compare_tree_int (vr1.min, 0) >= 0
!             && compare_tree_int (vr1.max, TYPE_PRECISION (expr_type)) == -1)
            {
!             if (code == RSHIFT_EXPR)
!               {
!                 extract_range_from_multiplicative_op_1 (vr, code, vr0, vr1);
!                 return;
!               }
!             /* We can map lshifts by constants to MULT_EXPR handling.  */
!             else if (code == LSHIFT_EXPR
!                      && range_int_cst_singleton_p (vr1))
!               {
!                 bool saved_flag_wrapv;
!                 value_range_t vr1p = VR_INITIALIZER;
!                 vr1p.type = VR_RANGE;
!                 vr1p.min
!                   = double_int_to_tree (expr_type,
!                                         double_int_lshift
!                                           (double_int_one,
!                                            TREE_INT_CST_LOW (vr1.min),
!                                            TYPE_PRECISION (expr_type),
!                                            false));
!                 vr1p.max = vr1p.min;
!                 /* We have to use a wrapping multiply though as signed overflow
!                    on lshifts is implementation defined in C89.  */
!                 saved_flag_wrapv = flag_wrapv;
!                 flag_wrapv = 1;
!                 extract_range_from_binary_expr_1 (vr, MULT_EXPR, expr_type,
!
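The patch above maps a left shift by a constant onto MULT_EXPR handling and forces wrapping semantics for that multiply. As a rough illustration of why a wrapping multiply is the right model, the sketch below checks the identity on 32-bit unsigned bit patterns, where C arithmetic is defined to wrap. The function names are invented for this example; this is not VRP code and it says nothing about value ranges themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift on a 32-bit pattern; bits shifted out are discarded.  */
static uint32_t
shift_left_bits (uint32_t x, unsigned n)
{
  assert (n < 32);              /* counts outside [0, prec-1] are undefined */
  return x << n;
}

/* The same bit pattern computed as a wrapping multiply by 2^n.  */
static uint32_t
mul_pow2_bits (uint32_t x, unsigned n)
{
  assert (n < 32);
  uint32_t factor = (uint32_t) 1 << n;
  return x * factor;            /* unsigned arithmetic wraps modulo 2^32 */
}
```

For every x and every n in [0, 31] the two functions return the same value, including cases where high bits are shifted out, which is the situation a non-wrapping signed multiply would treat as overflow.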
Re: [PATCH] Intrinsics for ADCX
On 08/09/2012 05:21 AM, Michael Zolotukhin wrote: Changelog entry: 2012-08-09 Michael Zolotukhin michael.v.zolotuk...@intel.com * config/i386/adxintrin.h: Remove guarding __ADX__ check. * config/i386/x86intrin.h: Likewise. * config/i386/i386.c (ix86_init_mmx_sse_builtins): Remove OPTION_MASK_ISA_ADX from needed options for __builtin_ia32_addcarryx_u32 and __builtin_ia32_addcarryx_u64. (ix86_expand_builtin): Use add<mode>3_carry in expanding of IX86_BUILTIN_ADDCARRYX32 and IX86_BUILTIN_ADDCARRYX64. testsuite/Changelog entry: 2012-08-09 Michael Zolotukhin michael.v.zolotuk...@intel.com * gcc.target/i386/adx-addxcarry32-3.c: New. * gcc.target/i386/adx-addxcarry64-3.c: New. Ok. r~
[PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function
Hi,

I've always found it silly that in order to change the current function one has to call push_cfun and pop_cfun, which conveniently set and restore the value of cfun, and in addition to that also set current_function_decl and usually also cache its old value to restore it back afterwards. I also think that, at least throughout the middle-end, we should strive to have current_function_decl consistent with cfun->decl. There are quite a few places where we are not consistent and I think such situations are prone to nasty surprises, as various functions rely on cfun and others on current_function_decl, and it's easy to be unaware that one of the two is incorrect at the moment.

This week I have therefore decided to try and make push_cfun, pop_cfun and push_struct_function also set the current_function_decl. Being afraid of opening a giant can of worms, I only opened a mid-sized hole and left various set_cfuns for later, as well as places where we set current_function_decl without bothering with cfun.

After a few debugging sessions I came up with the patch below. The changes are mostly mechanical; let me try and explain some of the difficult or not-quite-nice ones, most of which come from calls from front-ends, which generally do not care about cfun all that much.

- In order to ensure that pop_cfun will reliably restore the old current_function_decl, push_cfun asserts that cfun and current_function_decl match. pop_cfun then simply restores current_function_decl to new_cfun->decl, or NULL_TREE if new_cfun is NULL. To check that the two remain consistent, pop_cfun has a similar (albeit checking) assert.

- I had to allow push_cfun(NULL) because in gfc_get_extern_function_decl in fortran/trans-decl.c we momentarily emulate top-level context by doing:

    current_function_decl = NULL_TREE;
    push_cfun (cfun);
    do_something ()
    pop_cfun ();
    current_function_decl = save_fn_decl;

  and to keep current_function_decl consistent with cfun, cfun had to be made NULL too. So I converted the above to push_cfun (NULL), which also sets current_function_decl to NULL_TREE.

- I also had to allow push_cfun(NULL) because dwarf2out_abstract_function does just that, only it looks like:

    push_cfun (DECL_STRUCT_FUNCTION (decl));

  but DECL_STRUCT_FUNCTION is usually (always?) NULL for abstract origin functions. But this also means that the changed push_cfun sets current_function_decl to NULL, which means the abstract function is not dwarf2outed as it should be. Thus, in perhaps the most awful hunk in this patch, I re-set current_function_decl after calling push_cfun. If someone has a better idea how to deal with this, I'm certainly interested. For the same reason I do not assert that current_function_decl matches cfun->decl in pop_cfun if cfun is NULL.

- Each cfun change also triggers a pair of init_dummy_function_start and expand_dummy_function_end, which invoke push_struct_function and pop_cfun respectively. Because we may be in the middle of another push/pop_cfun, the current_function_decl may not match, and so the asserts are disabled in these cases; fortunately we can recognize them by looking at the value of in_dummy_function.

- ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls dump_function, which in turn calls dump_function_to_file, which calls push_cfun. But the Ada front end has its own idea of the current_function_decl and there is no cfun, which is an inconsistency that makes the push_cfun assert fail. I solved it by temporarily setting current_function_decl to NULL_TREE. It's just dumping, and I thought that dump_function should be considered middle-end and thus middle-end invariants should apply.

The patch passes bootstrap and testing on x86_64-linux (all languages + ada + obj-c++) and ia64-linux (c,c++,fortran,objc,obj-c++). There is some confusing jitter in the go testing results which I have not yet looked at (perhaps compare_tests just can't deal with it, there are tests reported both as newly failing and newly working etc...) but I thought that I'd send the patch now anyway to get some feedback in case I was doing something else wrong (I also do not know whether anyone but Ian can modify the go front-end). I have also LTO-built Mozilla Firefox with the patch.

Well, what do you think?

Martin

2012-08-08 Martin Jambor mjam...@suse.cz

  * function.c (push_cfun): Check old current_function_decl matches old cfun, set new current_function_decl to the decl of the new cfun.
  (push_struct_function): Likewise.
  (pop_cfun): Likewise.
  (allocate_struct_function): Move call to invoke_set_current_function_hook to the end of the function.
  * cfgexpand.c (estimated_stack_frame_size): Do not set and restore current_function_decl.
  * cgraph.c (cgraph_release_function_body): Likewise.
  * cgraphunit.c (cgraph_process_new_functions): Likewise.
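The invariant described in the message above (push_cfun and pop_cfun keeping current_function_decl equal to the decl of cfun, with a NULL push allowed) can be modelled in a few lines of C. This is only a toy sketch: struct decl, struct function, the fixed-depth stack, MAX_DEPTH and the _model function names are all made up for illustration and are not GCC's definitions, and the dummy-function and pop-side checking asserts from the patch are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC's tree decl and struct function.  */
struct decl { const char *name; };
struct function { struct decl *decl; };

static struct function *cfun;
static struct decl *current_function_decl;

#define MAX_DEPTH 16                      /* arbitrary limit for the sketch */
static struct function *cfun_stack[MAX_DEPTH];
static size_t cfun_depth;

/* The decl a function context stands for, or NULL at top level.  */
static struct decl *
decl_of (struct function *f)
{
  return f ? f->decl : NULL;
}

/* Save the current context and switch to NEW_CFUN (which may be NULL),
   keeping current_function_decl in sync with cfun.  */
static void
push_cfun_model (struct function *new_cfun)
{
  assert (current_function_decl == decl_of (cfun));  /* invariant on entry */
  assert (cfun_depth < MAX_DEPTH);
  cfun_stack[cfun_depth++] = cfun;
  cfun = new_cfun;
  current_function_decl = decl_of (new_cfun);
}

/* Restore the previous context; the decl is derived from the restored
   cfun instead of being cached separately by every caller.  */
static void
pop_cfun_model (void)
{
  assert (cfun_depth > 0);
  cfun = cfun_stack[--cfun_depth];
  current_function_decl = decl_of (cfun);
}
```

With this shape a caller that wants a temporary top-level context just pushes NULL and pops, without saving and restoring current_function_decl by hand.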
Re: [PATCH] Intrinsics for ADCX
Ok. Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-08/msg00231.html Thanks, K
Re: [PATCH, libjava] Use accessor functions to manipulate xmlOutputBuffer
- Original Message - Andrew Hughes ahug...@redhat.com writes: OK. As this is a GNU Classpath change, it should go in there first to avoid creating a divergence which will cause later problems in merging. Classpath is regularly merged into gcj as a whole. I found several patches during the last merge which had only been added to gcj (some without ChangeLog entries) and this slowed the process down considerably. Dodji, I can push this to Classpath on your behalf if you don't have commit access. Oops. I committed the patch before I saw your message. Sorry. If you agree, I can revert the commit so that you can commit it to classpath then. I don't think I have commit access to GNU classpath. Sorry for the inconvenience. Don't worry about reverting it. I'll add it to Classpath now, then they'll be in sync when we do the next merge. In future, please post changes to files under the libjava/classpath directory to classp...@gnu.org and feel free to ping me directly if you don't get a response in a reasonable timeframe. It just makes my life a bit easier when it comes to doing the merges :-) -- Dodji -- Andrew :) Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: 248BDC07 (https://keys.indymedia.org/) Fingerprint = EC5A 1F5E C0AD 1D15 8F1F 8F91 3B96 A578 248B DC07
Re: [PATCH, alpha]: Prevent another case of wrong relaxation with exception handler
On 08/08/2012 11:23 PM, Uros Bizjak wrote: 2012-08-09 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.c (alpha_pad_noreturn): Rename to ... (alpha_pad_function_end): ... this. Also insert NOP between sibling call and GP load. (alpha_reorg): Update call to alpha_pad_function_end. Expand comment. Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu. OK for mainline and release branches? Ok everywhere. r~
Re: PATCH: PR rtl-optimization/54157: [x32] -maddress-mode=long failures
On Wed, Aug 8, 2012 at 8:11 AM, Richard Sandiford rdsandif...@googlemail.com wrote: H.J. Lu hjl.to...@gmail.com writes: On Wed, Aug 8, 2012 at 6:43 AM, Uros Bizjak ubiz...@gmail.com wrote: Probably we need to backport this patch to 4.7, where x32 is -maddress-mode=long by default. It doesn't fail on 4.7 branch since checking mode on PLUS CONST is new on trunk. However, I think it is a correctness issue. Is this OK to backport to 4.7? Yeah, I agree we should backport it. Richard I am checking this into 4.7 branch. Tested on Linux/x32, Linux/ia32 and Linux/x86-64. Thanks. -- H.J. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index bc7c36c..44b0d32 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2012-08-09 H.J. Lu hongjiu...@intel.com + + Backport from mainline + 2012-08-08 Richard Sandiford rdsandif...@googlemail.com + H.J. Lu hongjiu...@intel.com + + PR rtl-optimization/54157 + * combine.c (gen_lowpart_for_combine): Don't return identity + for CONST or symbolic reference. + 2012-08-06 Uros Bizjak ubiz...@gmail.com Backport from mainline diff --git a/gcc/combine.c b/gcc/combine.c index 3d81da8a..67bd776 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -10802,13 +10802,6 @@ gen_lowpart_for_combine (enum machine_mode omode, rtx x) if (omode == imode) return x; - /* Return identity if this is a CONST or symbolic reference. */ - if (omode == Pmode - (GET_CODE (x) == CONST - || GET_CODE (x) == SYMBOL_REF - || GET_CODE (x) == LABEL_REF)) -return x; - /* We can only support MODE being wider than a word if X is a constant integer or has a mode the same size. */ if (GET_MODE_SIZE (omode) UNITS_PER_WORD diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9fd8113..ef35a62 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,11 @@ +2012-08-09 H.J. Lu hongjiu...@intel.com + + Backport from mainline + 2012-08-08 H.J. Lu hongjiu...@intel.com + + PR rtl-optimization/54157 + * gcc.target/i386/pr54157.c: New file. 
+ 2012-08-01 Uros Bizjak ubiz...@gmail.com Backport from mainline diff --git a/gcc/testsuite/gcc.target/i386/pr54157.c b/gcc/testsuite/gcc.target/i386/pr54157.c new file mode 100644 index 000..59fcd79 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr54157.c @@ -0,0 +1,21 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options -O2 -mx32 -ftree-vectorize } */ + +struct s2{ + int n[24 -1][24 -1][24 -1]; +}; + +struct test2{ + struct s2 e; +}; + +struct test2 tmp2[4]; + +void main1 () +{ + int i,j; + + for (i = 0; i 24 -4; i++) + for (j = 0; j 24 -4; j++) + tmp2[2].e.n[1][i][j] = 8; +}
Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function
On Thu, Aug 9, 2012 at 4:26 PM, Martin Jambor mjam...@suse.cz wrote: [snip]
The patch passes bootstrap and testing on x86_64-linux (all languages + ada + obj-c++) and ia64-linux (c,c++,fortran,objc,obj-c++). There is some confusing jitter in the go testing results which I have not yet looked at (perhaps compare_tests just can't deal with it, there are tests reported both as newly failing and newly working etc...) but I thought that I'd send the patch now anyway to get some feedback in case I was doing something else wrong (I also do not know whether anyone but Ian can modify the go front-end). I have also LTO-built Mozilla Firefox with the patch. Well, what do you think? Well. We should try to get rid of most push/pop_cfun calls, and the middle-end should never need to look at current_function_decl ... (in practice we have tree.c and fold-const.c, which have to because they are shared between FE and middle-end). For example the use in estimate_stack_frame_size. Or the uses in IPA passes. It would be nice to figure out which parts need access to cfun/current_function_decl in them (thus, arrange cfun/current_function_decl to be NULL there). Other than that, yes -
Ping Re: Add --no-sysroot-suffix driver option
Ping. This patch http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00182.html is pending review. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, libjava] Use accessor functions to manipulate xmlOutputBuffer
Andrew Hughes ahug...@redhat.com writes: Don't worry about reverting it. I'll add it to Classpath now, then they'll be in sync when we do the next merge. Thank you. In future, please post changes to files under the libjava/classpath directory to classp...@gnu.org and feel free to ping me directly if you don't get a response in a reasonable timeframe. It just makes my life a bit easier when it comes to doing the merges :-) OK, I will do. Sorry for the inconvenience. -- Dodji
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 9, 2012, at 1:22 AM, Richard Guenther wrote: Ah. For simple objects like double_int I prefer to have either all ops mutating or all ops non-mutating. wide_int, which replaces double_int for int types, is always a non-mutating, by-value interface. In C++, it will be const input parameters, to avoid the copies and retain the performance. We maintain a cache under it, and reuse out of it for the long-lived objects; for short-lived ones, we just allocate them on the stack as needed.
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
Hi, On Thu, 9 Aug 2012, Mike Stump wrote: Ah. For simple objects like double_int I prefer to have either all ops mutating or all ops non-mutating. wide_int, which replaces double_int for int types, is always a non-mutating, by-value interface. In C++, it will be const input parameters, to avoid the copies and retain the performance. We maintain a cache under it, and reuse out of it for the long-lived objects; for short-lived ones, we just allocate them on the stack as needed. Hmm. And maintaining a cache is faster than passing/returning/manipulating two registers? Ciao, Michael.
Re: MIPS Android patch
On Fri, Apr 20, 2012 at 6:15 PM, Maxim Kuvyrkov ma...@codesourcery.com wrote: On 20/04/2012, at 1:34 PM, Fu, Chao-Ying wrote: Hi Maxim, Richard, I built cross-toolchains for 3 different targets as follows. 1. mips-linux-gnu 2. mips-linux-gnu --enable-targets=all 3. mips64-linux-gnu These targets are affected by this MIPS Android patch. Then, I checked the output from gcc -dumpspecs before and after applying the patch. The specs have 6 places of differences for Android due to new defines in linux-common.h. I am also building GCC natively, and will test GCC natively later. Any feedback? Thanks! Regards, Chao-ying libgcc/ChangeLog 2012-04-19 Chao-ying Fu f...@mips.com * unwind-dw2-fde-dip.c: Define USE_PT_GNU_EH_FRAME for BIONIC. This piece is trivial, so, given that Richard approved the MIPS changes, you are clear to check in after amending the patch per Richard's comments. Please check in the patch to unwind-dw2-fde-dip.c separately, as it is a change on its own. Thank you, This breaks Android/x86 build: #if defined(USE_PT_GNU_EH_FRAME) #include <link.h> but Bionic/x86 doesn't have link.h -- H.J.
Re: MIPS Android patch
On Thu, Aug 9, 2012 at 8:45 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Apr 20, 2012 at 6:15 PM, Maxim Kuvyrkov ma...@codesourcery.com wrote: On 20/04/2012, at 1:34 PM, Fu, Chao-Ying wrote: Hi Maxim, Richard, I built cross-toolchains for 3 different targets as follows. 1. mips-linux-gnu 2. mips-linux-gnu --enable-targets=all 3. mips64-linux-gnu These targets are affected by this MIPS Android patch. Then, I checked the output from gcc -dumpspecs before and after applying the patch. The specs have 6 places of differences for Android due to new defines in linux-common.h. I am also building GCC natively, and will test GCC natively later. Any feedback? Thanks! Regards, Chao-ying libgcc/ChangeLog 2012-04-19 Chao-ying Fu f...@mips.com * unwind-dw2-fde-dip.c: Define USE_PT_GNU_EH_FRAME for BIONIC. This piece is trivial, so, given that Richard approved the MIPS changes, you are clear to check in after amending the patch per Richard's comments. Please check in the patch to unwind-dw2-fde-dip.c separately, as it is a change on its own. Thank you, This breaks Android/x86 build: #if defined(USE_PT_GNU_EH_FRAME) #include <link.h> but Bionic/x86 doesn't have link.h -- H.J. I opened: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54209 -- H.J.
Re: [PATCH, MIPS] fix MIPS16 hard-float function stub bugs
On 08/08/2012 03:07 AM, Richard Sandiford wrote: It looks like this patch might have been written before: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00756.html which added: /* If we're calling a locally-defined MIPS16 function, we know that it will return values in both the soft-float and hard-float registers. There is no need to use a stub to move the latter to the former. */ if (fp_code == 0 && mips16_local_function_p (fn)) return NULL_RTX; to cope with this. Yes, you are right; this patch does predate yours, and I'd missed that you'd already committed another fix for what looks like the same problem. If so, and out of nervousness :-), did the testcase fail with current trunk before the patch? The testcase bundled with the patch is OK on current trunk. But, I have to admit that the real testcase that motivated this patch was building Android. It's going to take us a while to figure out whether your patch alone is adequate to make that work, so I'll withdraw this patch pending the outcome of those experiments. -Sandra
PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
Hi,

Bionic C library doesn't provide link.h. This patch reverts revision 186788:

http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00740.html

OK to install?

Thanks.

H.J.
---
2012-08-09  H.J. Lu  hongjiu...@intel.com

    PR bootstrap/54209
    * unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Don't define for
    Bionic C library.

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 92f8ab5..f57dc8c 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -54,11 +54,6 @@
 #endif
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
-    && defined(__BIONIC__)
-# define USE_PT_GNU_EH_FRAME
-#endif
-
-#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
     && defined(__FreeBSD__) && __FreeBSD__ >= 7
 # define ElfW __ElfN
 # define USE_PT_GNU_EH_FRAME
Re: Commit: RL78: Include tree-pass.h
The issue is that using the plugin interface makes breakage only detectable when you are able to test a target, not by merely building it. You just described *most* of the bugs I have to deal with.
Re: s390: Avoid CAS boolean output inefficiency
This was caused (or perhaps abetted by) the representation of EQ as NE ^ 1. With the subsequent truncation and zero-extend, I think combine reached its insn limit of 3 before seeing everything it needed to see. This can be 4 now, if you tweak the initial heuristic. -- Eric Botcazou
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 9, 2012, at 8:19 AM, Michael Matz wrote: Hmm. And maintaining a cache is faster than passing/returning/manipulating two registers? For the most part, we merely mirror existing code; check out lookup_const_double and immed_double_const. If the existing code is wrong, I'd love to have someone fix it. :-) Also, bear in mind that on a port with OImode math, for example, on a 32-bit host, it would be 8 registers...
Re: [PATCH][7/6] Allow anonymous SSA names
On 08/09/2012 06:20 AM, Richard Guenther wrote: This converts most users of create_tmp_{var,reg} to use anonymous SSA names. To give you one more reason to look at 6/6 ;) Wow, there's some really nice cleanups in there. r~
[PATCH, i386]: Improve LIMIT_RELOAD_CLASSES
On Sat, Aug 4, 2012 at 2:26 PM, Uros Bizjak ubiz...@gmail.com wrote: Without this, on the new testcase we hit the assert in inline_secondary_memory_needed. The comment before the function states: The macro can't work reliably when one of the CLASSES is class containing registers from multiple units (SSE, MMX, integer). We avoid this by never combining those units in single alternative in the machine description. Ensure that this constraint holds to avoid unexpected surprises. So, this indicates that we shouldn't be using INT_SSE_REGS for a reload class at all, and I expect that at the moment we don't. With the patch, the new find_valid_class_1 discovers INT_SSE_REGS as the best class for the register to hold the SYMBOL_REF, leading to the failed assert. Actually, the existing LIMIT_RELOAD_CLASS is way too simple to handle all issues with mixed register sets. Looking at ix86_hard_regno_mode_ok, we have problems with DI and SI mode, which can go into XMM and GENERAL regs, and SF and DF mode, which can go into XMM, FLOAT and GENERAL regs, depending on the availability of units. Attached (RFC) patch handles this limitation by limiting multiple register set modes to the natural mode register set, i.e. DI and SI modes will always return GENERAL_REGS, DF and SF will return either SSE_REGS, or FLOAT_REGS or GENERAL_REGS. Please note that we don't want to widen narrow classes, e.g. CREG or ADREG, to full GENERAL_REGS. The patch also improves Q_REGS selection in the same way, and adds a couple of missing registers to various register sets, so the macro works as expected. I have committed the patch to mainline SVN. The testcase options were adjusted to really fail for all default cases of -mfpmath on x86. Also, the testcase that fails on 64-bit targets was added. 2012-08-09 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (LIMIT_RELOAD_CLASS): Return preferred single unit register class for classes that contain registers from multiple units. 
(REG_CLASS_CONTENTS): Add missing frame register to FLOAT_INT_REGS, INT_SSE_REGS and FLOAT_INT_SSE_REGS register classes. testsuite/ChangeLog: 2012-08-09 Uros Bizjak ubiz...@gmail.com * gcc.c-torture/compile/20120727-1.c (dg-options): Add -mfpmath=387 for x86 targets. * gcc.c-torture/compile/20120727-2.c: New test. Re-tested on x86_64-pc-linux-gnu {,-m32} and committed. Uros. Index: config/i386/i386.h === --- config/i386/i386.h (revision 190254) +++ config/i386/i386.h (working copy) @@ -1298,9 +1298,9 @@ { 0x1fe00100,0x1fe000 }, /* FP_TOP_SSE_REG */\ { 0x1fe00200,0x1fe000 }, /* FP_SECOND_SSE_REG */ \ { 0x1fe0ff00,0x1fe000 }, /* FLOAT_SSE_REGS */\ - { 0x1, 0x1fe0 }, /* FLOAT_INT_REGS */\ -{ 0x1fe100ff,0x1fffe0 }, /* INT_SSE_REGS */ \ -{ 0x1fe1,0x1fffe0 }, /* FLOAT_INT_SSE_REGS */\ + { 0x11, 0x1fe0 }, /* FLOAT_INT_REGS */\ +{ 0x1ff100ff,0x1fffe0 }, /* INT_SSE_REGS */ \ +{ 0x1ff1,0x1fffe0 }, /* FLOAT_INT_SSE_REGS */\ { 0x,0x1f } \ } @@ -1378,15 +1378,28 @@ /* Place additional restrictions on the register class to use when it is necessary to be able to hold a value of mode MODE in a reload - register for which class CLASS would ordinarily be used. */ + register for which class CLASS would ordinarily be used. -#define LIMIT_RELOAD_CLASS(MODE, CLASS)\ - ((MODE) == QImode !TARGET_64BIT \ -((CLASS) == ALL_REGS || (CLASS) == GENERAL_REGS \ - || (CLASS) == LEGACY_REGS || (CLASS) == INDEX_REGS) \ - ? Q_REGS\ - : (CLASS) == INT_SSE_REGS ? GENERAL_REGS : (CLASS)) + We avoid classes containing registers from multiple units due to + the limitation in ix86_secondary_memory_needed. We limit these + classes to their natural mode single unit register class, depending + on the unit availability. + Please note that reg_class_subset_p is not commutative, so these + conditions mean ... if (CLASS) includes ALL registers from the + register set. */ + +#define LIMIT_RELOAD_CLASS(MODE, CLASS) \ + (((MODE) == QImode !TARGET_64BIT \ + reg_class_subset_p (Q_REGS, (CLASS))) ? 
Q_REGS \ + : (((MODE) == SImode || (MODE) == DImode) \ + reg_class_subset_p (GENERAL_REGS, (CLASS))) ? GENERAL_REGS\ + : (SSE_FLOAT_MODE_P (MODE) TARGET_SSE_MATH \ + reg_class_subset_p (SSE_REGS, (CLASS))) ? SSE_REGS
Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Thu, Aug 9, 2012 at 11:11 AM, Fu, Chao-Ying f...@mips.com wrote:

Hi,

Bionic C library doesn't provide link.h. This patch reverts revision 186788:

http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00740.html

OK to install?

Thanks.

H.J.
---
2012-08-09  H.J. Lu  hongjiu...@intel.com

    PR bootstrap/54209
    * unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Don't define for
    Bionic C library.

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 92f8ab5..f57dc8c 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -54,11 +54,6 @@
 #endif
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
-    && defined(__BIONIC__)
-# define USE_PT_GNU_EH_FRAME
-#endif
-
-#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
     && defined(__FreeBSD__) && __FreeBSD__ >= 7
 # define ElfW __ElfN
 # define USE_PT_GNU_EH_FRAME

How about this patch? Just enable it for MIPS that provides link.h in Android NDK. Thanks a lot!

Regards,
Chao-ying

Index: unwind-dw2-fde-dip.c
===================================================================
--- unwind-dw2-fde-dip.c    (revision 190260)
+++ unwind-dw2-fde-dip.c    (working copy)
@@ -55,6 +55,7 @@
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
     && defined(__BIONIC__)
+    && defined(__mips__)
 # define USE_PT_GNU_EH_FRAME
 #endif

Sorry, I forgot \ in the previous patch. Ex:

Index: unwind-dw2-fde-dip.c
===================================================================
--- unwind-dw2-fde-dip.c    (revision 190260)
+++ unwind-dw2-fde-dip.c    (working copy)
@@ -54,7 +54,8 @@
 #endif
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
-    && defined(__BIONIC__)
+    && defined(__BIONIC__) \
+    && defined(__mips__)
 # define USE_PT_GNU_EH_FRAME
 #endif

Where does mips link.h come from? I didn't see it in AOSP Bionic C library.

-- H.J.
RE: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
Where does mips link.h come from? I didn't see it in AOSP Bionic C library. -- H.J. It's from development/ndk/platforms/android-9/arch-mips/include/link.h from AOSP checkout. Regards, Chao-ying
Re: [google/gcc-4_7] Fix problems with -fdebug-types-section and local types
On 12-08-08 19:17 , Cary Coutant wrote: 2012-08-07 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin attribute. (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute from original DIE. (clone_tree_hash): Rename to ... (clone_tree_partial): ... this; change callers. Copy DW_TAG_subprogram DIEs as declarations. gcc/testsuite/ * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case. * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add -fdebug-types-section flag. OK for google/gcc-4_7. If the trunk review requires substantive changes, then you can just cherry-pick the subsequent patch later. Diego.
RE: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Thu, 9 Aug 2012, Fu, Chao-Ying wrote: How about this patch? Just enable it for MIPS that provides link.h in Android NDK. Thanks a lot! Please don't put this sort of architecture conditional in an architecture-independent source file. In this case it should be fine for libgcc's configure to try compiling a file that #includes link.h (obviously, make sure the configure test gets the right results both when it's present and when it's absent), and use the results of that configure test instead of defined(__mips__). (In a bootstrap where libc headers aren't yet present, inhibit_libc should be defined anyway to disable those libgcc features depending on system headers from libc.) -- Joseph S. Myers jos...@codesourcery.com
[google/gcc-4_7] XFAIL libitm failures
As discussed, this patch XFAILs the libitm failures uncovered by http://gcc.gnu.org/viewcvs?view=revision&revision=190233.

OK for google/gcc-4_7?

Ollie

2012-08-09  Ollie Wild  a...@google.com

    * testsuite-management/x86_64-grtev3-linux-gnu.xfail: XFAIL libitm failuires.

commit 8d78568138de78f11935f92b3143149733ea0172
Author: Ollie Wild a...@google.com
Date:   Thu Aug 9 14:38:51 2012 -0500

    2012-08-09  Ollie Wild  a...@google.com

        * testsuite-management/x86_64-grtev3-linux-gnu.xfail: XFAIL libitm failuires.

diff --git a/contrib/ChangeLog.google-4_7 b/contrib/ChangeLog.google-4_7
index c1664f9..fbfc0f5 100644
--- a/contrib/ChangeLog.google-4_7
+++ b/contrib/ChangeLog.google-4_7
@@ -1,3 +1,8 @@
+2012-08-09  Ollie Wild  a...@google.com
+
+    * testsuite-management/x86_64-grtev3-linux-gnu.xfail: XFAIL libitm
+    failuires.
+
 2012-08-08  Ollie Wild  a...@google.com
 
     * testsuite-management/powerpc-grtev3-linux-gnu.xfail: xfail
diff --git a/contrib/testsuite-management/x86_64-grtev3-linux-gnu.xfail b/contrib/testsuite-management/x86_64-grtev3-linux-gnu.xfail
index 4fa47ec..d68b543 100644
--- a/contrib/testsuite-management/x86_64-grtev3-linux-gnu.xfail
+++ b/contrib/testsuite-management/x86_64-grtev3-linux-gnu.xfail
@@ -68,3 +68,34 @@ flaky | FAIL: libgomp.graphite/force-parallel-6.c execution test
 # that is resolved.
 UNRESOLVED: 23_containers/map/element_access/2.cc compilation failed to produce executable
 FAIL: 23_containers/map/element_access/2.cc (test for excess errors)
+
+# libitm failures caused by missing --sysroot.
+UNRESOLVED: libitm.c++/dropref.C compilation failed to produce executable
+FAIL: libitm.c++/dropref.C (test for excess errors)
+UNRESOLVED: libitm.c++/eh-1.C compilation failed to produce executable
+FAIL: libitm.c++/eh-1.C (test for excess errors)
+FAIL: libitm.c++/throwdown.C (test for excess errors)
+FAIL: libitm.c/cancel.c (test for excess errors)
+UNRESOLVED: libitm.c/cancel.c compilation failed to produce executable
+FAIL: libitm.c/clone-1.c (test for excess errors)
+UNRESOLVED: libitm.c/clone-1.c compilation failed to produce executable
+FAIL: libitm.c/dropref-2.c (test for excess errors)
+UNRESOLVED: libitm.c/dropref-2.c compilation failed to produce executable
+UNRESOLVED: libitm.c/dropref.c compilation failed to produce executable
+FAIL: libitm.c/dropref.c (test for excess errors)
+FAIL: libitm.c/memcpy-1.c (test for excess errors)
+UNRESOLVED: libitm.c/memcpy-1.c compilation failed to produce executable
+FAIL: libitm.c/memset-1.c (test for excess errors)
+UNRESOLVED: libitm.c/memset-1.c compilation failed to produce executable
+UNRESOLVED: libitm.c/notx.c compilation failed to produce executable
+FAIL: libitm.c/notx.c (test for excess errors)
+UNRESOLVED: libitm.c/reentrant.c compilation failed to produce executable
+FAIL: libitm.c/reentrant.c (test for excess errors)
+FAIL: libitm.c/simple-1.c (test for excess errors)
+UNRESOLVED: libitm.c/simple-1.c compilation failed to produce executable
+UNRESOLVED: libitm.c/simple-2.c compilation failed to produce executable
+FAIL: libitm.c/simple-2.c (test for excess errors)
+FAIL: libitm.c/stackundo.c (test for excess errors)
+UNRESOLVED: libitm.c/stackundo.c compilation failed to produce executable
+UNRESOLVED: libitm.c/txrelease.c compilation failed to produce executable
+FAIL: libitm.c/txrelease.c (test for excess errors)
Re: [google/gcc-4_7] XFAIL libitm failures
On 12-08-09 15:42 , Ollie Wild wrote: * testsuite-management/x86_64-grtev3-linux-gnu.xfail: XFAIL libitm failuires. OK. Diego.
Re: [PATCH,i386] fma,fma4 and xop flags
On Wed, Aug 8, 2012 at 1:31 PM, ganesh.gopalasubraman...@amd.com wrote:

Bdver2 cpu supports both fma and fma4 instructions. Previous to patch, option -mno-xop removes -mfma4. Similarly, option -mno-fma4 removes -mxop.

It looks to me that there is some misunderstanding. AFAICS: -mxop implies -mfma4, but reverse is not true. Please see

#define OPTION_MASK_ISA_FMA4_SET \
  (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_SSE4A_SET \
   | OPTION_MASK_ISA_AVX_SET)
#define OPTION_MASK_ISA_XOP_SET \
  (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)

So, -mxop sets -mfma4, etc ..., but -mfma4 does NOT enable -mxop. OTOH,

#define OPTION_MASK_ISA_FMA4_UNSET \
  (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
#define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP

-mno-fma4 implies -mno-xop, but again reverse is not true. Thus, -mno-xop does NOT imply -mno-fma4.

So, the patch conditionally disables -mfma or -mfma4. Enabling -mxop is done by also checking -mfma.

Please note that conditional handling of ISA flags belongs to ix86_option_override_internal. However, if someone set -mfma4 together with -mfma on the command line, we should NOT disable selected ISA behind user's back, in the same way as we don't disable anything with -march=i386 -msse4. With -march=bdver2, we already marked that only fma is supported, and if user selected -march=bdver2 -mfma4 on the command line, we shouldn't disable anything.

Uros.
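The one-way implications described in this message can be modelled in a few lines. What follows is only a sketch under stated assumptions: the bit values and the `enable`, `disable` and `has` helpers are invented for illustration, and `SSE4A` and `AVX` are flattened to single bits, whereas GCC's real `OPTION_MASK_ISA_*_SET` masks expand to further prerequisites.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only; GCC's real
// OPTION_MASK_ISA_* values come from its option machinery.
constexpr std::uint64_t SSE4A = 1u << 0;
constexpr std::uint64_t AVX   = 1u << 1;
constexpr std::uint64_t FMA4  = 1u << 2;
constexpr std::uint64_t XOP   = 1u << 3;

// Same shape as the quoted macros: a *_SET mask pulls in prerequisites,
// a *_UNSET mask drags along everything that depends on the feature.
constexpr std::uint64_t FMA4_SET   = FMA4 | SSE4A | AVX;
constexpr std::uint64_t XOP_SET    = XOP | FMA4_SET;
constexpr std::uint64_t XOP_UNSET  = XOP;
constexpr std::uint64_t FMA4_UNSET = FMA4 | XOP_UNSET;

// Hypothetical helpers standing in for "-mfoo" and "-mno-foo".
constexpr std::uint64_t enable(std::uint64_t flags, std::uint64_t set_mask) {
  return flags | set_mask;
}
constexpr std::uint64_t disable(std::uint64_t flags, std::uint64_t unset_mask) {
  return flags & ~unset_mask;
}
constexpr bool has(std::uint64_t flags, std::uint64_t bit) {
  return (flags & bit) != 0;
}

// Exercises the four statements made in the message.
inline void check_implications() {
  assert(has(enable(0, XOP_SET), FMA4));    // -mxop implies -mfma4
  assert(!has(enable(0, FMA4_SET), XOP));   // -mfma4 does not enable -mxop
  const std::uint64_t both = enable(0, XOP_SET);
  assert(has(disable(both, XOP_UNSET), FMA4));   // -mno-xop keeps -mfma4
  assert(!has(disable(both, FMA4_UNSET), XOP));  // -mno-fma4 removes -mxop
}
```

The asymmetry falls out of which mask mentions which: the SET direction follows "requires", the UNSET direction follows "is required by".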
Re: [PATCH, libjava] Use accessor functions to manipulate xmlOutputBuffer
- Original Message - Andrew Hughes ahug...@redhat.com writes: Don't worry about reverting it. I'll add it to Classpath now, then they'll be in sync when we do the next merge. Thank you. Done: http://git.savannah.gnu.org/cgit/classpath.git/commit/?id=4d4db712cf4df4feb4d7b98bb1b5b448218500b3 In future, please post changes to files under the libjava/classpath directory to classp...@gnu.org and feel free to ping me directly if you don't get a response in a reasonable timeframe. It just makes my life a bit easier when it comes to doing the merges :-) OK, I will do. Sorry for the inconvenience. No worries. It's not immediately obvious for someone new to the libjava codebase. -- Dodji -- Andrew :) Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: 248BDC07 (https://keys.indymedia.org/) Fingerprint = EC5A 1F5E C0AD 1D15 8F1F 8F91 3B96 A578 248B DC07
Re: [PATCH][7/6] Allow anonymous SSA names
On 12-08-09 09:20 , Richard Guenther wrote:

    if (interm_type)
      {
        /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
!       tmp = create_tmp_reg (interm_type, NULL);
!       new_oprnd = make_ssa_name (tmp, NULL);
        new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE);
        oprnd = new_oprnd;
--- 1119,1125 ----
    if (interm_type)
      {
        /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
!       new_oprnd = make_ssa_name (interm_type, NULL);

Nice! Any chance that you could go over tree-ssa.texi to refresh the internal docs? (I don't recall what we have documented in there, tbh).

Diego.
Re: Value type of map need not be default copyable
On 08/09/2012 10:35 AM, Paolo Carlini wrote: Hi, On 08/09/2012 09:14 AM, Marc Glisse wrote: On Wed, 8 Aug 2012, François Dumont wrote: On 08/08/2012 03:39 PM, Paolo Carlini wrote: On 08/08/2012 03:15 PM, François Dumont wrote: I have also introduced a special std::pair constructor for container usage so that we do not have to include the whole tuple stuff just for associative container implementations. To be clear: sorry, this is not an option. Paolo. Then I can only imagine the attached patch, which requires including tuple when including unordered_map or unordered_set. The std::pair(piecewise_construct_t, tuple, tuple) is the only constructor that allows building a pair using the default constructor for the second member. I agree that the extra constructor would be convenient (I probably would have gone with pair(T,__default_construct_t), the symmetric version, and enough extra constructors to resolve all ambiguities). Maybe LWG would consider doing something. When it does, and the corresponding PR is *ready*, we'll reconsider the issue. After all the *months and months and months* spent by the LWG adding and removing members from pair and tweaking everything wrt the containers, with issues *still* popping up (like that with the defaulted copy constructor vs insert constraining), and with the support for scoped allocators still missing from our implementation, we are not adding members to std::pair so easily. Sorry, but personally I'm not available now to further discuss this specific point. I was still hoping that for something as simple as mapped_type() we wouldn't need the full tuple machinery, and I encourage everybody to have another look (while making sure anything we figure out adapts smoothly and consistently to std::map), then in a few days we'll take a final decision. We'll still have chances to further improve the code in time for 4.8.0. 
+ __p = __h->_M_allocate_node(std::piecewise_construct,
+                             std::make_tuple(__k),
+                             std::make_tuple());

Don't you want cref(__k)? It might save a move at some point.

Are we already doing that elsewhere? I think we should aim for something simple first, then carefully evaluate if the additional complexity is worth the cost and in case deploy the superior solution consistently everywhere it may apply.

Thanks!
Paolo.

Here is an updated version considering the good catch from Marc. However I prefer to use an explicit instantiation of tuple rather than using cref that would have imply inclusion of functional in addition to tuple. I have also updated the test case to use a type without copy and move constructors.

2012-08-09  François Dumont  fdum...@gcc.gnu.org
            Ollie Wild  a...@google.com

    * include/bits/hashtable.h (_Hashtable::_M_insert_bucket): Replace by ...
    (_Hashtable::_M_insert_node): ... this, new.
    (_Hashtable::_M_insert(_Args&&, true_type)): Use latter.
    * include/bits/hashtable_policy.h (_Map_base::operator[]): Use latter, emplace the value_type rather than insert.
    * include/std/unordered_map: Include tuple.
    * include/std/unordered_set: Likewise.
    * testsuite/util/testsuite_counter_type.h: New.
    * testsuite/23_containers/unordered_map/operators/2.cc: New.

François

Index: include/std/unordered_map
===================================================================
--- include/std/unordered_map    (revision 190209)
+++ include/std/unordered_map    (working copy)
@@ -38,6 +38,7 @@
 #include <utility>
 #include <type_traits>
 #include <initializer_list>
+#include <tuple>
 #include <bits/stl_algobase.h>
 #include <bits/allocator.h>
 #include <bits/stl_function.h> // equal_to, _Identity, _Select1st
Index: include/std/unordered_set
===================================================================
--- include/std/unordered_set    (revision 190209)
+++ include/std/unordered_set    (working copy)
@@ -38,6 +38,7 @@
 #include <utility>
 #include <type_traits>
 #include <initializer_list>
+#include <tuple>
 #include <bits/stl_algobase.h>
 #include <bits/allocator.h>
 #include <bits/stl_function.h> // equal_to, _Identity, _Select1st
Index: include/bits/hashtable_policy.h
===================================================================
--- include/bits/hashtable_policy.h    (revision 190209)
+++ include/bits/hashtable_policy.h    (working copy)
@@ -577,8 +577,14 @@
       __node_type* __p = __h->_M_find_node(__n, __k, __code);
 
       if (!__p)
-        return __h->_M_insert_bucket(std::make_pair(__k, mapped_type()),
-                                     __n, __code)->second;
+        {
+          __p = __h->_M_allocate_node(std::piecewise_construct,
+                                      std::tuple<const key_type&>(__k),
+                                      std::make_tuple());
+          __h->_M_store_code(__p, __code);
+          return __h->_M_insert_node(__n, __code, __p)->second;
+        }
+
       return (__p->_M_v).second;
     }
 
@@ -598,9 +604,14 @@
       __node_type* __p = __h->_M_find_node(__n, __k, __code);
 
       if (!__p)
-        return
[cxx-conversion] Avoid overloaded double_int 'constructor'. (issue6441127)
Convert overloaded double_int::make to non-overloaded from_signed and from_unsigned. This change is intended to preserve the exact semantics of the existing expressions using shwi_to_double_int and uhwi_to_double_int.

Tested on x86_64.

Index: gcc/ChangeLog

2012-08-09  Lawrence Crowl  cr...@google.com

    * double-int.h (double_int::make): Remove.
    (double_int::from_signed): New.
    (double_int::from_unsigned): New.
    (shwi_to_double_int): Use double_int::from_signed instead of double_int::make.
    (double_int_minus_one): Likewise.
    (double_int_zero): Likewise.
    (double_int_one): Likewise.
    (double_int_two): Likewise.
    (double_int_ten): Likewise.
    (uhwi_to_double_int): Use double_int::from_unsigned instead of double_int::make.

Index: gcc/double-int.h
===================================================================
--- gcc/double-int.h    (revision 190239)
+++ gcc/double-int.h    (working copy)
@@ -60,10 +60,8 @@ public:
      Second, the GCC conding conventions prefer explicit conversion,
      and explicit conversion operators are not available until C++11.  */
 
-  static double_int make (unsigned HOST_WIDE_INT cst);
-  static double_int make (HOST_WIDE_INT cst);
-  static double_int make (unsigned int cst);
-  static double_int make (int cst);
+  static double_int from_unsigned (unsigned HOST_WIDE_INT cst);
+  static double_int from_signed (HOST_WIDE_INT cst);
 
   /* No copy assignment operator or destructor to keep the type a POD.  */
 
@@ -188,7 +186,7 @@ public:
    HOST_WIDE_INT are filled with the sign bit.  */
 
 inline
-double_int double_int::make (HOST_WIDE_INT cst)
+double_int double_int::from_signed (HOST_WIDE_INT cst)
 {
   double_int r;
   r.low = (unsigned HOST_WIDE_INT) cst;
@@ -196,17 +194,11 @@ double_int double_int::make (HOST_WIDE_I
   return r;
 }
 
-inline
-double_int double_int::make (int cst)
-{
-  return double_int::make (static_cast <HOST_WIDE_INT> (cst));
-}
-
 /* FIXME(crowl): Remove after converting callers.  */
 static inline double_int
 shwi_to_double_int (HOST_WIDE_INT cst)
 {
-  return double_int::make (cst);
+  return double_int::from_signed (cst);
 }
 
 /* Some useful constants.  */
@@ -214,17 +206,17 @@ shwi_to_double_int (HOST_WIDE_INT cst)
    The problem is that a named constant would not be as optimizable,
    while the functional syntax is more verbose.  */
 
-#define double_int_minus_one (double_int::make (-1))
-#define double_int_zero (double_int::make (0))
-#define double_int_one (double_int::make (1))
-#define double_int_two (double_int::make (2))
-#define double_int_ten (double_int::make (10))
+#define double_int_minus_one (double_int::from_signed (-1))
+#define double_int_zero (double_int::from_signed (0))
+#define double_int_one (double_int::from_signed (1))
+#define double_int_two (double_int::from_signed (2))
+#define double_int_ten (double_int::from_signed (10))
 
 /* Constructs double_int from unsigned integer CST.  The bits over the
    precision of HOST_WIDE_INT are filled with zeros.  */
 
 inline
-double_int double_int::make (unsigned HOST_WIDE_INT cst)
+double_int double_int::from_unsigned (unsigned HOST_WIDE_INT cst)
 {
   double_int r;
   r.low = cst;
@@ -232,17 +224,11 @@ double_int double_int::make (unsigned HO
   return r;
 }
 
-inline
-double_int double_int::make (unsigned int cst)
-{
-  return double_int::make (static_cast <unsigned HOST_WIDE_INT> (cst));
-}
-
 /* FIXME(crowl): Remove after converting callers.  */
 static inline double_int
 uhwi_to_double_int (unsigned HOST_WIDE_INT cst)
 {
-  return double_int::make (cst);
+  return double_int::from_unsigned (cst);
 }
 
 inline double_int

--
This patch is available for review at http://codereview.appspot.com/6441127
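The difference between the two named constructors in this patch comes down to how the upper half is filled: with the sign bit for from_signed, with zeros for from_unsigned. A minimal self-contained sketch of that behaviour follows. It is not the GCC class: `toy_double_int` is a made-up name, and `std::int64_t`/`std::uint64_t` stand in for `HOST_WIDE_INT` and `unsigned HOST_WIDE_INT`, which is an assumption about the host.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for double_int, for illustration only. std::int64_t
// plays the role of HOST_WIDE_INT (an assumption of this sketch).
struct toy_double_int {
  std::uint64_t low;
  std::int64_t high;

  // Upper half is filled with the sign bit of CST.
  static toy_double_int from_signed(std::int64_t cst) {
    toy_double_int r;
    r.low = static_cast<std::uint64_t>(cst);
    r.high = cst < 0 ? -1 : 0;
    return r;
  }

  // Upper half is filled with zeros.
  static toy_double_int from_unsigned(std::uint64_t cst) {
    toy_double_int r;
    r.low = cst;
    r.high = 0;
    return r;
  }
};

// The same low-half bit pattern yields different wide values
// depending on which named constructor is chosen.
inline void check_extension() {
  const toy_double_int s = toy_double_int::from_signed(-1);
  const toy_double_int u = toy_double_int::from_unsigned(UINT64_MAX);
  assert(s.low == u.low);
  assert(s.high == -1);
  assert(u.high == 0);
}
```

With an overloaded `make`, the choice between these two behaviours would be made silently by the static type of the argument; spelling it in the function name makes the intent visible at the call site, which appears to be the motivation for the rename.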
Re: Value type of map need not be default copyable
On Thu, 9 Aug 2012, François Dumont wrote: Here is an updated version considering the good catch from Marc. However I prefer to use an explicit instantiation of tuple rather than using cref that would have imply inclusion of functional in addition to tuple. I wouldn't have used make_tuple at all (tuple() is shorter than make_tuple()), but I wanted to stick to your style as much as possible ;-) I don't know if std:: is needed, but it looks strange to have it only on some functions: std::forward_as_tuple(forward<key_type>(__k)), Looking at this line again, you seem to be using std::forward on something that is not a deduced parameter type. I guess it is equivalent to std::move in this case, it just confuses me a bit. * include/std/unordered_map: Include tuple. * include/std/unordered_set: Likewise. Is it a libstdc++ policy to put all includes in the topmost headers, as opposed to the header where they are used? I never paid much attention to it, I was just surprised because it doesn't match what I do in my code. But since hashtable*.h currently include nothing, it is consistent. Does that help with compile-time? (ok, it is a bit obvious that I pretended to make a review just so I had an excuse to ask a question at the end ;-) -- Marc Glisse
[patch] Use SBITMAP_SIZE in a few places
Hello, SBITMAP_SIZE should be used to get the current size of an sbitmap. Bootstrapped&tested on powerpc64-unknown-linux-gnu. Will commit as obvious. Ciao! Steven sbitmap_size.diff Description: Binary data
[google/main, google/gcc-4_7] Fix segfault in linemap lookup
This patch is for the google/main and google/gcc-4_7 branches.

New code in GCC 4.7 is calling linemap_lookup with a location_t that may still represent a location-with-discriminator. Before using a location_t value to lookup the line number, it needs to be mapped to a real location_t value.

Tested with make check-gcc and validate-failures.py.

OK for google/main and google/gcc-4_7?

2012-08-09  Cary Coutant  ccout...@google.com

gcc/
    * tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Check for discriminator.
    * diagnostic.c (diagnostic_report_current_module): Likewise.

Index: gcc/tree-diagnostic.c
===================================================================
--- gcc/tree-diagnostic.c    (revision 190262)
+++ gcc/tree-diagnostic.c    (working copy)
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "input.h"
 #include "tree.h"
 #include "diagnostic.h"
 #include "tree-diagnostic.h"
@@ -115,6 +116,8 @@ maybe_unwind_expanded_macro_loc (diagnos
   unsigned ix;
   loc_map_pair loc, *iter;
 
+  if (has_discriminator (where))
+    where = map_discriminator_location (where);
   map = linemap_lookup (line_table, where);
   if (!linemap_macro_expansion_map_p (map))
     return;
Index: gcc/diagnostic.c
===================================================================
--- gcc/diagnostic.c    (revision 190262)
+++ gcc/diagnostic.c    (working copy)
@@ -270,6 +270,9 @@ diagnostic_report_current_module (diagno
   if (where <= BUILTINS_LOCATION)
     return;
 
+  if (has_discriminator (where))
+    where = map_discriminator_location (where);
+
   linemap_resolve_location (line_table, where,
                             LRK_MACRO_DEFINITION_LOCATION,
                             &map);
[patch] Fix a couple of VEC_reserve uses, speed up update_ssa a bit
Hello, VEC_reserve allocates an *extra* number of slots. There is unfortunately no VEC_resize op (one of the first things to add after the merge of the cxx branch, I suppose...), so to grow a VEC without increasing the used slots count (the VEC_length) it's necessary to compute the number of extra slots needed and reserve only that number of slots. So something like: VEC_reserve (ssa_name_info_p, heap, info_for_ssa_name, num_ssa_names); on an existing VEC with non-null length is wrong. In the worst case, the VEC_length is already num_ssa_names and the VEC ends up twice as large as necessary. Another thing I noticed, is that in update_ssa() we're sbitmap_zero'ing new_ssa_names and old_ssa_names even after we've already done so in init_update_ssa. This might seem like a micro-optimization, but it cuts the time spent in the timevar tree SSA incremental in half for the test case of PR54146... Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven vec_reserve.diff Description: Binary data
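The over-allocation described in this message is easy to reproduce with a toy model. The sketch below is not GCC's VEC: `toy_vec` and its `reserve_extra` member are hypothetical names that mimic only the "reserve N slots beyond the current length" contract described here, and the toy grows to exactly the requested size, which the real allocator need not do.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a vector whose reserve operation takes the number of
// *extra* slots, as described for VEC_reserve. Illustration only.
struct toy_vec {
  std::size_t length;    // slots in use
  std::size_t capacity;  // slots allocated

  // Guarantee room for `extra` more elements beyond `length`.
  void reserve_extra(std::size_t extra) {
    if (capacity < length + extra)
      capacity = length + extra;
  }
};

// Passing the target size as the reservation: over-allocates whenever
// the vector is already partly or fully populated.
inline std::size_t naive_capacity(std::size_t length, std::size_t target) {
  toy_vec v{length, length};
  v.reserve_extra(target);
  return v.capacity;
}

// Passing only the shortfall, as the patch does with
// `num_ssa_names - old_len`.
inline std::size_t adjusted_capacity(std::size_t length, std::size_t target) {
  toy_vec v{length, length};
  v.reserve_extra(target > length ? target - length : 0);
  return v.capacity;
}
```

For a vector that already holds 100 elements and a target of 100, `naive_capacity` returns 200 while `adjusted_capacity` stays at 100, which is the "twice as large as necessary" worst case mentioned above.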
Re: [PATCH] Set correct source location for deallocator calls
On 08/08/2012 12:32 PM, Richard Henderson wrote: On 08/08/2012 09:27 AM, Dehao Chen wrote: Then we should probably assign UNKNOWN_LOCATION for these destructor calls, what do you guys think? I think it's certainly plausible. I can't think what other problems such a change would cause. Jason? cxx_maybe_build_cleanup is already trying to do that. If it's missing some cases then yes, let's fix them too. Jason
Re: [patch] Fix a couple of VEC_reserve uses, speed up update_ssa a bit
On 08/09/2012 03:06 PM, Steven Bosscher wrote: + unsigned old_len = name_to_id ? VEC_length (unsigned, name_to_id) : 0; + VEC_reserve (unsigned, heap, name_to_id, num_ssa_names - old_len); VEC_length already handles NULL input. r~
Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Thu, Aug 9, 2012 at 9:39 AM, H.J. Lu hongjiu...@intel.com wrote: Bionic C library doesn't provide link.h. Does Bionic provide dl_iterate_phdr? If it does, I'll just note in passing that it would be straightforward to simply incorporate the required types and constants in unwind-dw2-fde-dip.c directly, and avoid the #include. If it doesn't, then of course nothing will make this code work correctly. Ian
Re: [patch] Fix a couple of VEC_reserve uses, speed up update_ssa a bit
On Fri, Aug 10, 2012 at 12:15 AM, Richard Henderson r...@redhat.com wrote: On 08/09/2012 03:06 PM, Steven Bosscher wrote: + unsigned old_len = name_to_id ? VEC_length (unsigned, name_to_id) : 0; + VEC_reserve (unsigned, heap, name_to_id, num_ssa_names - old_len); VEC_length already handles NULL input. I didn't know that. Consider that hunk changed to this:

Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 190267)
+++ tree-ssa-pre.c	(working copy)
@@ -249,7 +249,8 @@ alloc_expression_id (pre_expr expr)
       /* VEC_safe_grow_cleared allocates no headroom.  Avoid frequent
	 re-allocations by using VEC_reserve upfront.  There is no
	 VEC_quick_grow_cleared unfortunately.  */
-      VEC_reserve (unsigned, heap, name_to_id, num_ssa_names);
+      unsigned old_len = VEC_length (unsigned, name_to_id);
+      VEC_reserve (unsigned, heap, name_to_id, num_ssa_names - old_len);
       VEC_safe_grow_cleared (unsigned, heap, name_to_id, num_ssa_names);
       gcc_assert (VEC_index (unsigned, name_to_id, version) == 0);
       VEC_replace (unsigned, name_to_id, version, expr->id);
Re: [SH] PR 54089 - Reinstate T_REG clobber for left shifts
Oleg Endo oleg.e...@t-online.de wrote: Removing the T_REG clobber from the left shift patterns entirely wasn't such a good idea. Especially if dynamic shifts are not available (anything SH3) incorrect code may be generated. The attached patch adds a T_REG clobbering version of the left shift insn ashlsi3_n. While at it, I consolidated the description of the constant shift sequences, which hopefully makes them easier to read and understand. Tested on rev 190151 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? OK. Regards, kaz
[v3] fix references to C++11 standard
This just updates some comments to refer to the section numbers in the final C++11 standard. * acinclude.m4: Update references to final C++11 standard. * include/bits/shared_ptr.h: Likewise. * include/bits/shared_ptr_base.h: Likewise. * include/bits/unique_ptr.h: Likewise. * include/std/chrono: Likewise. * include/std/thread: Likewise. Tested x86_64-linux, committed to trunk. commit 0d6bc17d16d85865ed4b6deffd455d3e1e12f430 Author: Jonathan Wakely jwakely@gmail.com Date: Thu Aug 9 23:21:27 2012 +0100 * acinclude.m4: Update references to final C++11 standard. * include/bits/shared_ptr.h: Likewise. * include/bits/shared_ptr_base.h: Likewise. * include/bits/unique_ptr.h: Likewise. * include/std/chrono: Likewise. * include/std/thread: Likewise. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 6632725..1179407 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -1115,16 +1115,16 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [ dnl dnl Check for clock_gettime, nanosleep and sched_yield, used in the -dnl implementation of 20.8.5 [time.clock], and 30.2.2 [thread.thread.this] -dnl in the current C++0x working draft. +dnl implementation of 20.11.7 [time.clock], and 30.3.2 [thread.thread.this] +dnl in the C++11 standard. dnl dnl --enable-libstdcxx-time dnl --enable-libstdcxx-time=yes dnlchecks for the availability of monotonic and realtime clocks, -dnlnanosleep and sched_yield in libc and libposix4 and, in case, links -dnl the latter +dnlnanosleep and sched_yield in libc and libposix4 and, if needed, +dnllinks in the latter. dnl --enable-libstdcxx-time=rt -dnlalso searches (and, in case, links) librt. Note that this is +dnlalso searches (and, if needed, links) librt. Note that this is dnlnot always desirable because, in glibc, for example, in turn it dnltriggers the linking of libpthread too, which activates locking, dnla large overhead for single-thread programs. 
@@ -1256,8 +1256,8 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ ]) dnl -dnl Check for gettimeofday, used in the implementation of 20.8.5 -dnl [time.clock] in the current C++0x working draft. +dnl Check for gettimeofday, used in the implementation of 20.11.7 +dnl [time.clock] in the C++11 standard. dnl AC_DEFUN([GLIBCXX_CHECK_GETTIMEOFDAY], [ diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h index e1c1eb9..7843365 100644 --- a/libstdc++-v3/include/bits/shared_ptr.h +++ b/libstdc++-v3/include/bits/shared_ptr.h @@ -321,7 +321,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION allocate_shared(const _Alloc __a, _Args... __args); }; - // 20.8.13.2.7 shared_ptr comparisons + // 20.7.2.2.7 shared_ptr comparisons templatetypename _Tp1, typename _Tp2 inline bool operator==(const shared_ptr_Tp1 __a, @@ -425,13 +425,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct lessshared_ptr_Tp : public _Sp_lessshared_ptr_Tp { }; - // 20.8.13.2.9 shared_ptr specialized algorithms. + // 20.7.2.2.8 shared_ptr specialized algorithms. templatetypename _Tp inline void swap(shared_ptr_Tp __a, shared_ptr_Tp __b) noexcept { __a.swap(__b); } - // 20.8.13.2.10 shared_ptr casts. + // 20.7.2.2.9 shared_ptr casts. templatetypename _Tp, typename _Tp1 inline shared_ptr_Tp static_pointer_cast(const shared_ptr_Tp1 __r) noexcept @@ -511,7 +511,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } }; - // 20.8.13.3.7 weak_ptr specialized algorithms. + // 20.7.2.3.6 weak_ptr specialized algorithms. 
templatetypename _Tp inline void swap(weak_ptr_Tp __a, weak_ptr_Tp __b) noexcept diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h index 1ccd5ef..07ac000 100644 --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -1056,7 +1056,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION }; - // 20.8.13.2.7 shared_ptr comparisons + // 20.7.2.2.7 shared_ptr comparisons templatetypename _Tp1, typename _Tp2, _Lock_policy _Lp inline bool operator==(const __shared_ptr_Tp1, _Lp __a, @@ -1348,7 +1348,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __weak_count_Lp _M_refcount;// Reference counter. }; - // 20.8.13.3.7 weak_ptr specialized algorithms. + // 20.7.2.3.6 weak_ptr specialized algorithms. templatetypename _Tp, _Lock_policy _Lp inline void swap(__weak_ptr_Tp, _Lp __a, __weak_ptr_Tp, _Lp __b) noexcept diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h index 9b736d4..242d01e 100644 --- a/libstdc++-v3/include/bits/unique_ptr.h +++ b/libstdc++-v3/include/bits/unique_ptr.h @@ -87,7 +87,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION templatetypename _Up void operator()(_Up*) const = delete;
Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Thu, Aug 9, 2012 at 3:17 PM, Ian Lance Taylor i...@google.com wrote: On Thu, Aug 9, 2012 at 9:39 AM, H.J. Lu hongjiu...@intel.com wrote: Bionic C library doesn't provide link.h. Does Bionic provide dl_iterate_phdr? If it does, I'll just note in passing that it would be straightforward to simply incorporate the required types and constants in unwind-dw2-fde-dip.c directly, and avoid the #include. If it doesn't, then of course nothing will make this code work correctly. That is a good idea. Pavel, can you look into it? Thanks. -- H.J.
Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Thu, Aug 9, 2012 at 4:01 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 9, 2012 at 3:17 PM, Ian Lance Taylor i...@google.com wrote: On Thu, Aug 9, 2012 at 9:39 AM, H.J. Lu hongjiu...@intel.com wrote: Bionic C library doesn't provide link.h. Does Bionic provide dl_iterate_phdr? If it does, I'll just note in passing that it would be straightforward to simply incorporate the required types and constants in unwind-dw2-fde-dip.c directly, and avoid the #include. If it doesn't, then of course nothing will make this code work correctly. That is a good idea. Pavel, can you look into it? You may find libiberty/simple-object-elf.c to be a useful guide. Ian
Re: Value type of map need not be default copyable
On 08/09/2012 11:22 PM, Marc Glisse wrote: I don't know if std:: is needed, but it looks strange to have it only on some functions: std::forward_as_tuple(forwardkey_type(__k)), Looking at this line again, you seem to be using std::forward on something that is not a deduced parameter type. I guess it is equivalent to std::move in this case, it just confuses me a bit. Wanted to point out that yesterday. Please double check std::move. I realize now that nobody is interested in std::cref, good ;) Thanks! Paolo.
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
Hi, On Thu, 9 Aug 2012, Mike Stump wrote: On Aug 9, 2012, at 8:19 AM, Michael Matz wrote: Hmm. And maintaining a cache is faster than passing/returning/manipulating two registers? For the most part, we merely mirror existing code, check out lookup_const_double and immed_double_const.

No, I won't without patches on this list. You have kept bragging about wide_int during the last two weeks, without offering anything concrete about it whatsoever. You'll understand that I (or anybody else) can't usefully discuss with you any merits or demerits of the implementation you chose. (Can I, by the way, complain about the retention of the underscore? If it's a base data type, then why not wideint? Take that as a testament to the quality of feedback you'll get with the information given.)

I mean, preparing the audience for an upcoming _suggested_ change in data structure is of course fine. But arguing as if the change had happened already, and, what's more concerning, as if the change had even already been suggested and agreed upon even though that's not the case, is just bad style. I would suggest staying conservative about whatever you have (except if it's momentarily materializing), and _especially don't argue against or for or not against or for whatever improvement is suggested on the grounds that you have a better, as of yet secret but surely taking-over-the-world very-soon-now implementation of datastructure X_. Nobody has seen it yet, so you can't expect to get any feedback on it. Certainly that's the thing you need to get it into the code base.

If the existing code is wrong, love to have someone fix it. :-) Also, bear in mind, on a port with OImode math for example, on a 32-bit host, it would be 8 registers...

Nice try. But what problem do _you_ want to solve? For instance, why should a port with OImode be interesting to the FSF? I hope you recognize this as a half-rhetorical question, but still: how exactly will wide_int help toward the goal (which remains to be shown as useful), how is it implemented, why isn't it worse than crap on sensible (i.e. 64bit) hosts, and why should everybody not interested in such a target pay the price, or why isn't there a price to pay for non-OI targets? I'm actually more interested in comments on the first part, but still, comments on OI are appreciated. Ciao, Michael.
[rl78] add some checks
RTL checking pointed out a couple of cases where rl78.c was extracting info from rtx without checking the rtx type first. Applied.

2012-08-09  DJ Delorie  d...@redhat.com

	* config/rl78/rl78.c (rl78_alloc_physical_registers): Check for
	SET before extracting SET_SRC.
	(rl78_remove_unused_sets): Check for REG before extracting REGNO.

Index: config/rl78/rl78.c
===================================================================
--- config/rl78/rl78.c	(revision 190277)
+++ config/rl78/rl78.c	(working copy)
@@ -2217,7 +2217,8 @@
	  && GET_CODE (PATTERN (insn)) != CALL)
	  continue;
 
-      if (GET_CODE (SET_SRC (PATTERN (insn))) == ASM_OPERANDS)
+      if (GET_CODE (PATTERN (insn)) == SET
+	  && GET_CODE (SET_SRC (PATTERN (insn))) == ASM_OPERANDS)
	continue;
 
       valloc_method = get_attr_valloc (insn);
@@ -2644,7 +2645,7 @@
 
       dest = SET_DEST (insn);
 
-      if (REGNO (dest) > 23)
+      if (GET_CODE (dest) != REG || REGNO (dest) > 23)
	continue;
 
       if (find_regno_note (insn, REG_UNUSED, REGNO (dest)))
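The idea behind this fix, testing the node's code before reading a field that only exists for that code, can be sketched on a miniature tagged node. Everything below (toy_rtx, toy_regno, toy_should_skip) is hypothetical and far simpler than GCC's rtx; it only illustrates why the code test has to come first.

```c
#include <assert.h>

/* Hypothetical miniature of a tagged expression node; GCC's real rtx
   is far richer.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_SET };

struct toy_rtx
{
  enum toy_code code;
  union
  {
    unsigned regno;                               /* valid only for TOY_REG */
    struct { struct toy_rtx *dest, *src; } set;   /* valid only for TOY_SET */
  } u;
};

/* Checked accessor: aborts when the node is not a register, similar in
   spirit to what a checking build reports.  */
static unsigned
toy_regno (const struct toy_rtx *x)
{
  assert (x->code == TOY_REG);
  return x->u.regno;
}

/* Nonzero when X should be skipped: it is not a register, or it is a
   register numbered above LIMIT.  The code test comes first, so
   toy_regno is never reached for a non-register node.  */
static int
toy_should_skip (const struct toy_rtx *x, unsigned limit)
{
  return x->code != TOY_REG || toy_regno (x) > limit;
}
```

Swapping the two operands of the || in toy_should_skip would trip the assertion for a TOY_MEM node, which is the toy analogue of the unchecked REGNO access the patch removes.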
[PATCH] Fix PR54211
Fix a thinko in strength reduction. I was checking the type of the wrong operand to determine whether address arithmetic should be used in replacing expressions. This produced a spurious POINTER_PLUS_EXPR when an address was converted to an unsigned long and back again. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Ok for trunk? Thanks, Bill gcc: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com PR middle-end/54211 * gimple-ssa-strength-reduction.c (analyze_candidates_and_replace): Use cand_type to determine whether pointer arithmetic will be generated. gcc/testsuite: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com PR middle-end/54211 * gcc.dg/tree-ssa/pr54211.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr54211.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr54211.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr54211.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options -Os } */ + +int a, b; +unsigned char e; +void fn1 () +{ +unsigned char *c=0; +for (;; a++) +{ +unsigned char d = *(c + b); +for (; ed; c++) +goto Found_Top; +} +Found_Top: +if (0) +goto Empty_Bitmap; +for (;; a++) +{ +unsigned char *e = c + b; +for (; c e; c++) +goto Found_Bottom; +c -= b; +} +Found_Bottom: +Empty_Bitmap: +; +} Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 190260) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2534,7 +2534,7 @@ analyze_candidates_and_replace (void) /* Determine whether we'll be generating pointer arithmetic when replacing candidates. */ address_arithmetic_p = (c-kind == CAND_ADD - POINTER_TYPE_P (TREE_TYPE (c-base_expr))); + POINTER_TYPE_P (c-cand_type)); /* If all candidates have already been replaced under other interpretations, nothing remains to be done. */
[PATCH, testsuite] New effective target long_neq_int
As suggested by Janis regarding testsuite/gcc.dg/tree-ssa/slsr-30.c, this patch adds a new effective target for machines having long and int of differing sizes. Tested on powerpc64-unknown-linux-gnu, where the test passes for -m64 and is skipped for -m32. Ok for trunk? Thanks, Bill doc: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com * sourcebuild.texi: Document long_neq_int effective target. testsuite: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com * lib/target-supports.exp (check_effective_target_long_neq_int): New. * gcc.dg/tree-ssa/slsr-30.c: Check for long_neq_int effective target. Index: gcc/doc/sourcebuild.texi === --- gcc/doc/sourcebuild.texi(revision 190260) +++ gcc/doc/sourcebuild.texi(working copy) @@ -1303,6 +1303,9 @@ Target has @code{int} that is at 32 bits or longer @item int16 Target has @code{int} that is 16 bits or shorter. +@item long_neq_int +Target has @code{int} and @code{long} with different sizes. + @item large_double Target supports @code{double} that is longer than @code{float}. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 190260) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -1689,6 +1689,15 @@ proc check_effective_target_llp64 { } { }] } +# Return 1 if long and int have different sizes, +# 0 otherwise. + +proc check_effective_target_long_neq_int { } { +return [check_no_compiler_messages long_ne_int object { + int dummy[sizeof (int) != sizeof (long) ? 1 : -1]; +}] +} + # Return 1 if the target supports long double larger than double, # 0 otherwise. Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c === --- gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (revision 190260) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (working copy) @@ -1,7 +1,7 @@ /* Verify straight-line strength reduction fails for simple integer addition with casts thrown in when -fwrapv is used. */ -/* { dg-do compile { target { ! 
{ ilp32 } } } } */ +/* { dg-do compile { target { long_neq_int } } } */ /* { dg-options -O3 -fdump-tree-dom2 -fwrapv } */ long
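The probe in check_effective_target_long_neq_int relies on an array whose length is 1 when the condition holds and -1 otherwise, so that the compiler emits a diagnostic exactly when int and long have the same size. A minimal sketch of the same trick follows; the TOY_PROBE_LEN macro and the toy_ names are made up for illustration, and the size comparison is evaluated at run time here so that the sketch builds on every target.

```c
#include <assert.h>

/* Array length is 1 when COND holds and -1 otherwise; a negative
   length is a compile-time error, which is what the effective-target
   probe relies on.  */
#define TOY_PROBE_LEN(cond) ((cond) ? 1 : -1)

/* A probe whose condition always holds compiles everywhere.  */
typedef char toy_probe_ok[TOY_PROBE_LEN (sizeof (char) == 1)];

/* Run-time view of the long_neq_int condition, so that this sketch
   still builds on targets where int and long have the same size.  */
static int
toy_long_neq_int (void)
{
  return TOY_PROBE_LEN (sizeof (int) != sizeof (long)) == 1;
}

/* Sanity check of the macro itself.  */
static void
toy_self_check (void)
{
  assert (TOY_PROBE_LEN (1) == 1);
  assert (TOY_PROBE_LEN (0) == -1);
}
```

Writing the real probe as an array declaration instead of a run-time test is what lets the test harness decide by compiling only, without running anything on the target.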
s390: Use VOIDmode with gen_rtx_SET
Committed as obvious. r~ * config/s390/s390.c (s390_expand_insv): Use VOIDmode in gen_rtx_SET. diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 0ae77a2..d67c0eb 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -4684,9 +4684,8 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src) src = gen_lowpart (mode, src); } - op = gen_rtx_SET (mode, - gen_rtx_ZERO_EXTRACT (mode, dest, op1, op2), - src); + op = gen_rtx_ZERO_EXTRACT (mode, dest, op1, op2), + op = gen_rtx_SET (VOIDmode, op, src); clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, CC_REGNUM)); emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clobber)));
[PATCH 0/7] s390 improvements with r[ioxn]sbg
Only tested visually, by examining assembly diffs of the runtime libraries between successive patches. All told it would appear to be some remarkable code size improvements. Please test.

r~

Richard Henderson (7):
  s390: Constraints, predicates, and op letters for contiguous bitmasks
  s390: Only use lhs zero_extract in word_mode
  s390: Use risbgz for AND.
  s390: Add mode attribute for mode bitsize
  s390: Implement extzv for z10
  s390: Generate rxsbg, and shifted forms of rosbg
  s390: Generate rnsbg

 gcc/config/s390/constraints.md |  11 +-
 gcc/config/s390/predicates.md  |  10 +
 gcc/config/s390/s390-protos.h  |   1 +
 gcc/config/s390/s390.c         | 108 ---
 gcc/config/s390/s390.md        | 385 ++--
 5 files changed, 353 insertions(+), 162 deletions(-)

--
1.7.7.6
[PATCH 2/7] s390: Only use lhs zero_extract in word_mode
This means that anything targeting extimm or z10 must therefore imply zarch, which implies word_mode == DImode. Then, now that *insv_z10 is no longer dependent on mode, let gas do some arithmetic, rather than doing it in C and generating new rtl. --- gcc/config/s390/s390.md | 45 - 1 files changed, 16 insertions(+), 29 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 76ec9c4..2677fb2 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3364,27 +3364,15 @@ FAIL; }) -(define_insn *insvmode_z10 - [(set (zero_extract:GPR (match_operand:GPR 0 nonimmediate_operand +d) - (match_operand 1 const_int_operandI) - (match_operand 2 const_int_operandI)) - (match_operand:GPR 3 nonimmediate_operand d)) +(define_insn *insv_z10 + [(set (zero_extract:DI + (match_operand:DI 0 nonimmediate_operand +d) + (match_operand 1 const_int_operand ) + (match_operand 2 const_int_operand )) + (match_operand:DI 3 nonimmediate_operand d)) (clobber (reg:CC CC_REGNUM))] - TARGET_Z10 -(INTVAL (operands[1]) + INTVAL (operands[2])) = - GET_MODE_BITSIZE (MODEmode) -{ - int start = INTVAL (operands[2]); - int size = INTVAL (operands[1]); - int offset = 64 - GET_MODE_BITSIZE (MODEmode); - - operands[2] = GEN_INT (offset + start); /* start bit position */ - operands[1] = GEN_INT (offset + start + size - 1); /* end bit position */ - operands[4] = GEN_INT (GET_MODE_BITSIZE (MODEmode) - -start - size); /* left shift count */ - - return risbg\t%0,%3,%b2,%b1,%b4; -} + TARGET_Z10 + risbg\t%0,%3,%2,%2+%1-1,64-%2-%1 [(set_attr op_type RIE) (set_attr z10prop z10_super_E1)]) @@ -3483,15 +3471,14 @@ [(set_attr op_type RIL) (set_attr z10prop z10_fwd_E1)]) -; Update the right-most 32 bit of a DI, or the whole of a SI. 
-(define_insn *insv_lmode_reg_extimm - [(set (zero_extract:P (match_operand:P 0 register_operand +d) - (const_int 32) - (match_operand 1 const_int_operand n)) - (match_operand:P 2 const_int_operand n))] - TARGET_EXTIMM -BITS_PER_WORD - INTVAL (operands[1]) == 32 - iilf\t%0,%o2 +; Update the right-most 32 bit of a DI. +(define_insn *insv_l_di_reg_extimm + [(set (zero_extract:DI (match_operand:DI 0 register_operand +d) +(const_int 32) +(const_int 32)) + (match_operand:DI 1 const_int_operand n))] + TARGET_EXTIMM + iilf\t%0,%o1 [(set_attr op_type RIL) (set_attr z10prop z10_fwd_A1)]) -- 1.7.7.6
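The operand arithmetic that the new *insv_z10 template leaves to the assembler (start %2, end %2+%1-1, rotate 64-%2-%1) can be checked with a small model. The sketch below is my own simplified reading of a rotate-then-insert operation, assuming big-endian bit numbering (bit 0 is the most significant bit) and ignoring the zero-remaining-bits flag; the toy_ function names are hypothetical and this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits (N is taken modulo 64).  */
static uint64_t
toy_rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Mask with bits START..END set, numbering bit 0 as the most
   significant bit.  Requires START <= END <= 63.  */
static uint64_t
toy_be_mask (unsigned start, unsigned end)
{
  assert (start <= end && end <= 63);
  uint64_t m = 0;
  for (unsigned i = start; i <= end; i++)
    m |= (uint64_t) 1 << (63 - i);
  return m;
}

/* Simplified rotate-then-insert: rotate SRC left by ROT, then copy
   bits START..END of the result into DEST, leaving the rest alone.  */
static uint64_t
toy_risbg (uint64_t dest, uint64_t src,
           unsigned start, unsigned end, unsigned rot)
{
  uint64_t m = toy_be_mask (start, end);
  return (dest & ~m) | (toy_rotl64 (src, rot) & m);
}

/* Insert the low SIZE bits of SRC into DEST at big-endian bit
   position POS, using the same three expressions as the template.  */
static uint64_t
toy_insv (uint64_t dest, uint64_t src, unsigned size, unsigned pos)
{
  assert (size >= 1 && pos + size <= 64);
  return toy_risbg (dest, src, pos, pos + size - 1, 64 - pos - size);
}
```

For example, toy_insv (~0, 0xAB, 8, 0) targets the top byte: the start is 0, the end is 7, and the rotate count is 56, which moves the low byte of the source into bits 0..7.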
[PATCH 7/7] s390: Generate rnsbg
--- gcc/config/s390/s390.md | 55 +++ 1 files changed, 55 insertions(+), 0 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index d733062..182e7b1 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3462,6 +3462,61 @@ rnoxasbg\t%0,%1,%bfstart2,%bfend2,%3 [(set_attr op_type RIE)]) +;; These two are generated by combine for s.bf = val. +;; ??? For bitfields smaller than 32-bits, we wind up with SImode +;; shifts and ands, which results in some truly awful patterns +;; including subregs of operations. Rather unnecessisarily, IMO. +;; Instead of +;; +;; (set (zero_extract:DI (reg/v:DI 50 [ s ]) +;;(const_int 24 [0x18]) +;;(const_int 0 [0])) +;;(subreg:DI (and:SI (subreg:SI (lshiftrt:DI (reg/v:DI 50 [ s ]) +;;(const_int 40 [0x28])) 4) +;;(reg:SI 4 %r4 [ y+4 ])) 0)) +;; +;; we should instead generate +;; +;; (set (zero_extract:DI (reg/v:DI 50 [ s ]) +;;(const_int 24 [0x18]) +;;(const_int 0 [0])) +;;(and:DI (lshiftrt:DI (reg/v:DI 50 [ s ]) +;;(const_int 40 [0x28])) +;;(subreg:DI (reg:SI 4 %r4 [ y+4 ]) 0))) +;; +;; by noticing that we can push down the outer paradoxical subreg +;; into the operation. 
+ +(define_insn *insv_rnsbg_noshift + [(set (zero_extract:DI + (match_operand:DI 0 nonimmediate_operand +d) + (match_operand 1 const_int_operand ) + (match_operand 2 const_int_operand )) + (and:DI + (match_dup 0) + (match_operand:DI 3 nonimmediate_operand d))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 +INTVAL (operands[1]) + INTVAL (operands[2]) == 64 + rnsbg\t%0,%3,%2,63,0 + [(set_attr op_type RIE)]) + +(define_insn *insv_rnsbg_srl + [(set (zero_extract:DI + (match_operand:DI 0 nonimmediate_operand +d) + (match_operand 1 const_int_operand ) + (match_operand 2 const_int_operand )) + (and:DI + (lshiftrt:DI + (match_dup 0) + (match_operand 3 const_int_operand )) + (match_operand:DI 4 nonimmediate_operand d))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 +INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL (operands[2]) + rnsbg\t%0,%4,%2,%2+%1-1,64-%2,%1 + [(set_attr op_type RIE)]) + (define_insn *insvmode_mem_reg [(set (zero_extract:W (match_operand:QI 0 memory_operand +Q,S) (match_operand 1 const_int_operand n,n) -- 1.7.7.6
[PATCH 4/7] s390: Add mode attribute for mode bitsize
Constant fold, and less typing than, GET_MODE_BITSIZE with another mode substitution. --- gcc/config/s390/s390.md | 24 1 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 6474023..b6e1535 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -522,6 +522,9 @@ (define_mode_attr bfstart [(DI s) (SI t)]) (define_mode_attr bfend [(DI e) (SI f)]) +;; In place of GET_MODE_BITSIZE (MODEmode) +(define_mode_attr bitsize [(DI 64) (SI 32) (HI 16) (QI 8)]) + ;; ;;- Compare instructions. ;; @@ -3317,7 +3320,7 @@ operands[1] = adjust_address (operands[1], BLKmode, 0); set_mem_size (operands[1], size); - operands[2] = GEN_INT (GET_MODE_BITSIZE (MODEmode) - bitsize); + operands[2] = GEN_INT (GPR:bitsize - bitsize); operands[3] = GEN_INT (mask); }) @@ -3344,7 +3347,7 @@ operands[1] = adjust_address (operands[1], BLKmode, 0); set_mem_size (operands[1], size); - operands[2] = GEN_INT (GET_MODE_BITSIZE (MODEmode) - bitsize); + operands[2] = GEN_INT (GPR:bitsize - bitsize); operands[3] = GEN_INT (mask); }) @@ -3532,8 +3535,7 @@ } else if (!TARGET_EXTIMM) { - rtx bitcount = GEN_INT (GET_MODE_BITSIZE (DSI:MODEmode) - - GET_MODE_BITSIZE (HQI:MODEmode)); + rtx bitcount = GEN_INT (DSI:bitsize - HQI:bitsize); operands[1] = gen_lowpart (DSI:MODEmode, operands[1]); emit_insn (gen_ashlDSI:mode3 (operands[0], operands[1], bitcount)); @@ -3635,8 +3637,7 @@ { operands[1] = adjust_address (operands[1], BLKmode, 0); set_mem_size (operands[1], GET_MODE_SIZE (QImode)); - operands[2] = GEN_INT (GET_MODE_BITSIZE (MODEmode) -- GET_MODE_BITSIZE (QImode)); + operands[2] = GEN_INT (GPR:bitsize - BITS_PER_UNIT); }) ; @@ -3747,8 +3748,7 @@ } else if (!TARGET_EXTIMM) { - rtx bitcount = GEN_INT (GET_MODE_BITSIZE(DImode) - - GET_MODE_BITSIZE(MODEmode)); + rtx bitcount = GEN_INT (64 - HQI:bitsize); operands[1] = gen_lowpart (DImode, operands[1]); emit_insn (gen_ashldi3 (operands[0], operands[1], bitcount)); emit_insn (gen_lshrdi3 
(operands[0], operands[0], bitcount)); @@ -3765,7 +3765,7 @@ { operands[1] = gen_lowpart (SImode, operands[1]); emit_insn (gen_andsi3 (operands[0], operands[1], - GEN_INT ((1 GET_MODE_BITSIZE(MODEmode)) - 1))); +GEN_INT ((1 HQI:bitsize) - 1))); DONE; } }) @@ -3958,8 +3958,8 @@ REAL_VALUE_TYPE cmp, sub; operands[1] = force_reg (BFP:MODEmode, operands[1]); - real_2expN (cmp, GET_MODE_BITSIZE(GPR:MODEmode) - 1, BFP:MODEmode); - real_2expN (sub, GET_MODE_BITSIZE(GPR:MODEmode), BFP:MODEmode); + real_2expN (cmp, GPR:bitsize - 1, BFP:MODEmode); + real_2expN (sub, GPR:bitsize, BFP:MODEmode); emit_cmp_and_jump_insns (operands[1], CONST_DOUBLE_FROM_REAL_VALUE (cmp, BFP:MODEmode), @@ -4676,7 +4676,7 @@ (CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'K', \K\) || CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'O', \Os\) || CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'C', \C\)) -INTVAL (operands[2]) != -((HOST_WIDE_INT)1 (GET_MODE_BITSIZE(MODEmode) - 1)) +INTVAL (operands[2]) != -((HOST_WIDE_INT)1 (bitsize - 1)) @ aghi\t%0,%h2 aghik\t%0,%1,%h2 -- 1.7.7.6
[PATCH 5/7] s390: Implement extzv for z10
--- gcc/config/s390/predicates.md |4 +++ gcc/config/s390/s390-protos.h |1 + gcc/config/s390/s390.c| 16 gcc/config/s390/s390.md | 55 +++-- 4 files changed, 68 insertions(+), 8 deletions(-) diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md index 333457d..e4632b9 100644 --- a/gcc/config/s390/predicates.md +++ b/gcc/config/s390/predicates.md @@ -101,6 +101,10 @@ return true; }) +(define_predicate nonzero_shift_count_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 1, GET_MODE_BITSIZE (mode) - 1 + ;; Return true if OP a valid operand for the LARL instruction. (define_predicate larl_operand diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 79673d6..97c378f 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -110,5 +110,6 @@ extern bool s390_legitimate_address_without_index_p (rtx); extern bool s390_decompose_shift_count (rtx, rtx *, HOST_WIDE_INT *); extern int s390_branch_condition_mask (rtx); extern int s390_compare_and_branch_condition_mask (rtx); +extern bool s390_extzv_shift_ok (int, int, unsigned HOST_WIDE_INT); #endif /* RTX_CODE */ diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 4e22100..52138d7 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -1308,6 +1308,22 @@ s390_contiguous_bitmask_p (unsigned HOST_WIDE_INT in, int size, return true; } +/* Check whether a rotate of ROTL followed by an AND of CONTIG is equivalent + to a shift followed by the AND. In particular, CONTIG should not overlap + the (rotated) bit 0/bit 63 gap. 
*/ + +bool +s390_extzv_shift_ok (int bitsize, int rotl, unsigned HOST_WIDE_INT contig) +{ + int pos, len; + bool ok; + + ok = s390_contiguous_bitmask_p (contig, bitsize, pos, len); + gcc_assert (ok); + + return (rotl = pos || rotl = pos + len + (64 - bitsize)); +} + /* Check whether we can (and want to) split a double-word move in mode MODE from SRC to DST into two single-word moves, moving the subword FIRST_SUBWORD first. */ diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index b6e1535..ae004ac 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3298,15 +3298,25 @@ [(set_attr op_type RS,RSY) (set_attr z10prop z10_super_E1,z10_super_E1)]) +(define_insn extzv + [(set (match_operand:DI 0 register_operand =d) + (zero_extract:DI + (match_operand:DI 1 register_operand d) + (match_operand 2 const_int_operand ) + (match_operand 3 const_int_operand ))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 + risbg\t%0,%1,63-%3-%2,128+63,63-%3-%2 + [(set_attr op_type RIE) + (set_attr z10prop z10_super_E1)]) -(define_insn_and_split *extzvmode +(define_insn_and_split *pre_z10_extzvmode [(set (match_operand:GPR 0 register_operand =d) (zero_extract:GPR (match_operand:QI 1 s_operand QS) - (match_operand 2 const_int_operand n) + (match_operand 2 nonzero_shift_count_operand ) (const_int 0))) (clobber (reg:CC CC_REGNUM))] - INTVAL (operands[2]) 0 -INTVAL (operands[2]) = GET_MODE_BITSIZE (SImode) + !TARGET_Z10 # reload_completed [(parallel @@ -3324,14 +3334,13 @@ operands[3] = GEN_INT (mask); }) -(define_insn_and_split *extvmode +(define_insn_and_split *pre_z10_extvmode [(set (match_operand:GPR 0 register_operand =d) (sign_extract:GPR (match_operand:QI 1 s_operand QS) - (match_operand 2 const_int_operand n) + (match_operand 2 nonzero_shift_count_operand ) (const_int 0))) (clobber (reg:CC CC_REGNUM))] - INTVAL (operands[2]) 0 -INTVAL (operands[2]) = GET_MODE_BITSIZE (SImode) + !TARGET_Z10 # reload_completed [(parallel @@ -6034,6 +6043,36 @@ (clobber (reg:CC 
CC_REGNUM))])] s390_narrow_logical_operator (AND, operands[0], operands[1]);) +;; These two are what combine generates for (ashift (zero_extract)). +(define_insn *extzv_mode_srl + [(set (match_operand:DSI 0 register_operand =d) + (and:DSI (lshiftrt:DSI + (match_operand:DSI 1 register_operand d) + (match_operand:DSI 2 nonzero_shift_count_operand )) + (match_operand:DSI 3 contiguous_bitmask_operand ))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 + /* Note that even for the SImode pattern, the rotate is always DImode. */ +s390_extzv_shift_ok (bitsize, 64 - INTVAL (operands[2]), + INTVAL (operands[3])) + risbg\t%0,%1,%bfstart3,128+%bfend3,64-%2 + [(set_attr op_type RIE) + (set_attr z10prop z10_super_E1)]) + +(define_insn *extzv_mode_sll + [(set (match_operand:DSI 0 register_operand =d) + (and:DSI (ashift:DSI + (match_operand:DSI 1 register_operand d) +
[PATCH 1/7] s390: Constraints, predicates, and op letters for contiguous bitmasks
--- gcc/config/s390/constraints.md | 11 - gcc/config/s390/predicates.md |6 +++ gcc/config/s390/s390.c | 92 +++- gcc/config/s390/s390.md| 48 + 4 files changed, 90 insertions(+), 67 deletions(-) diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md index 8564b66..9d416ad 100644 --- a/gcc/config/s390/constraints.md +++ b/gcc/config/s390/constraints.md @@ -45,6 +45,8 @@ ;; H,Q: mode of the part ;; D,S,H: mode of the containing operand ;; 0,F: value of the other parts (F - all bits set) +;; -- +;; xx[DS]q satisfies s390_contiguous_bitmask_p for DImode or SImode ;; ;; The constraint matches if the specified part of a constant ;; has a value different from its other parts. If the letter x @@ -330,8 +332,15 @@ (and (match_code const_int) (match_test s390_N_constraint_str (\xQH0\, ival +(define_constraint NxxDq + @internal + (and (match_code const_int) + (match_test s390_contiguous_bitmask_p (ival, 64, NULL, NULL - +(define_constraint NxxSq + @internal + (and (match_code const_int) + (match_test s390_contiguous_bitmask_p (ival, 32, NULL, NULL ;; ;; Double-letter constraints starting with O follow. diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md index 9d619fb..333457d 100644 --- a/gcc/config/s390/predicates.md +++ b/gcc/config/s390/predicates.md @@ -154,6 +154,12 @@ return false; }) +(define_predicate contiguous_bitmask_operand + (match_code const_int) +{ + return s390_contiguous_bitmask_p (INTVAL (op), GET_MODE_BITSIZE (mode), NULL, NULL); +}) + ;; operators -- ;; Return nonzero if OP is a valid comparison operator diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index d67c0eb..4e22100 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -5286,28 +5286,35 @@ print_operand_address (FILE *file, rtx addr) 'C': print opcode suffix for branch condition. 'D': print opcode suffix for inverse branch condition. 'E': print opcode suffix for branch on index instruction. 
-'J': print tls_load/tls_gdcall/tls_ldcall suffix 'G': print the size of the operand in bytes. +'J': print tls_load/tls_gdcall/tls_ldcall suffix +'M': print the second word of a TImode operand. +'N': print the second word of a DImode operand. 'O': print only the displacement of a memory reference. 'R': print only the base register of a memory reference. 'S': print S-type memory reference (base+displacement). -'N': print the second word of a DImode operand. -'M': print the second word of a TImode operand. 'Y': print shift count operand. 'b': print integer X as if it's an unsigned byte. 'c': print integer X as if it's an signed byte. -'x': print integer X as if it's an unsigned halfword. +'e': end of DImode contiguous bitmask X. +'f': end of SImode contiguous bitmask X. 'h': print integer X as if it's a signed halfword. 'i': print the first nonzero HImode part of X. 'j': print the first HImode part unequal to -1 of X. 'k': print the first nonzero SImode part of X. 'm': print the first SImode part unequal to -1 of X. -'o': print integer X as if it's an unsigned 32bit word. */ +'o': print integer X as if it's an unsigned 32bit word. +'s': start of DImode contiguous bitmask X. +'t': start of SImode contiguous bitmask X. +'x': print integer X as if it's an unsigned halfword. 
+*/ void print_operand (FILE *file, rtx x, int code) { + HOST_WIDE_INT ival; + switch (code) { case 'C': @@ -5486,30 +5493,57 @@ print_operand (FILE *file, rtx x, int code) break; case CONST_INT: - if (code == 'b') -fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x) 0xff); - else if (code == 'c') -fprintf (file, HOST_WIDE_INT_PRINT_DEC, ((INTVAL (x) 0xff) ^ 0x80) - 0x80); - else if (code == 'x') -fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x) 0x); - else if (code == 'h') -fprintf (file, HOST_WIDE_INT_PRINT_DEC, ((INTVAL (x) 0x) ^ 0x8000) - 0x8000); - else if (code == 'i') - fprintf (file, HOST_WIDE_INT_PRINT_DEC, -s390_extract_part (x, HImode, 0)); - else if (code == 'j') - fprintf (file, HOST_WIDE_INT_PRINT_DEC, -s390_extract_part (x, HImode, -1)); - else if (code == 'k') - fprintf (file, HOST_WIDE_INT_PRINT_DEC, -s390_extract_part (x, SImode, 0)); - else if (code == 'm') - fprintf (file, HOST_WIDE_INT_PRINT_DEC, -s390_extract_part (x, SImode, -1)); - else if (code == 'o') - fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x) 0x); - else -fprintf (file,
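The 'b', 'c', 'x' and 'h' operand codes listed above print a constant as an unsigned or signed byte or halfword. A minimal sketch of the mask, xor and subtract idiom behind the signed variants, as standalone C. The function names are made up for illustration and `long long` stands in for HOST_WIDE_INT; none of this is taken from s390.c itself.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not part of s390.c.  */

/* Low 8 bits of V, zero-extended (the value the 'b' code prints).  */
long long
low_byte_unsigned (long long v)
{
  return v & 0xff;
}

/* Low 8 bits of V, sign-extended (the value the 'c' code prints).
   Flipping bit 7 and subtracting 0x80 maps 0x80..0xff to -128..-1
   and leaves 0x00..0x7f unchanged.  */
long long
low_byte_signed (long long v)
{
  return ((v & 0xff) ^ 0x80) - 0x80;
}

/* The same idiom for a 16-bit halfword (the 'h' code).  */
long long
low_half_signed (long long v)
{
  return ((v & 0xffff) ^ 0x8000) - 0x8000;
}

/* Small usage check.  */
void
check_sign_extension (void)
{
  assert (low_byte_unsigned (0x1ff) == 0xff);
  assert (low_byte_signed (0xff) == -1);
  assert (low_half_signed (0x8000) == -32768);
}
```

The idiom avoids any dependence on the width of the host integer type, which is presumably why it is preferred over a cast to a narrower type.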
[PATCH 6/7] s390: Generate rxsbg, and shifted forms of rosbg
--- gcc/config/s390/s390.md | 63 +- 1 files changed, 56 insertions(+), 7 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index ae004ac..d733062 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -384,6 +384,9 @@ ;; the same template. (define_code_iterator SHIFT [ashift lshiftrt]) +;; This iterator allow r[ox]sbg to be defined with the same template +(define_code_iterator IXOR [ior xor]) + ;; This iterator and attribute allow to combine most atomic operations. (define_code_iterator ATOMIC [and ior xor plus minus mult]) (define_code_iterator ATOMIC_Z196 [and ior xor plus]) @@ -3402,15 +3405,61 @@ [(set_attr op_type RIE) (set_attr z10prop z10_super_E1)]) -; and op1 with a mask being 1 for the selected bits and 0 for the rest -(define_insn *insvmode_or_z10_noshift - [(set (match_operand:GPR 0 nonimmediate_operand =d) - (ior:GPR (and:GPR (match_operand:GPR 1 nonimmediate_operand d) - (match_operand:GPR 2 contiguous_bitmask_operand )) - (match_operand:GPR 3 nonimmediate_operand 0))) +(define_insn *rnoxasbg_mode_noshift + [(set (match_operand:DSI 0 nonimmediate_operand =d) + (IXOR:DSI + (and:DSI (match_operand:DSI 1 nonimmediate_operand d) + (match_operand:DSI 2 contiguous_bitmask_operand )) + (match_operand:DSI 3 nonimmediate_operand 0))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 + rnoxasbg\t%0,%1,%bfstart2,%bfend2,0 + [(set_attr op_type RIE)]) + +(define_insn *rnoxasbg_di_rotl + [(set (match_operand:DI 0 nonimmediate_operand =d) + (IXOR:DI + (and:DI + (rotate:DI + (match_operand:DI 1 nonimmediate_operand d) + (match_operand:DI 3 const_int_operand )) +(match_operand:DI 2 contiguous_bitmask_operand )) + (match_operand:DI 4 nonimmediate_operand 0))) (clobber (reg:CC CC_REGNUM))] TARGET_Z10 - rosbg\t%0,%1,%bfstart2,%bfend2,0 + rnoxasbg\t%0,%1,%bfstart2,%bfend2,%b3 + [(set_attr op_type RIE)]) + +(define_insn *rnoxasbg_mode_srl + [(set (match_operand:DSI 0 nonimmediate_operand =d) + (IXOR:DSI + (and:DSI + (lshiftrt:DSI + 
(match_operand:DSI 1 nonimmediate_operand d) + (match_operand:DSI 3 nonzero_shift_count_operand )) +(match_operand:DSI 2 contiguous_bitmask_operand )) + (match_operand:DSI 4 nonimmediate_operand 0))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 +s390_extzv_shift_ok (bitsize, 64 - INTVAL (operands[3]), + INTVAL (operands[2])) + rnoxasbg\t%0,%1,%bfstart2,%bfend2,64-%3 + [(set_attr op_type RIE)]) + +(define_insn *rnoxasbg_mode_sll + [(set (match_operand:DSI 0 nonimmediate_operand =d) + (IXOR:DSI + (and:DSI + (ashift:DSI + (match_operand:DSI 1 nonimmediate_operand d) + (match_operand:DSI 3 nonzero_shift_count_operand )) +(match_operand:DSI 2 contiguous_bitmask_operand )) + (match_operand:DSI 4 nonimmediate_operand 0))) + (clobber (reg:CC CC_REGNUM))] + TARGET_Z10 +s390_extzv_shift_ok (bitsize, INTVAL (operands[3]), + INTVAL (operands[2])) + rnoxasbg\t%0,%1,%bfstart2,%bfend2,%3 [(set_attr op_type RIE)]) (define_insn *insvmode_mem_reg -- 1.7.7.6
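The _srl and _sll patterns above express a shift through the rotate field of the instruction (64-%3 for the right shift). That substitution only holds when the mask selects none of the bits that the rotate wraps around, which is presumably the kind of condition s390_extzv_shift_ok guards. A minimal C-level sketch of that reasoning follows; all names are hypothetical and the functions model the operation on plain 64-bit integers rather than the actual instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits; N is reduced modulo 64.  */
uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* C-level model of "rotate, select bits, OR into the destination".  */
uint64_t
rotate_then_or (uint64_t dst, uint64_t src, uint64_t mask, unsigned rot)
{
  return dst | (rotl64 (src, rot) & mask);
}

/* Nonzero if a logical right shift by N (1..63) followed by an AND
   with MASK can be replaced by a left rotate by 64 - N.  The rotate
   brings wrapped-around bits into the top N positions, so MASK must
   not select any of them.  */
int
srl_as_rotate_ok (uint64_t mask, unsigned n)
{
  return (mask >> (64 - n)) == 0;
}

/* Small usage check.  */
void
check_shift_as_rotate (void)
{
  uint64_t x = 0xf0f0f0f0f0f0f0f0ULL;
  uint64_t mask = 0xff00;

  assert (srl_as_rotate_ok (mask, 8));
  assert (rotate_then_or (1, x, mask, 64 - 8) == (1 | ((x >> 8) & mask)));
}
```

The left-shift case is symmetric: the rotate wraps bits into the low N positions, so the mask must not select those.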
[PATCH 3/7] s390: Use risbgz for AND.
--- gcc/config/s390/s390.md | 107 +++ 1 files changed, 62 insertions(+), 45 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 2677fb2..6474023 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -5946,44 +5946,50 @@ (define_insn *anddi3_cc [(set (reg CC_REGNUM) -(compare (and:DI (match_operand:DI 1 nonimmediate_operand %0,d, 0) - (match_operand:DI 2 general_operand d,d,RT)) - (const_int 0))) - (set (match_operand:DI 0 register_operand =d,d, d) +(compare + (and:DI (match_operand:DI 1 nonimmediate_operand %0,d, 0,d) + (match_operand:DI 2 general_operand d,d,RT,NxxDq)) + (const_int 0))) + (set (match_operand:DI 0 register_operand =d,d, d,d) (and:DI (match_dup 1) (match_dup 2)))] - s390_match_ccmode(insn, CCTmode) TARGET_ZARCH + TARGET_ZARCH s390_match_ccmode(insn, CCTmode) @ ngr\t%0,%2 ngrk\t%0,%1,%2 - ng\t%0,%2 - [(set_attr op_type RRE,RRF,RXY) - (set_attr cpu_facility *,z196,*) - (set_attr z10prop z10_super_E1,*,z10_super_E1)]) + ng\t%0,%2 + risbg\t%0,%1,%s2,128+%e2,0 + [(set_attr op_type RRE,RRF,RXY,RIE) + (set_attr cpu_facility *,z196,*,z10) + (set_attr z10prop z10_super_E1,*,z10_super_E1,z10_super_E1)]) (define_insn *anddi3_cconly [(set (reg CC_REGNUM) -(compare (and:DI (match_operand:DI 1 nonimmediate_operand %0,d, 0) - (match_operand:DI 2 general_operand d,d,RT)) +(compare + (and:DI (match_operand:DI 1 nonimmediate_operand %0,d, 0,d) + (match_operand:DI 2 general_operand d,d,RT,NxxDq)) (const_int 0))) - (clobber (match_scratch:DI 0 =d,d, d))] - s390_match_ccmode(insn, CCTmode) TARGET_ZARCH + (clobber (match_scratch:DI 0 =d,d, d,d))] + TARGET_ZARCH +s390_match_ccmode(insn, CCTmode) /* Do not steal TM patterns. 
*/ s390_single_part (operands[2], DImode, HImode, 0) 0 @ ngr\t%0,%2 ngrk\t%0,%1,%2 - ng\t%0,%2 - [(set_attr op_type RRE,RRF,RXY) - (set_attr cpu_facility *,z196,*) - (set_attr z10prop z10_super_E1,*,z10_super_E1)]) + ng\t%0,%2 + risbg\t%0,%1,%s2,128+%e2,0 + [(set_attr op_type RRE,RRF,RXY,RIE) + (set_attr cpu_facility *,z196,*,z10) + (set_attr z10prop z10_super_E1,*,z10_super_E1,z10_super_E1)]) (define_insn *anddi3 [(set (match_operand:DI 0 nonimmediate_operand -=d,d,d,d,d,d,d,d,d,d, d, AQ,Q) -(and:DI (match_operand:DI 1 nonimmediate_operand -%d,o,0,0,0,0,0,0,0,d, 0, 0,0) -(match_operand:DI 2 general_operand -M, M,N0HDF,N1HDF,N2HDF,N3HDF,N0SDF,N1SDF,d,d,RT,NxQDF,Q))) +=d,d,d,d,d,d,d,d,d,d, d,d, AQ,Q) +(and:DI + (match_operand:DI 1 nonimmediate_operand +%d,o,0,0,0,0,0,0,0,d, 0,d,0,0) + (match_operand:DI 2 general_operand +M, M,N0HDF,N1HDF,N2HDF,N3HDF,N0SDF,N1SDF,d,d,RT,NxxDq,NxQDF,Q))) (clobber (reg:CC CC_REGNUM))] TARGET_ZARCH s390_logical_operator_ok_p (operands) @ @@ -5998,10 +6004,11 @@ ngr\t%0,%2 ngrk\t%0,%1,%2 ng\t%0,%2 + risbg\t%0,%1,%s2,128+%e2,0 # # - [(set_attr op_type RRE,RXE,RI,RI,RI,RI,RIL,RIL,RRE,RRF,RXY,SI,SS) - (set_attr cpu_facility *,*,*,*,*,*,extimm,extimm,*,z196,*,*,*) + [(set_attr op_type RRE,RXE,RI,RI,RI,RI,RIL,RIL,RRE,RRF,RXY,RIE,SI,SS) + (set_attr cpu_facility *,*,*,*,*,*,extimm,extimm,*,z196,*,z10,*,*) (set_attr z10prop *, *, z10_super_E1, @@ -6013,6 +6020,7 @@ z10_super_E1, *, z10_super_E1, +z10_super_E1, *, *)]) @@ -6033,10 +6041,12 @@ (define_insn *andsi3_cc [(set (reg CC_REGNUM) -(compare (and:SI (match_operand:SI 1 nonimmediate_operand %0,0,d,0,0) - (match_operand:SI 2 general_operand Os,d,d,R,T)) - (const_int 0))) - (set (match_operand:SI 0 register_operand =d,d,d,d,d) +(compare + (and:SI + (match_operand:SI 1 nonimmediate_operand %0,0,d,0,0,d) +(match_operand:SI 2 general_operand Os,d,d,R,T,NxxSq)) + (const_int 0))) + (set (match_operand:SI 0 register_operand =d,d,d,d,d,d) (and:SI (match_dup 1) (match_dup 2)))] s390_match_ccmode(insn, 
CCTmode) @ @@ -6044,17 +6054,21 @@ nr\t%0,%2 nrk\t%0,%1,%2 n\t%0,%2 - ny\t%0,%2 - [(set_attr op_type
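The new NxxDq and NxxSq alternatives apply when the AND mask is a single contiguous run of 1 bits, since keeping one bit range and zeroing the rest is then the same as the AND. A minimal sketch of such a test in standalone C follows. It is a hypothetical stand-in, not the GCC function: it counts bit 0 as the least significant bit, and it rejects zero and wrap-around masks, which is an assumption of this sketch rather than something the patch text states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a contiguous-bitmask test.  Returns 1 if
   VAL has exactly one run of consecutive 1 bits, all within the low
   SIZE bits.  On success *LSB is the position of the lowest set bit
   and *LEN the length of the run.  */
int
contiguous_bitmask (uint64_t val, int size, int *lsb, int *len)
{
  int pos = 0, run = 0;

  if (size < 64 && (val >> size) != 0)
    return 0;
  if (val == 0)
    return 0;
  while ((val & 1) == 0)
    {
      val >>= 1;
      pos++;
    }
  while ((val & 1) != 0)
    {
      val >>= 1;
      run++;
    }
  if (val != 0)
    return 0;
  *lsb = pos;
  *len = run;
  return 1;
}

/* Keep bits LSB .. LSB+LEN-1 of X and clear the rest.  */
uint64_t
keep_bit_range (uint64_t x, int lsb, int len)
{
  uint64_t ones = len == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << len) - 1;

  return x & (ones << lsb);
}

/* Small usage check: AND with 0x0ff0 equals keeping bits 4..11.  */
void
check_contiguous_bitmask (void)
{
  int lsb, len;

  assert (contiguous_bitmask (0x0ff0, 64, &lsb, &len));
  assert (lsb == 4 && len == 8);
  assert (keep_bit_range (0xabcd, lsb, len) == (0xabcd & 0x0ff0));
}
```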
Re: [cxx-conversion] Make double_int a class with methods and operators. (issue6443093)
On Aug 9, 2012, at 5:00 PM, Michael Matz wrote:
On Thu, 9 Aug 2012, Mike Stump wrote:
On Aug 9, 2012, at 8:19 AM, Michael Matz wrote:

Hmm. And maintaining a cache is faster than passing/returning/manipulating two registers?

For the most part, we merely mirror existing code, check out lookup_const_double and immed_double_const.

No, I won't without patches on this list.

Ah, we are discussing the code in the gcc tree currently. You _can_ comment on it, if you like to. I was only pointing out that this is a choice we neither made nor deviated from; it is what the code in the top of the tree does. If you think it is wrong to cache it, then talking about the code in the top of the tree is the right place to discuss it. Though, you don't have to if you don't want to.

You keep repeating bragging

Such hostility. Why? I don't get it. I _asked_ about when the cxx branch was going to land, I stated that I liked non-mutating interfaces, I gave a heads up that we have a wide-int class to replace double-int for ints. I _only_ gave a heads up because the submitted change to the cxx branch conflicts on a larger than expected scale with the wide-int change. I think giving a heads up before the conflict happens is good citizenship.

I mean, preparing the audience for an upcoming _suggested_ change in data structure of course is fine. But arguing as if the change happened already, and what's more concerning, as if the change was even already suggested and agreed upon even though that's not the case, is just bad style.

So, let me get this straight, alerting people that I have a patch that conflicts with another posted patch is bad style? Odd. I saw it listed on page 10 of the etiquette guide, maybe you could update the guide for us.

I would suggest to stay conservative about whatever you have (except if it's momentarily materializing), and _especially don't argue against or for or not against or for whatever improvement is suggested

Ah, that's a misunderstanding on your part. I was not arguing for, or against the double_int changes. In fact, I'm very supportive of those changes and the entire cxx branch, not that you'd know that, as I think all of the changes are a slam dunk and don't need any support from me. The :-( in the email that you read was just a comment that someone is going to have to resolve conflicts. Now that we know the expected timing of the cxx branch landing, we'll handle the conflicts on the wide-int side. If the timing was different, we'd land the wide-int change first, then the :-( in the heads up comment would be read more as, we're sorry, but we've just scrambled the tree on you, so sorry.

Let me be perfectly clear, I support the double_int changes and the entire cxx-conversion branch. No work I may or may not have matters or should be considered in reviewing any patches. I'm a firm believer in the first in, wins method of resolving conflicts. Sorry if anyone thought I was objecting in any way to the double_int work.

Nobody has seen it yet,

Actually, that's not true; but, it doesn't matter any.

so you can't expect to get any feedback on it.

I don't recall asking for feedback on it. The feedback I requested that you quote above was feedback on the code in the top of the tree.
Re: [PATCH 2/3] Incorporate aggregate jump functions into inlining analysis
Hi,

this patch uses the aggregate jump functions created by the previous patch in the series to determine benefits of inlining a particular call graph edge. It has not changed much since the last time I posted it, except for the presence of by_ref flags and removal of checks required by TBAA which we now do not use.

The patch works in a fairly straightforward way. It adds two flags to struct condition to specify it actually refers to an aggregate passed by value or something passed by reference, in both cases at a particular offset, also newly stored in the structures. Functions which build the predicates specifying under which conditions CFG edges will be taken or individual statements are actually executed then simply also look whether a value comes from an aggregate passed to us in a parameter (either by value or reference) and if so, create appropriate conditions. Later on, predicates are evaluated as before, we only also look at aggregate contents of the jump functions of the edge we are considering to inline when evaluating the predicates, and also remap the offsets of the jump functions when remapping over an ancestor jump function.

This patch alone makes us inline the function bar in the testcase of PR 48636 in comment #4. It also passes bootstrap and testing on x86_64-linux. I successfully LTO-built Firefox with it too.

Thanks for all comments and suggestions,

Martin

2012-07-31  Martin Jambor  mjam...@suse.cz

    PR fortran/48636
    * ipa-inline.h (condition): New fields offset, agg_contents and by_ref.
    * ipa-inline-analysis.c (agg_position_info): New type.
    (add_condition): New parameter aggpos, also store agg_contents, by_ref and offset.
    (dump_condition): Also dump aggregate conditions.
    (evaluate_conditions_for_known_args): Also handle aggregate conditions. New parameter known_aggs.
    (evaluate_properties_for_edge): Gather known aggregate contents.
    (inline_node_duplication_hook): Pass NULL known_aggs to evaluate_conditions_for_known_args.
    (unmodified_parm): Split into unmodified_parm and unmodified_parm_1.
    (unmodified_parm_or_parm_agg_item): New function.
    (set_cond_stmt_execution_predicate): Handle values passed in aggregates.
    (set_switch_stmt_execution_predicate): Likewise.
    (will_be_nonconstant_predicate): Likewise.
    (estimate_edge_devirt_benefit): Pass new parameter known_aggs to ipa_get_indirect_edge_target.
    (estimate_calls_size_and_time): New parameter known_aggs, pass it recursively to itself and to estimate_edge_devirt_benefit.
    (estimate_node_size_and_time): New vector known_aggs, pass it to functions which need it.
    (remap_predicate): New parameter offset_map, use it to remap aggregate conditions.
    (remap_edge_summaries): New parameter offset_map, pass it recursively to itself and to remap_predicate.
    (inline_merge_summary): Also create and populate vector offset_map.
    (do_estimate_edge_time): New vector of known aggregate contents, passed to functions which need it.
    (inline_read_section): Stream new fields of condition.
    (inline_write_summary): Likewise.
    * ipa-cp.c (ipa_get_indirect_edge_target): Also examine the aggregate contents. Let all local callers pass NULL for known_aggs.
    * testsuite/gfortran.dg/pr48636.f90: New test.

OK with the following changes. I plan to push out my inline hints code, so it would be nice if you committed soon so I can resolve conflicts on my side.

Index: src/gcc/ipa-inline.h
===
*** src.orig/gcc/ipa-inline.h
--- src/gcc/ipa-inline.h
*** along with GCC; see the file COPYING3.
*** 28,36
--- 28,45
  typedef struct GTY(()) condition
  {
+   /* If agg_contents is set, this is the offset from which the used data was
+      loaded.  */
+   HOST_WIDE_INT offset;
    tree val;
    int operand_num;
    enum tree_code code;
+   /* Set if the used data were loaded from an aggregate parameter or from
+      data received by reference.  */
+   unsigned agg_contents : 1;
+   /* If agg_contents is set, this differentiates between loads from data
+      passed by reference and by value.  */
+   unsigned by_ref : 1;

Do you have any data on memory usage? I was originally concerned about memory use of the whole predicate thingy on WPA level. Eventually we could add simple inheritance on conditions and sort them into multiple vectors if needed. But I assume it is OK or we will work it out on Mozilla builds soonish. One obvious thing is to pack CODE and the bitfields so we fit in 3 64bit words.

*** dump_condition (FILE *f, conditions cond
*** 519,524
--- 554,561
    c = VEC_index (condition, conditions, cond -
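The review comment about fitting the condition into three 64-bit words can be sketched in plain C. The types below are stand-ins chosen for illustration (they are not the GCC definitions), the 16-bit width for the code field is an assumption of this sketch, and the resulting sizes depend on the ABI, so the check only compares the two layouts against each other.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for GCC types; assumptions for illustration only.  */
typedef long long wide_int_t;      /* plays the role of HOST_WIDE_INT */
typedef void *tree_t;              /* plays the role of tree */
enum code_t { CODE_A, CODE_B };    /* plays the role of enum tree_code */

/* Layout in the order of the patch: the enum occupies a full int and
   the two flags open a further storage unit.  */
struct condition_plain
{
  wide_int_t offset;
  tree_t val;
  int operand_num;
  enum code_t code;
  unsigned agg_contents : 1;
  unsigned by_ref : 1;
};

/* Packed variant: the code shares one unsigned unit with the flags.  */
struct condition_packed
{
  wide_int_t offset;
  tree_t val;
  int operand_num;
  unsigned code : 16;
  unsigned agg_contents : 1;
  unsigned by_ref : 1;
};

/* Small usage check: packing never grows the structure and the
   fields still round-trip.  */
void
check_condition_packing (void)
{
  struct condition_packed c = { 0 };

  c.code = CODE_B;
  c.by_ref = 1;
  assert (c.code == CODE_B && c.by_ref == 1 && c.agg_contents == 0);
  assert (sizeof (struct condition_packed)
          <= sizeof (struct condition_plain));
}
```

On an ABI with 8-byte pointers and 4-byte int one would expect the packed variant to come out at three 8-byte words, but that is worth confirming with sizeof on the target of interest rather than taking from this sketch.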
Re: [PATCH 3/3] Compute predicates for phi node results in ipa-inline-analysis.c
Hi,

this third patch is basically a proof-of-concept aiming at alleviating the following code found in Fortran functions when they look at the contents of array descriptors:

  <bb 2>:
    stride.156_7 = strain_tensor_6(D)->dim[0].stride;
    if (stride.156_7 != 0)
      goto <bb 3>;
    else
      goto <bb 4>;

  <bb 3>:

  <bb 4>:
    # stride.156_4 = PHI <stride.156_7(3), 1(2)>

and stride.156_4 is then used for other computations. Currently we compute a predicate for SSA name stride.156_7 but the PHI node stops us from having one for stride.156_4 and those computed from it.

This patch looks at phi nodes, and if its pairs of predecessors have the same nearest common dominator, and the condition there is known to be described by a predicate (computed either by set_cond_stmt_execution_predicate or set_switch_stmt_execution_predicate; we depend on knowing how exactly they behave), we use the parameter and offset from the predicate condition and create one for the PHI node result, provided the arguments of a phi node allow that, of course.

Consider:

           b==0?
          T/   \F
          /     \
         /       \
      a==0?     a==0?
     T/  \F    T/  \F
    ...   \    /   ...
           \  /
           PHI

In this case the value of PHI is determined by a==0, but the condition in the common dominator would be b==0. We can work this out from the control dependency relation or handle it by a propagation engine, but perhaps it is overkill. What about special casing the (half) diamond CFG to start with?

Patch is OK with that change.

Honza
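For readers more used to source than to GIMPLE, the half-diamond in the dump corresponds to a pattern like the following. This is a hypothetical C rendering written for illustration (the code in the message comes from the Fortran front end, and the struct and function names here are made up): the value merged by the PHI node is the loaded stride on one edge and the constant 1 on the other, so it depends only on the comparison in the dominating block.

```c
#include <assert.h>

/* Made-up miniature of an array descriptor.  */
struct dim_info
{
  long stride;
};

struct descriptor
{
  struct dim_info dim[1];
};

/* The load is the analogue of stride.156_7, the returned value the
   analogue of the PHI result stride.156_4.  */
long
effective_stride (const struct descriptor *d)
{
  long stride = d->dim[0].stride;

  if (stride == 0)
    stride = 1;
  return stride;
}

/* Small usage check.  */
void
check_effective_stride (void)
{
  struct descriptor zero = { { { 0 } } };
  struct descriptor four = { { { 4 } } };

  assert (effective_stride (&zero) == 1);
  assert (effective_stride (&four) == 4);
}
```

If a caller passes a descriptor whose stride is a known constant, both the comparison and the merged value are known at the call site, which is the situation the predicate on the PHI result is meant to capture.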
Re: [PATCH][7/6] Allow anonymous SSA names
This converts most users of create_tmp_{var,reg} to use anonymous SSA names. To give you one more reason to look at 6/6 ;)

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Very cool. Thanks for the hard work. Did you have time to test the memory use effects? (I have to read the whole series; perhaps it is there :)

Honza
Re: [PATCH, testsuite] New effective target long_neq_int
On 08/09/2012 06:46 PM, William J. Schmidt wrote: As suggested by Janis regarding testsuite/gcc.dg/tree-ssa/slsr-30.c, this patch adds a new effective target for machines having long and int of differing sizes. Tested on powerpc64-unknown-linux-gnu, where the test passes for -m64 and is skipped for -m32. Ok for trunk? OK! Janis Thanks, Bill doc: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com * sourcebuild.texi: Document long_neq_int effective target. testsuite: 2012-08-09 Bill Schmidt wschm...@linux.vnet.ibm.com * lib/target-supports.exp (check_effective_target_long_neq_int): New. * gcc.dg/tree-ssa/slsr-30.c: Check for long_neq_int effective target. Index: gcc/doc/sourcebuild.texi === --- gcc/doc/sourcebuild.texi (revision 190260) +++ gcc/doc/sourcebuild.texi (working copy) @@ -1303,6 +1303,9 @@ Target has @code{int} that is at 32 bits or longer @item int16 Target has @code{int} that is 16 bits or shorter. +@item long_neq_int +Target has @code{int} and @code{long} with different sizes. + @item large_double Target supports @code{double} that is longer than @code{float}. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 190260) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -1689,6 +1689,15 @@ proc check_effective_target_llp64 { } { }] } +# Return 1 if long and int have different sizes, +# 0 otherwise. + +proc check_effective_target_long_neq_int { } { +return [check_no_compiler_messages long_ne_int object { + int dummy[sizeof (int) != sizeof (long) ? 1 : -1]; +}] +} + # Return 1 if the target supports long double larger than double, # 0 otherwise. Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c === --- gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (revision 190260) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c (working copy) @@ -1,7 +1,7 @@ /* Verify straight-line strength reduction fails for simple integer addition with casts thrown in when -fwrapv is used. */ -/* { dg-do compile { target { ! 
{ ilp32 } } } } */ +/* { dg-do compile { target { long_neq_int } } } */ /* { dg-options -O3 -fdump-tree-dom2 -fwrapv } */ long
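The new effective-target check relies on a common compile-time probe: an array whose length is 1 when a condition holds and -1 otherwise, so that the translation unit only compiles cleanly in the first case. A minimal sketch of the idiom in standalone C follows. The macro and the names are made up for illustration, and the probed condition is one that holds on every conforming implementation so the example compiles everywhere.

```c
#include <assert.h>

/* Hypothetical helper macro: declares an array of length 1 if COND
   is true and of length -1 (a compile-time error) if it is false.  */
#define COMPILE_TIME_PROBE(name, cond) \
  extern int name[(cond) ? 1 : -1]

/* Always true in C, so this declaration is always accepted.  Probing
   sizeof (int) != sizeof (long) instead would make the file compile
   only on targets where the two types differ in size.  */
COMPILE_TIME_PROBE (char_is_one_byte, sizeof (char) == 1);

/* Runtime view of the condition the effective target tests.  */
int
long_differs_from_int (void)
{
  return sizeof (int) != sizeof (long);
}

/* Small usage check.  */
void
check_probe (void)
{
  assert (sizeof (char_is_one_byte) == sizeof (int));
  assert (long_differs_from_int () == (sizeof (int) != sizeof (long)));
}
```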
RE: [PATCH,i386] fma,fma4 and xop flags
-mxop implies -mfma4, but reverse is not true.

I think this handling went in for bdver1. But, with bdver2, we have both fma and fma4. So for bdver2, -mxop should not be enabling one of them.

if someone set -mfma4 together with -mfma on the command line, we should NOT disable selected ISA behind user's back

If both -mfma4 and -mfma are enabled, GCC outputs fma4 instructions. This, I think, is because fma4 instruction patterns are read before fma instruction patterns from the .md files. So, enabling both -mfma4 and -mfma is not good for bdver2.

Moreover, if the user tries to use -mfma -mno-fma4 -mxop, the order in which these options are used becomes crucial. -mxop enables -mfma4, and because of the instruction pattern order fma4 instructions get listed in the assembly file.

For the below test,

  double a,b,c,d;
  int fn(){
    a = b + c * d ;
    return a;
  }

#1) Using options -O2 -mno-fma4 -mfma -mxop outputs fma4. (vfmaddsd b(%rip), %xmm2, %xmm1, %xmm0)
#2) Using options -O2 -mfma -mno-fma4 -mxop outputs fma4. (vfmaddsd b(%rip), %xmm2, %xmm1, %xmm0)
#3) Using options -mxop -mno-fma4 -mfma outputs fma. (vfmadd132sd d(%rip), %xmm1, %xmm0)

As we see, the order in which the options are used becomes crucial. This is confusing. I haven't really tested other implied options. But, I suspect a similar phenomenon in those cases too.

IMO, we can directly go by the CPUID flags and enable the flags. This will be a one to one mapping and leave the user with a lot more liberty.

Please let me know your opinion.

Regards
Ganesh

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com]
Sent: Friday, August 10, 2012 1:21 AM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH,i386] fma,fma4 and xop flags

On Wed, Aug 8, 2012 at 1:31 PM, ganesh.gopalasubraman...@amd.com wrote:

Bdver2 cpu supports both fma and fma4 instructions. Previous to patch, option -mno-xop removes -mfma4. Similarly, option -mno-fma4 removes -mxop.

It looks to me that there is some misunderstanding. AFAICS: -mxop implies -mfma4, but reverse is not true. Please see

  #define OPTION_MASK_ISA_FMA4_SET \
    (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_SSE4A_SET \
     | OPTION_MASK_ISA_AVX_SET)
  #define OPTION_MASK_ISA_XOP_SET \
    (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)

So, -mxop sets -mfma4, etc ..., but -mfma4 does NOT enable -mxop. OTOH,

  #define OPTION_MASK_ISA_FMA4_UNSET \
    (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
  #define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP

-mno-fma4 implies -mno-xop, but again reverse is not true. Thus, -mno-xop does NOT imply -mno-fma4.

So, the patch conditionally disables -mfma or -mfma4. Enabling -mxop is done by also checking -mfma.

Please note that conditional handling of ISA flags belongs to ix86_option_override_internal. However, if someone set -mfma4 together with -mfma on the command line, we should NOT disable selected ISA behind user's back, in the same way as we don't disable anything with -march=i386 -msse4. With -march=bdver2, we already marked that only fma is supported, and if user selected -march=bdver2 -mfma4 on the command line, we shouldn't disable anything.

Uros.
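The order dependence discussed in this thread follows directly from the SET/UNSET masks if options are applied one after another from left to right. A minimal model in standalone C follows. The bit values and helper names are made up for illustration, the SSE4A and AVX parts of the real FMA4 mask are left out for brevity, and the strict left-to-right processing is an assumption of this sketch rather than a description of the full option-override logic.

```c
#include <assert.h>

/* Illustrative bit values; the real OPTION_MASK_ISA_* values differ.  */
enum
{
  ISA_FMA  = 1 << 0,
  ISA_FMA4 = 1 << 1,
  ISA_XOP  = 1 << 2
};

/* Enabling XOP also enables FMA4; disabling FMA4 also disables XOP.  */
#define ISA_FMA4_SET   (ISA_FMA4)
#define ISA_XOP_SET    (ISA_XOP | ISA_FMA4_SET)
#define ISA_XOP_UNSET  (ISA_XOP)
#define ISA_FMA4_UNSET (ISA_FMA4 | ISA_XOP_UNSET)

/* Apply a -mfoo option.  */
unsigned
isa_enable (unsigned flags, unsigned set_mask)
{
  return flags | set_mask;
}

/* Apply a -mno-foo option.  */
unsigned
isa_disable (unsigned flags, unsigned unset_mask)
{
  return flags & ~unset_mask;
}

/* Small usage check for the two orderings from the thread.  */
void
check_option_order (void)
{
  unsigned f;

  /* -mfma -mno-fma4 -mxop: the later -mxop turns FMA4 back on.  */
  f = isa_enable (0, ISA_FMA);
  f = isa_disable (f, ISA_FMA4_UNSET);
  f = isa_enable (f, ISA_XOP_SET);
  assert (f == (ISA_FMA | ISA_FMA4 | ISA_XOP));

  /* -mxop -mno-fma4 -mfma: -mno-fma4 removes FMA4 and XOP again.  */
  f = isa_enable (0, ISA_XOP_SET);
  f = isa_disable (f, ISA_FMA4_UNSET);
  f = isa_enable (f, ISA_FMA);
  assert (f == ISA_FMA);
}
```

In this model the first ordering ends with FMA4 enabled and the second with only FMA, which matches cases #2 and #3 reported above.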