Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
On 2013-04-16 13:55, Jakub Jelinek wrote:
> On Tue, Apr 16, 2013 at 03:41:52PM +0400, Maksim Kuznetsov wrote:
>> Richard, Jeff, could you please have a look?
> I wonder if %{ and %} shouldn't be better handled in final.c for all
> #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific one.

Yes, please.

r~
Re: [patch] Fix PR middle-end/56474
On Wed, Apr 17, 2013 at 1:12 AM, Eric Botcazou ebotca...@adacore.com wrote:

For the C family I found exactly one - the layout_type case, and fixed it in the FEs by making empty arrays use [1, 0] domains or signed domains (I don't remember exactly). I believe the layout_type change was to make Ada happy.

I'm skeptical; I had to narrow down the initial kludge because it hurt Ada.

It may be that enabling overflow detection for even unsigned sizetype was because of Ada as well. After all, only Ada changed its sizetype sign recently.

Not true, overflow detection has _always_ been enabled for sizetypes. But sizetypes, including unsigned ones, were sign-extended, so 0 - 1 didn't overflow, and we need that behavior back for Ada to work properly.

Yeah, well - they were effectively signed. I don't like special-casing 0 - 1 in a general compute function. Maybe you want to use size_diffop for the computation? That would result in a signed result and thus no overflow for 0 - 1.

But it's not a general compute function, it's size_binop, which is meant to be used for sizetypes only and which forces overflow on unsigned types. We need overflow detection for sizetypes, but we can also tailor it to fit our needs.

I'm not against tailoring it to fit our needs - I'm just against special-casing behavior for specific values. That just sounds wrong. Maybe we should detect overflow as if the input and output were signed while computing an unsigned result. As far as I can see, int_const_binop_1 does detect overflow as if operations were signed (it passes 'false' as uns to all double-int operations rather than TYPE_UNSIGNED). For example sub_with_overflow simply does

  neg_double (b.low, b.high, &ret.low, &ret.high);
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, b.high, high);

which I believe is wrong.
Shouldn't it be

  neg_double (b.low, b.high, &ret.low, &ret.high);
  HOST_WIDE_INT tem = ret.high;
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, tem, high);

? Because we are computing a + (-b), and thus OVERFLOW_SUM_SIGN expects the sign of a and -b, not a and b, to verify against the sign of ret.

The other option is to disable overflow handling for _all_ constants and MINUS_EXPR (and then please PLUS_EXPR as well) in size_binop. Maybe it's only the MULT_EXPR overflow we want to know (byte-to-bit conversion / element size scaling IIRC).

Well, we just need 0 - 1 because of the way we compute size expressions for variable-sized arrays.

I'm sceptical. Where do you compute the size expression for variable-sized arrays? I suppose with the testcase in the initial patch I can then inspect myself what actually happens?

Thanks, Richard.

-- Eric Botcazou
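The sign rule at issue can be illustrated outside of GCC with a small sketch (the helper `sub_overflows` is hypothetical, not GCC's double-int code): a sum overflows iff both addends have the same sign and the result's sign differs, and for a - b computed as a + (-b) the addend whose sign matters is -b, captured before the add, exactly the `tem` in the proposed fix.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of OVERFLOW_SUM_SIGN-style detection for a - b,
   computed as a + (-b) with wrap-around arithmetic.  The sign of -b is
   saved in `tem` before the add, mirroring the proposed correction.
   Caveat: b == INT64_MIN, where the negation itself wraps, is not
   caught by this rule.  */
static int
sub_overflows (int64_t a, int64_t b)
{
  uint64_t nb = 0 - (uint64_t) b;   /* negate b with wrap-around */
  int tem = (int64_t) nb < 0;       /* sign of -b, before the add */
  uint64_t sum = (uint64_t) a + nb;
  int sign_a = a < 0;
  int sign_r = (int64_t) sum < 0;
  /* Overflow iff a and -b agree in sign but the result does not.  */
  return sign_a == tem && sign_r != sign_a;
}
```

Note that testing the sign of b instead of -b (the behavior Richard questions) would wrongly report overflow for, e.g., 1 - 2, and would miss INT64_MIN - 1.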
Re: [patch] simplify emit_delay_sequence
This patch is also necessary for my new delay-slot scheduler to keep basic block boundaries correctly up to date. The emit-rtl API does that already. Cross-tested powerpc64 x mips. Currently running bootstrap & test on sparc64-unknown-linux-gnu. OK if it passes?

Yes, modulo:

@@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt
       INSN_LOCATION (seq_insn) = INSN_LOCATION (tem);
       INSN_LOCATION (tem) = 0;
 
+      /* Remove any REG_DEAD notes because we can't rely on them now
+	 that the insn has been moved.  */
       for (note = REG_NOTES (tem); note; note = next)
 	{
 	  next = XEXP (note, 1);

Did you mean to move the comment instead of duplicating it?

-- Eric Botcazou
Re: [patch] Fix ICE during RTL expansion at -O1
+  if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
+    goto may_overlap;

ick, can TREE_CODE (type1) != RECORD_TYPE happen as well here? Please add a comment similar to the Fortran ??? above.

It can happen because we stop at unions (and qualified unions), and for them we cannot disambiguate based on the fields. I'll add a regular comment.

Can you please also add at least one testcase as gcc.dg/tree-ssa/ssa-fre-??.c that tests the functionality of this and that wasn't handled before? I suppose it would be sth like

struct S { int i; int j; };
struct U { struct S a[10]; } u;

u.a[n].i = i;
u.a[n].j = j;
return u.a[n].i;

where we fail to CSE the load from u.a[n].i.

Yes, the patch does eliminate the redundant load in .fre1:

u.a[n_2(D)].i = i_3(D);
u.a[n_2(D)].j = j_5(D);
_7 = u.a[n_2(D)].i;
return _7;

becomes:

u.a[n_2(D)].i = i_3(D);
u.a[n_2(D)].j = j_5(D);
_7 = i_3(D);
return _7;

Otherwise the patch is ok. Thanks.

-- Eric Botcazou
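For reference, a compilable version of the testcase sketched above (the wrapper function name `fre_test` is made up for illustration); the store to u.a[n].j cannot clobber u.a[n].i when both accesses use the same index, so FRE can replace the final load with i:

```c
#include <assert.h>

struct S { int i; int j; };
struct U { struct S a[10]; } u;

/* With the patch, FRE rewrites the return into "return i", eliminating
   the redundant load of u.a[n].i.  Semantics are unchanged either way.  */
int
fre_test (int n, int i, int j)
{
  u.a[n].i = i;
  u.a[n].j = j;   /* distinct field: does not clobber u.a[n].i */
  return u.a[n].i;
}
```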
Re: [PATCH] Add a new option -fstack-protector-strong
On 04/17/2013 02:49 AM, Han Shen wrote:

+  if (flag_stack_protect == 3)
+    cpp_define (pfile, "__SSP_STRONG__=3");
   if (flag_stack_protect == 2)
     cpp_define (pfile, "__SSP_ALL__=2");

3 and 2 should be replaced by SPCT_FLAG_STRONG and SPCT_FLAG_ALL.

I define these SPCT_FLAG_XXX in cfgexpand.c locally, so they are not visible to c-cppbuiltin.c; do you suggest defining them inside c-cppbuiltin.c also?

I see. Let's use the constants for now.

Indentation is off (unless both mail clients I tried are clobbering your patch). I think the GNU coding style prohibits the braces around the single-statement body of the outer 'for'.

Done with indentation properly on, and removed the braces. (GMail's composing window drops all the tabs when pasting... I have to use Thunderbird to paste the patch. Hope it is right this time.)

Thunderbird mangles patches as well, but I was able to repair the damage. When using Thunderbird, please send the patch as a text file attachment. You can put the changelog snippets at the beginning of the file as well. This way, everything is sent out unchanged.

Can you make the conditional more similar to the comment, perhaps using a switch statement on the value of the flag_stack_protect variable? That's going to be much easier to read.

Re-coded. Now using 'switch-case'. Thanks.

I think the comment is now redundant because it matches the code almost word-for-word. 8-)

No for 'struct-returning' functions. But I regard this as not an issue --- at the programming level, there is no way to get one's hands on the address of a returned structure:

struct Node foo();
struct Node *p = &foo();  // compiler error - lvalue required as unary '&' operand.

C++ const references can bind to rvalues. But I'm more worried about the interaction with the return value optimization.
Consider this C++ code:

struct S { S(); int a; int b; int c; int d; int e; };
void f1(int *);
S f2() { S s; f1(&s.a); return s; }
S g2();
void g3() { S s = g2(); }
void g3b(const S &);
void g3b() { g3b(g2()); }

With your patch and -O2 -fstack-protector-strong, this generates the following assembly:

	.globl	_Z2f2v
	.type	_Z2f2v, @function
_Z2f2v:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
	call	_ZN1SC1Ev
	movq	%rbx, %rdi
	call	_Z2f1Pi
	movq	%rbx, %rax
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	_Z2f2v, .-_Z2f2v
	.p2align 4,,15
	.globl	_Z2g3v
	.type	_Z2g3v, @function
_Z2g3v:
.LFB1:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	call	_Z2g2v
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE1:
	.size	_Z2g3v, .-_Z2g3v
	.p2align 4,,15
	.globl	_Z3g3bv
	.type	_Z3g3bv, @function
_Z3g3bv:
.LFB2:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	movq	%fs:40, %rax
	movq	%rax, 24(%rsp)
	xorl	%eax, %eax
	call	_Z2g2v
	movq	%rsp, %rdi
	call	_Z3g3bRK1S
	movq	24(%rsp), %rax
	xorq	%fs:40, %rax
	jne	.L9
	addq	$40, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
.L9:
	.cfi_restore_state
	.p2align 4,,6
	call	__stack_chk_fail
	.cfi_endproc
.LFE2:
	.size	_Z3g3bv, .-_Z3g3bv

Here, g3b() is correctly instrumented, and f2() does not need instrumentation (because space for the returned object is not part of the local frame). But an address on the stack escapes in g3() and is used for the return value of the call to g2(). This requires instrumentation, which is missing in this example. I suppose this can be handled in a follow-up patch if necessary.

ChangeLog and patch below --

gcc/ChangeLog
2013-04-16  Han Shen  shen...@google.com

	* cfgexpand.c (record_or_union_type_has_array_p): Helper function
	to check if a record or union contains an array field.

I think the GNU convention is to write only this:

	* cfgexpand.c (record_or_union_type_has_array_p): New function.

	(expand_used_vars): Add logic handling '-fstack-protector-strong'.
	* common.opt (fstack-protector-all): New option.

Should be fstack-protector-strong.

-- Florian Weimer / Red Hat Product Security Team
RE: [PATCH, AArch64] Compare instruction in shift_extend mode
Hi,

I suggest for this one test case either making it compile-only and dropping main() such that the pattern match only looks in the assembled output of the cmp_* functions.

The testcase will check only for the assembly pattern of the instruction, as per your suggestion. Please find attached the modified patch; let me know if there should be any further modifications in it.

Thanks,
Naveen

--- gcc/config/aarch64/aarch64.md	2013-04-17 11:18:29.453576713 +0530
+++ gcc/config/aarch64/aarch64.md	2013-04-17 15:22:36.161492471 +0530
@@ -2311,6 +2311,18 @@
    (set_attr "mode" "<GPI:MODE>")]
 )
 
+(define_insn "*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>"
+  [(set (reg:CC_SWP CC_REGNUM)
+	(compare:CC_SWP (ashift:GPI
+			 (ANY_EXTEND:GPI
+			  (match_operand:ALLX 0 "register_operand" "r"))
+			 (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n"))
+			(match_operand:GPI 2 "register_operand" "r")))]
+  ""
+  "cmp\\t%<GPI:w>2, %<GPI:w>0, <su>xt<ALLX:size> %1"
+  [(set_attr "v8type" "alus_ext")
+   (set_attr "mode" "<GPI:MODE>")]
+)
 
 ;; -------------------------------------------------------------------
 ;; Store-flag and conditional select insns
--- gcc/testsuite/gcc.target/aarch64/cmp.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/cmp.c	2013-04-17 15:23:36.121492125 +0530
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+cmp_si_test1 (int a, int b, int c)
+{
+  if (a > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+int
+cmp_si_test2 (int a, int b, int c)
+{
+  if ((a << 3) > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+typedef long long s64;
+
+s64
+cmp_di_test1 (s64 a, s64 b, s64 c)
+{
+  if (a > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+s64
+cmp_di_test2 (s64 a, s64 b, s64 c)
+{
+  if ((a << 3) > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+int
+cmp_di_test3 (int a, s64 b, s64 c)
+{
+  if (a > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+int
+cmp_di_test4 (int a, s64 b, s64 c)
+{
+  if (((s64)a << 3) > b)
+    return a + c;
+  else
+    return a + b + c;
+}
+
+/* { dg-final { scan-assembler-times "cmp\tw\[0-9\]+, w\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "cmp\tx\[0-9\]+, x\[0-9\]+" 4 } } */
RE: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics
Hi Julian,

From: Julian Brown [mailto:jul...@codesourcery.com]
Sent: 13 April 2013 15:04
To: Julian Brown
Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan
Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

On Fri, 12 Apr 2013 20:09:39 +0100, Julian Brown jul...@codesourcery.com wrote:
On Fri, 12 Apr 2013 15:19:18 +0100, Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics to arm_neon.h through the generator ML scripts and also adds the built-ins to which the intrinsics will map. The generator ML scripts are updated and used to generate the relevant .texi documentation, arm_neon.h and the tests in gcc.target/arm/neon.

FWIW, some of the changes to neon*.ml can be simplified somewhat -- my attempt at an improved version of those bits is attached. I'm still not too happy with mode_suffix, but these new instructions require adding semantics to parts of the generator program which weren't really very well-defined to start with :-). I appreciate that it's a bit of a tangle...

I thought of an improvement to the mode_suffix part from the last version of the patch, so here it is. I'm done fiddling with this now, so back to you!

Thanks for looking at it! My OCaml-fu is rather limited. It does look cleaner now. Here it is together with all the other parts of the patch, plus some minor formatting changes. Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
	    Julian Brown  jul...@codesourcery.com

	* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
	(TB_DREG): Add T_V4HF.
	(v4hf_UP): New macro.
	(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
	(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.  Handle initialisation of V4HF.  Adjust
	initialisation of reinterpret built-ins.
	(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.
	(arm_vector_mode_supported_p): Handle V4HF.
	(arm_mangle_map): Handle V4HFmode.
	* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
	* config/arm/arm_neon_builtins.def: Add entries for vcvtv4hfv4sf,
	vcvtv4sfv4hf.
	* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
	(neon_vcvtv4hfv4sf): Likewise.
	* config/arm/neon-gen.ml: Handle half-precision floating point
	features.
	* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
	* config/arm/arm_neon.h: Regenerate.
	* config/arm/neon.ml (type elts): Add F16.
	(type vectype): Add T_float16x4, T_floatHF.
	(type vecmode): Add V4HF.
	(type features): Add Requires_FP_bit feature.
	(elt_width): Handle F16.
	(elt_class): Likewise.
	(elt_of_class_width): Likewise.
	(mode_of_elt): Refactor.
	(type_for_elt): Handle F16, fix error messages.
	(vectype_size): Handle T_float16x4.
	(vcvt_sh): New function.
	(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
	(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
	(string_of_mode): Handle V4HF.
	* doc/arm-neon-intrinsics.texi: Regenerate.

gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
	    Julian Brown  jul...@codesourcery.com

	* gcc.target/arm/neon/vcvtf16_f32.c: New test.  Generated.
	* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.

neon-vcvt-intrinsics.patch
Description: Binary data
[PATCH] Fix PR56982, handle setjmp like non-local labels
This fixes PR56982 by properly modeling the control flow of setjmp. It basically behaves as a non-local goto target, so this patch treats it so - it makes it start a basic block and get abnormal edges from possible sources of non-local gotos. The patch also fixes the bug that longjmp is marked as leaf.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? What about release branches (after it has had some time to settle on trunk, of course)?

Thanks, Richard.

2013-04-17  Richard Biener  rguent...@suse.de

	PR tree-optimization/56982
	* builtins.def (BUILT_IN_LONGJMP): longjmp is not a leaf function.
	* gimplify.c (gimplify_call_expr): Notice special calls.
	(gimplify_modify_expr): Likewise.
	* tree-cfg.c (make_abnormal_goto_edges): Handle setjmp-like
	abnormal control flow receivers.
	(call_can_make_abnormal_goto): Handle cfun->calls_setjmp in
	the same way as cfun->has_nonlocal_label.
	(gimple_purge_dead_abnormal_call_edges): Likewise.
	(stmt_starts_bb_p): Make setjmp-like abnormal control flow
	receivers start a basic-block.

	* gcc.c-torture/execute/pr56982.c: New testcase.
Index: gcc/gimplify.c
===================================================================
*** gcc/gimplify.c	(revision 198021)
--- gcc/gimplify.c	(working copy)
*************** gimplify_call_expr (tree *expr_p, gimple
*** 2729,2734 ****
--- 2729,2735 ----
        gimple_stmt_iterator gsi;
        call = gimple_build_call_from_tree (*expr_p);
        gimple_call_set_fntype (call, TREE_TYPE (fnptrtype));
+       notice_special_calls (call);
        gimplify_seq_add_stmt (pre_p, call);
        gsi = gsi_last (*pre_p);
        fold_stmt (&gsi);
*************** gimplify_modify_expr (tree *expr_p, gimp
*** 4968,4973 ****
--- 4969,4975 ----
        STRIP_USELESS_TYPE_CONVERSION (CALL_EXPR_FN (*from_p));
        assign = gimple_build_call_from_tree (*from_p);
        gimple_call_set_fntype (assign, TREE_TYPE (fnptrtype));
+       notice_special_calls (assign);
        if (!gimple_call_noreturn_p (assign))
 	 gimple_call_set_lhs (assign, *to_p);
      }
Index: gcc/tree-cfg.c
===================================================================
*** gcc/tree-cfg.c	(revision 198021)
--- gcc/tree-cfg.c	(working copy)
*************** make_abnormal_goto_edges (basic_block bb
*** 967,991 ****
    gimple_stmt_iterator gsi;
  
    FOR_EACH_BB (target_bb)
!     for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (&gsi))
!       {
! 	gimple label_stmt = gsi_stmt (gsi);
! 	tree target;
! 	if (gimple_code (label_stmt) != GIMPLE_LABEL)
! 	  break;
! 	target = gimple_label_label (label_stmt);
! 	/* Make an edge to every label block that has been marked as a
! 	   potential target for a computed goto or a non-local goto.  */
! 	if ((FORCED_LABEL (target) && !for_call)
! 	    || (DECL_NONLOCAL (target) && for_call))
! 	  {
! 	    make_edge (bb, target_bb, EDGE_ABNORMAL);
! 	    break;
! 	  }
!       }
--- 971,1005 ----
    gimple_stmt_iterator gsi;
  
    FOR_EACH_BB (target_bb)
!     {
!       for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (&gsi))
! 	{
! 	  gimple label_stmt = gsi_stmt (gsi);
! 	  tree target;
! 	  if (gimple_code (label_stmt) != GIMPLE_LABEL)
! 	    break;
! 	  target = gimple_label_label (label_stmt);
! 	  /* Make an edge to every label block that has been marked as a
! 	     potential target for a computed goto or a non-local goto.  */
! 	  if ((FORCED_LABEL (target) && !for_call)
! 	      || (DECL_NONLOCAL (target) && for_call))
! 	    {
! 	      make_edge (bb, target_bb, EDGE_ABNORMAL);
! 	      break;
! 	    }
! 	}
!       if (!gsi_end_p (gsi))
! 	{
! 	  /* Make an edge to every setjmp-like call.  */
! 	  gimple call_stmt = gsi_stmt (gsi);
! 	  if (is_gimple_call (call_stmt)
! 	      && (gimple_call_flags (call_stmt) & ECF_RETURNS_TWICE))
! 	    make_edge (bb, target_bb, EDGE_ABNORMAL);
! 	}
!     }
  }
  
  /* Create edges for a goto statement at block BB.  */
*************** call_can_make_abnormal_goto (gimple t)
*** 2147,2153 ****
  {
    /* If the function has no non-local labels, then a call cannot make an
       abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label)
      return false;
  
    /* Likewise if the call has no side effects.  */
--- 2161,2168 ----
  {
    /* If the function has no non-local labels, then a call cannot make an
       abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label
!       && !cfun->calls_setjmp)
      return false;
  
    /* Likewise if the call has no side effects.  */
*************** stmt_starts_bb_p (gimple stmt, gimple pr
*** 2302,2307 ****
--- 2317,2327 ----
  else
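The returns-twice control flow that the patch models can be seen in a minimal runnable sketch (plain ISO C, independent of the patch): control arrives at the setjmp receiver twice, once normally and once via an abnormal edge from the longjmp, which is exactly why the receiver must start a basic block.

```c
#include <setjmp.h>
#include <assert.h>

static jmp_buf env;

static void
leap (void)
{
  longjmp (env, 42);   /* abnormal transfer back to the setjmp receiver */
}

int
setjmp_returns (void)
{
  /* volatile: the value is modified between setjmp and longjmp and must
     survive the abnormal return.  */
  volatile int sum = 0;
  int ret = setjmp (env);   /* receiver: "returns" twice, with 0 then 42 */
  sum += ret;
  if (ret == 0)
    leap ();
  return sum;               /* 0 from the first return + 42 from the second */
}
```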
[PATCH] Fix PR56921
This fixes PR56921 in a better way and restores the ability to ggc_collect during the RTL loop passes. The patch stores the simple loop description in a separate member of struct loop and not in its aux field, which is not scanned by GC.

Bootstrapped and tested on x86_64-unknown-linux-gnu and powerpc64-linux-gnu, applied.

Richard.

2013-04-17  Richard Biener  rguent...@suse.de

	PR rtl-optimization/56921
	* cfgloop.h (struct loop): Add simple_loop_desc member.
	(struct niter_desc): Mark with GTY(()).
	(simple_loop_desc): Do not use aux field but simple_loop_desc.
	* loop-iv.c (get_simple_loop_desc): Likewise.
	(free_simple_loop_desc): Likewise.

	Revert
	2013-04-16  Richard Biener  rguent...@suse.de

	PR rtl-optimization/56921
	* loop-init.c (pass_rtl_move_loop_invariants): Add
	TODO_do_not_ggc_collect to todo_flags_finish.
	(pass_rtl_unswitch): Same.
	(pass_rtl_unroll_and_peel_loops): Same.
	(pass_rtl_doloop): Same.

Index: gcc/cfgloop.h
===================================================================
*** gcc/cfgloop.h	(revision 198021)
--- gcc/cfgloop.h	(working copy)
*************** struct GTY ((chain_next ("%h.next"))) lo
*** 172,177 ****
--- 172,180 ----
    /* Head of the cyclic list of the exits of the loop.  */
    struct loop_exit *exits;
+ 
+   /* Number of iteration analysis data for RTL.  */
+   struct niter_desc *simple_loop_desc;
  };
  
  /* Flags for state of loop structure.  */
*************** struct rtx_iv
*** 372,378 ****
  /* The description of an exit from the loop and of the number of iterations
     till we take the exit.  */
  
! struct niter_desc
  {
    /* The edge out of the loop.  */
    edge out_edge;
--- 375,381 ----
  /* The description of an exit from the loop and of the number of iterations
     till we take the exit.  */
  
! struct GTY(()) niter_desc
  {
    /* The edge out of the loop.  */
    edge out_edge;
*************** extern void free_simple_loop_desc (struc
*** 425,431 ****
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return (struct niter_desc *) loop->aux;
  }
  
  /* Accessors for the loop structures.  */
--- 428,434 ----
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return loop->simple_loop_desc;
  }
  
  /* Accessors for the loop structures.  */
Index: gcc/loop-iv.c
===================================================================
*** gcc/loop-iv.c	(revision 198021)
--- gcc/loop-iv.c	(working copy)
*************** get_simple_loop_desc (struct loop *loop)
*** 3016,3025 ****
    /* At least desc->infinite is not always initialized by
       find_simple_loop_exit.  */
!   desc = XCNEW (struct niter_desc);
    iv_analysis_loop_init (loop);
    find_simple_exit (loop, desc);
!   loop->aux = desc;
  
    if (desc->simple_p && (desc->assumptions || desc->infinite))
      {
--- 3016,3025 ----
    /* At least desc->infinite is not always initialized by
       find_simple_loop_exit.  */
!   desc = ggc_alloc_cleared_niter_desc ();
    iv_analysis_loop_init (loop);
    find_simple_exit (loop, desc);
!   loop->simple_loop_desc = desc;
  
    if (desc->simple_p && (desc->assumptions || desc->infinite))
      {
*************** free_simple_loop_desc (struct loop *loop
*** 3069,3074 ****
    if (!desc)
      return;
  
!   free (desc);
!   loop->aux = NULL;
  }
--- 3069,3074 ----
    if (!desc)
      return;
  
!   ggc_free (desc);
!   loop->simple_loop_desc = NULL;
  }
Index: gcc/loop-init.c
===================================================================
*** gcc/loop-init.c	(revision 198021)
--- gcc/loop-init.c	(working copy)
*************** struct rtl_opt_pass pass_rtl_move_loop_i
*** 434,441 ****
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
    TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect		/* todo_flags_finish */
  }
 };
--- 434,440 ----
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
    TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing /* todo_flags_finish */
  }
 };
*************** struct rtl_opt_pass pass_rtl_unswitch =
*** 471,478 ****
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect		/* todo_flags_finish */
  }
 };
--- 470,476 ----
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   TODO_verify_rtl_sharing,		/* todo_flags_finish */
  }
Re: [PATCH][RFC] Handle commutative operations in SLP tree build
On Wed, 10 Apr 2013, Richard Biener wrote:

This handles commutative operations during the SLP tree build in the way that if one configuration does not match, the build will try again with commutated operands. This allows us to remove the special-casing of commutated loads in a complex addition that was in the end handled as a permutation. It of course also applies more generally. Permutation is currently limited to 3 unsuccessful permutes to avoid running into the inherently exponential complexity of tree matching.

The gcc.dg/vect/vect-complex-?.c testcases provide some testing coverage (previously handled by the special-casing). I have seen failed SLP in the wild previously, but it's usually on larger testcases and dependent on the operand order of commutative operands.

I've discussed ideas to restrict the cases where we try a permutation with Matz, but I'll rather defer that to an eventual followup. (Compute per SSA name a value dependent on the shape of its use-def tree and use that as a quick check whether sub-trees can possibly match.)

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Committed to trunk.

Richard.

2013-04-10  Richard Biener  rguent...@suse.de

	* tree-vect-slp.c (vect_build_slp_tree_1): Split out from ...
	(vect_build_slp_tree): ... here.
	(vect_build_slp_tree_1): Compute which stmts of the SLP group
	match.  Remove special-casing of mismatched complex loads.
	(vect_build_slp_tree): Based on the result from
	vect_build_slp_tree_1 re-try the match with swapped commutative
	operands.
	(vect_supported_load_permutation_p): Remove special-casing of
	mismatched complex loads.
	(vect_analyze_slp_instance): Adjust.
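The retry idea can be pictured with a toy matcher (the `toy_stmt`/`toy_match` names are made up for illustration and are not the actual vect_build_slp_tree_1 interface): operands are first compared positionally, and on mismatch a commutative operation gets one more chance with the second statement's operands swapped.

```c
#include <string.h>
#include <assert.h>

/* Toy stand-in for a statement with two operand names.  */
struct toy_stmt { const char *op0, *op1; };

/* Return 1 if B's operands match A's, allowing one retry with the
   operands swapped when the operation is commutative (e.g. PLUS_EXPR).  */
static int
toy_match (const struct toy_stmt *a, const struct toy_stmt *b,
	   int commutative)
{
  if (strcmp (a->op0, b->op0) == 0 && strcmp (a->op1, b->op1) == 0)
    return 1;
  /* Second attempt with commutated operands.  */
  if (commutative
      && strcmp (a->op0, b->op1) == 0 && strcmp (a->op1, b->op0) == 0)
    return 1;
  return 0;
}
```

The real build additionally bounds the number of such retries (3 in the patch) so that matching stays linear in practice.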
Re: [patch] RFC: ix86 / x86_64 register pressure aware scheduling
These changes are what we used to try here at Intel after a bunch of changes which made the pre-alloc scheduler more stable. We benchmarked both register pressure algorithms and the overall result was not that promising. We saw a number of regressions, e.g. for optset -mavx -O3 -funroll-loops -ffast-math -march=corei7 (for spec2000 not only lucas but also applu regressed). And the overall gain is negative even for x86_64. For 32 bits the picture was worse, if I remember correctly. In general, we have doubts that this feature is good for an OOO machine.

Thanks, Igor

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Steven Bosscher
Sent: Monday, April 15, 2013 11:34 PM
To: GCC Patches
Cc: H.J. Lu; Uros Bizjak; Jan Hubicha
Subject: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

Hello,

The attached patch enables register pressure aware scheduling for the ix86 and x86_64 targets. It uses the optimistic algorithm to avoid being overly conservative. This is the same as what other CISCy targets, like s390, also do.

The motivation for this patch is the excessive spilling I've observed in a few test cases with relatively large basic blocks, e.g. encryption algorithms and codecs.

The patch passes bootstrap+testing on x86_64-unknown-linux-gnu and i686-unknown-linux-gnu, with a few new failures due to PR56950. Off-list, Uros, Honza and others have already looked at the patch and benchmarked it. For x86_64 there is an overall improvement for SPEC2k except that lucas regresses, but such a preliminary result is IMHO very promising.

Comments/suggestions welcome :-)

Ciao! Steven

	* common/config/i386/i386-common.c (ix86_option_optimization_table):
	Do not disable insns scheduling.  Enable register pressure aware
	scheduling.
	* config/i386/i386.c (ix86_option_override): Use the alternative,
	optimistic scheduling-pressure algorithm by default.
Index: common/config/i386/i386-common.c
===================================================================
--- common/config/i386/i386-common.c	(revision 197941)
+++ common/config/i386/i386-common.c	(working copy)
@@ -707,9 +707,15 @@ static const struct default_options ix86
     /* Enable redundant extension instructions removal at -O2 and higher.  */
     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
-    /* Turn off -fschedule-insns by default.  It tends to make the
-       problem with not enough registers even worse.  */
-    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+    /* Enable -fsched-pressure by default for all optimization levels.
+       Before SCHED_PRESSURE_MODEL register-pressure aware schedule was
+       available, -fschedule-insns was turned off completely by default
+       for this port, because scheduling before register allocation tends
+       to make the problem with not enough registers even worse.  However,
+       for very long basic blocks the scheduler can help bring register
+       pressure down significantly, and SCHED_PRESSURE_MODEL is still
+       conservative enough to avoid creating excessive register
+       pressure.  */
+    { OPT_LEVELS_ALL, OPT_fsched_pressure, NULL, 1 },
 #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
     SUBTARGET_OPTIMIZATION_OPTIONS,
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 197941)
+++ config/i386/i386.c	(working copy)
@@ -3936,6 +3936,10 @@ ix86_option_override (void)
   ix86_option_override_internal (true);
 
+  /* Use the alternative scheduling-pressure algorithm by default.  */
+  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
+			 global_options.x_param_values,
+			 global_options_set.x_param_values);
+
   /* This needs to be done at start up.  It's convenient to do it here.  */
   register_pass (&insert_vzeroupper_info);
[PATCH, ARM] emit LDRD epilogue instead of a single LDM return
Currently, an epilogue is not generated in RTL for functions that can return using a single instruction. This patch enables the RTL epilogue for such functions on targets that can benefit from using a sequence of LDRD instructions in the epilogue instead of a single LDM instruction.

No regression on qemu arm-none-eabi with cortex-a15. Ok for trunk?

Thanks, Greta

gcc/
2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

	* config/arm/arm.c (use_return_insn): Return 0 for targets that
	can benefit from using a sequence of LDRD instructions in epilogue
	instead of a single LDM instruction.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 866385c..bca92af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2296,6 +2296,10 @@ use_return_insn (int iscond, rtx sibling)
   if (IS_INTERRUPT (func_type) && (frame_pointer_needed || TARGET_THUMB))
     return 0;
 
+  if (TARGET_LDRD && current_tune->prefer_ldrd_strd
+      && !optimize_function_for_size_p (cfun))
+    return 0;
+
   offsets = arm_get_frame_offsets ();
   stack_adjust = offsets->outgoing_args - offsets->saved_regs;
[PATCH, ARM][10/n] Split scc patterns using cond_exec
This patch converts define_insn into define_insn_and_split to split some alternatives of movsicc_insn and some scc patterns that cannot be expressed using movsicc. The patch emits cond_exec RTL insns. Ok for trunk? Thanks, Greta gcc/ 2013-02-19 Greta Yorsh greta.yo...@arm.com * config/arm/arm.md (movsicc_insn): Convert define_insn into define_insn_and_split. (and_scc,ior_scc,negscc): Likewise.diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 83b36ca..c2e59ed 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -858,7 +858,7 @@ ;; This is the canonicalization of addsi3_compare0_for_combiner when the ;; addend is a constant. -(define_insn *cmpsi2_addneg +(define_insn cmpsi2_addneg [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 s_register_operand r,r) @@ -1415,7 +1415,7 @@ (set_attr type simple_alu_imm,*,*)] ) -(define_insn *subsi3_compare +(define_insn subsi3_compare [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 arm_rhs_operand r,r,I) (match_operand:SI 2 arm_rhs_operand I,r,r))) @@ -8619,7 +8619,7 @@ (set_attr type f_selvfp_type)] ) -(define_insn *movsicc_insn +(define_insn_and_split *movsicc_insn [(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r) (if_then_else:SI (match_operator 3 arm_comparison_operator @@ -8632,10 +8632,45 @@ mvn%D3\\t%0, #%B2 mov%d3\\t%0, %1 mvn%d3\\t%0, #%B1 - mov%d3\\t%0, %1\;mov%D3\\t%0, %2 - mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 - mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 - mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + # + # + # + # + ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 + ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 + ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + reload_completed + [(const_int 0)] + { +enum rtx_code rev_code; +enum machine_mode mode; +rtx rev_cond; + +emit_insn (gen_rtx_COND_EXEC (VOIDmode, + operands[3], + gen_rtx_SET (VOIDmode, + operands[0], + operands[1]))); + +rev_code = GET_CODE (operands[3]); +mode = GET_MODE (operands[4]); +if 
(mode == CCFPmode || mode == CCFPEmode) + rev_code = reverse_condition_maybe_unordered (rev_code); +else + rev_code = reverse_condition (rev_code); + +rev_cond = gen_rtx_fmt_ee (rev_code, + VOIDmode, + operands[4], + const0_rtx); +emit_insn (gen_rtx_COND_EXEC (VOIDmode, + rev_cond, + gen_rtx_SET (VOIDmode, + operands[0], + operands[2]))); +DONE; + } [(set_attr length 4,4,4,4,8,8,8,8) (set_attr conds use) (set_attr insn mov,mvn,mov,mvn,mov,mov,mvn,mvn) @@ -9604,27 +9639,64 @@ (set_attr type alu_shift,alu_shift_reg)]) -(define_insn *and_scc +(define_insn_and_split *and_scc [(set (match_operand:SI 0 s_register_operand =r) (and:SI (match_operator:SI 1 arm_comparison_operator -[(match_operand 3 cc_register ) (const_int 0)]) - (match_operand:SI 2 s_register_operand r)))] +[(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 s_register_operand r)))] TARGET_ARM - mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1 + # ; mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1 + reload_completed + [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0))) + (cond_exec (match_dup 4) (set (match_dup 0) + (and:SI (match_dup 3) (const_int 1] + { +enum machine_mode mode = GET_MODE (operands[2]); +enum rtx_code rc = GET_CODE (operands[1]); + +/* Note that operands[4] is the same as operands[1], + but with VOIDmode as the result. 
*/ +operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx); +if (mode == CCFPmode || mode == CCFPEmode) + rc = reverse_condition_maybe_unordered (rc); +else + rc = reverse_condition (rc); +operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx); + } [(set_attr conds use) (set_attr insn mov) (set_attr length 8)] ) -(define_insn *ior_scc +(define_insn_and_split *ior_scc [(set (match_operand:SI 0 s_register_operand =r,r) - (ior:SI (match_operator:SI 2 arm_comparison_operator -[(match_operand 3 cc_register ) (const_int 0)]) - (match_operand:SI 1 s_register_operand 0,?r)))] + (ior:SI (match_operator:SI 1 arm_comparison_operator +[(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 s_register_operand 0,?r)))] TARGET_ARM @ -
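The split above emits the second move under the reversed condition, and for CCFPmode/CCFPEmode it must use the unordered-aware reversal so that NaN operands fall into the reversed arm. A minimal C sketch of why the two reversals differ (the enum and mapping tables below are simplified stand-ins for GCC's rtx_code machinery, not actual GCC code):

```c
/* A minimal model of RTL comparison-code reversal.  The enum and the
   mappings are simplified stand-ins for GCC's rtx_code machinery.  */
enum cond { EQ, NE, LT, GT, LE, GE, UNLT, UNGT, UNLE, UNGE };

/* Integer reversal: the logical negation of (a < b) is (a >= b).  */
enum cond
reverse_condition (enum cond c)
{
  switch (c)
    {
    case EQ: return NE;
    case NE: return EQ;
    case LT: return GE;
    case GE: return LT;
    case GT: return LE;
    case LE: return GT;
    default: return c;   /* unordered codes left alone in this sketch */
    }
}

/* Floating-point reversal: NaN compares unordered, so the negation of
   (a < b) must also be true when a or b is NaN, i.e. UNGE, not GE.  */
enum cond
reverse_condition_maybe_unordered (enum cond c)
{
  switch (c)
    {
    case EQ: return NE;
    case NE: return EQ;
    case LT: return UNGE;
    case GE: return UNLT;
    case GT: return UNLE;
    case LE: return UNGT;
    default: return c;
    }
}
```

Using the plain reversal on an FP comparison would silently drop the NaN case from one of the two cond_exec arms, which is why the split checks the mode of the flags register first.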
New German PO file for 'gcc' (version 4.8.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the German team of translators. The file is available at: http://translationproject.org/latest/gcc/de.po (This file, 'gcc-4.8.0.de.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] Fix linking with -findirect-dispatch
Bryce McKinlay bmckin...@gmail.com writes: It certainly _did_ work as intended previously. Only by chance, when libtool has to relink the library during install. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
Hi all, here is a patch for a recent regression with procedure pointers. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8? Cheers, Janus 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * interface.c (check_result_characteristics): Get result from interface if present. 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * gfortran.dg/proc_ptr_42.f90: New. pr56814_v2.diff Description: Binary data proc_ptr_42.f90 Description: Binary data
Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
Janus Weil: here is a patch for a recent regression with procedure pointers. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8? Looks rather obvious. OK - and thanks for the patch. Tobias PS: If you have time, could you review my C_LOC patch at http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ? 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * interface.c (check_result_characteristics): Get result from interface if present. 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * gfortran.dg/proc_ptr_42.f90: New.
Re: RFA: enable LRA for rs6000 [patch for WRF]
On 13-04-16 6:56 PM, Michael Meissner wrote: I tracked down the bug with the SPEC 2006 benchmark WRF using the LRA register allocator. At one point LRA has decided to use the CTR to hold a CCmode value: (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411]) (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1} (expr_list:REG_DEAD (reg:CC 66 ctr [4411]) (nil))) Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r, but it doesn't have a move to cover the nop move of moving the ctr to the ctr. IMHO, LRA should not be generating NOP moves that are later deleted. There are two ways to solve the problem. One is not to let anything but int modes into CTR/LR, which will also prevent the register allocator from spilling floating-point values there (which we've seen in the past, but the last time I tried to eliminate it I couldn't). The following patch does this, and also changes the assertion to call fatal_insn_not_found to make it clearer what the error is. I imagine I could add a NOP move insn to movcc_internal1, but that just strikes me as wrong. Note, this does not fix the 32-bit failure in dealII, and I also noticed that I can't bootstrap the compiler using --with-cpu=power7, which I will get to tomorrow. 2013-04-16 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to control whether we only allow int modes to go in the CTR, LR, VRSAVE, VSCR registers. * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise. (rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are constrained. (rs6000_option_override_internal): Set -mconstrain-regs if we are using the LRA register allocator. * lra.c (check_rtl): Use fatal_insn_not_found to report constraint does not match. Mike, thanks for the patch and all the SPEC2006 data (which are very useful, as I have no access to a Power machine that can be used for benchmarking).
I guess that some benchmark scores may be lower because LRA lacks some micro-optimizations which reload implements through many Power hooks (e.g. LRA does not use push reload). Although sometimes that is not a bad thing (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for other useful things). In general I got the impression that power7 is the most difficult port for LRA. If we manage to port it, LRA ports for other targets will be easier. I also reproduced the bootstrap failure with --with-cpu=power7 and I am going to work on this, and after that on the SPEC2006 issues you wrote about.
Re: [PATCH, x86] Use vector moves in memmove expanding
Bootstrap/make check/Spec2k are passing on i686 and x86_64. Thanks for returning to this! glibc has a quite comprehensive testsuite for stringops. It may be useful to test it with -minline-all-stringops -mstringop-strategy=vector. I tested the patch on my Core notebook and my memcpy micro-benchmark. The vector loop is not a win, since apparently we do not produce any SSE code for 64-bit compilation. What CPUs and block sizes is this intended for? Also the internal loop with -march=native seems to come out as:

.L7:
	movq	(%rsi,%r8), %rax
	movq	8(%rsi,%r8), %rdx
	movq	48(%rsi,%r8), %r9
	movq	56(%rsi,%r8), %r10
	movdqu	16(%rsi,%r8), %xmm3
	movdqu	32(%rsi,%r8), %xmm1
	movq	%rax, (%rdi,%r8)
	movq	%rdx, 8(%rdi,%r8)
	movdqa	%xmm3, 16(%rdi,%r8)
	movdqa	%xmm1, 32(%rdi,%r8)
	movq	%r9, 48(%rdi,%r8)
	movq	%r10, 56(%rdi,%r8)
	addq	$64, %r8
	cmpq	%r11, %r8

It is not that much of an SSE enablement, since the RA seems to home the vars in integer regs. Could you please look into it? Changelog entry: 2013-04-10 Michael Zolotukhin michael.v.zolotuk...@gmail.com * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop. * config/i386/i386.c (expand_set_or_movmem_via_loop): Use adjust_address instead of change_address to keep info about alignment. (emit_strmov): Remove. (emit_memmov): New function. (expand_movmem_epilogue): Refactor to properly handle bigger sizes. (expand_setmem_epilogue): Likewise and return updated rtx for destination. (expand_constant_movmem_prologue): Likewise and return updated rtx for destination and source. (decide_alignment): Refactor, handle vector_loop. (ix86_expand_movmem): Likewise. (ix86_expand_setmem): Likewise. * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg. * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index 73a59b5..edb59da 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align) expr = inner; } } + else if (TREE_CODE (expr) == MEM_REF) +{ + tree base = TREE_OPERAND (expr, 0); + tree byte_offset = TREE_OPERAND (expr, 1); + if (TREE_CODE (base) != ADDR_EXPR + || TREE_CODE (byte_offset) != INTEGER_CST) + return -1; + if (!DECL_P (TREE_OPERAND (base, 0)) + || DECL_ALIGN (TREE_OPERAND (base, 0)) < align) You can use TYPE_ALIGN here? In general, can't we replace all the GIMPLE handling by get_object_alignment? + return -1; + offset += tree_low_cst (byte_offset, 1); +} else return -1; This change ought to go in independently. I cannot review it. I will take a first look over the patch shortly, but please send an updated patch fixing the problem with integer regs. Honza
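The MEM_REF case above is about what alignment survives adding a constant byte offset to an aligned decl: the access is still aligned to the smaller of the base alignment and the largest power of two dividing the offset. A small illustrative helper (a stand-in for the reasoning in get_mem_align_offset, not GCC code):

```c
/* If a decl is aligned to BASE_ALIGN_BITS and we access it at a constant
   BYTE_OFFSET, the access is still aligned to the smaller of the base
   alignment and the largest power of two (in bits) dividing the offset.
   Illustrative stand-in for get_mem_align_offset's MEM_REF reasoning.  */
unsigned int
align_after_offset (unsigned int base_align_bits, unsigned long byte_offset)
{
  if (byte_offset == 0)
    return base_align_bits;             /* offset 0 keeps full alignment */
  /* Largest power-of-two number of bytes dividing the offset, in bits.  */
  unsigned int off_align_bits
    = (unsigned int) ((byte_offset & -byte_offset) * 8);
  return off_align_bits < base_align_bits ? off_align_bits : base_align_bits;
}
```

For example, a 32-byte-aligned array accessed at byte offset 8 is only known to be 8-byte aligned, which is exactly why the patch bails out with -1 when DECL_ALIGN is already below the requested alignment.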
Re: [PATCH, ARM][10/n] Split scc patterns using cond_exec
On 17/04/13 14:12, Greta Yorsh wrote: This patch converts define_insn into define_insn_and_split to split some alternatives of movsicc_insn and some scc patterns that cannot be expressed using movsicc. The patch emits cond_exec RTL insns. Ok for trunk? Thanks, Greta gcc/ 2013-02-19 Greta Yorsh greta.yo...@arm.com * config/arm/arm.md (movsicc_insn): Convert define_insn into define_insn_and_split. (and_scc,ior_scc,negscc): Likewise. OK. R.
Re: [PATCH, ARM] emit LDRD epilogue instead of a single LDM return
On 17/04/13 14:12, Greta Yorsh wrote: Currently, epilogue is not generated in RTL for function that can return using a single instruction. This patch enables RTL epilogue for such functions on targets that can benefit from using a sequence of LDRD instructions in epilogue instead of a single LDM instruction. No regression on qemu arm-none-eabi with cortex-a15. Ok for trunk? Thanks, Greta gcc/ 2012-10-19 Greta Yorsh Greta.Yorsh at arm.com * config/arm/arm.c (use_return_insn): Return 0 for targets that can benefit from using a sequence of LDRD instructions in epilogue instead of a single LDM instruction. OK. R.
Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics
On 17/04/13 12:06, Kyrylo Tkachov wrote: Hi Julian, From: Julian Brown [mailto:jul...@codesourcery.com] Sent: 13 April 2013 15:04 To: Julian Brown Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics On Fri, 12 Apr 2013 20:09:39 +0100 Julian Brown jul...@codesourcery.com wrote: On Fri, 12 Apr 2013 15:19:18 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic to arm_neon.h through the generator ML scripts and also adds the built-ins to which the intrinsics will map to. The generator ML scripts are updated and used to generate the relevant .texi documentation, arm_neon.h and the tests in gcc.target/arm/neon . FWIW, some of the changes to neon*.ml can be simplified somewhat -- my attempt at an improved version of those bits is attached. I'm still not too happy with mode_suffix, but these new instructions require adding semantics to parts of the generator program which weren't really very well-defined to start with :-). I appreciate that it's a bit of a tangle... I thought of an improvement to the mode_suffix part from the last version of the patch, so here it is. I'm done fiddling with this now, so back to you! Thanks for looking at it! My Ocaml-fu is rather limited. It does look cleaner now. Here it is together with all the other parts of the patch, plus some minor formatting changes. Ok for trunk now? gcc/ChangeLog 2013-04-17 Kyrylo Tkachov kyrylo.tkac...@arm.com Julian Brown jul...@codesourcery.com * config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF. (TB_DREG): Add T_V4HF. (v4hf_UP): New macro. (neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. (arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. Handle initialisation of V4HF. Adjust initialisation of reinterpret built-ins. (arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. 
(arm_vector_mode_supported_p): Handle V4HF. (arm_mangle_map): Handle V4HFmode. * config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF. * config/arm/arm_neon_builtins.def: Add entries for vcvtv4hfv4sf, vcvtv4sfv4hf. * config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern. (neon_vcvtv4hfv4sf): Likewise. * config/arm/neon-gen.ml: Handle half-precision floating point features. * config/arm/neon-testgen.ml: Handle Requires_FP_bit feature. * config/arm/arm_neon.h: Regenerate. * config/arm/neon.ml (type elts): Add F16. (type vectype): Add T_float16x4, T_floatHF. (type vecmode): Add V4HF. (type features): Add Requires_FP_bit feature. (elt_width): Handle F16. (elt_class): Likewise. (elt_of_class_width): Likewise. (mode_of_elt): Refactor. (type_for_elt): Handle F16, fix error messages. (vectype_size): Handle T_float16x4. (vcvt_sh): New function. (ops): Add entries for vcvt_f16_f32, vcvt_f32_f16. (string_of_vectype): Handle T_floatHF, T_float16, T_float16x4. (string_of_mode): Handle V4HF. * doc/arm-neon-intrinsics.texi: Regenerate. gcc/testsuite/ChangeLog 2013-04-17 Kyrylo Tkachov kyrylo.tkac...@arm.com Julian Brown jul...@codesourcery.com * gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated. * gcc.target/arm/neon/vcvtf32_f16.c: Likewise. neon-vcvt-intrinsics.patch Please give Julian 24 hours for one final review of the Ocaml bits. Otherwise OK. R.
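For background on what the vcvt_f16_f32 intrinsic computes, here is a deliberately simplified single-value float32 to float16 conversion in C. It only rebias-and-truncates (the NEON instruction does proper IEEE rounding and handles subnormals and NaN payloads); all names are illustrative, not part of the patch:

```c
#include <stdint.h>
#include <string.h>

/* Truncating float32 -> float16 (IEEE binary16) conversion sketch.
   Simplified: subnormal results flush to zero, mantissa bits are
   truncated rather than rounded, and NaNs collapse to infinity.  */
uint16_t
f32_to_f16 (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);                        /* IEEE-754 bit pattern */
  uint16_t sign = (uint16_t) ((x >> 16) & 0x8000);  /* relocate sign bit */
  int exp = (int) ((x >> 23) & 0xFF) - 127 + 15;    /* rebias exponent */
  uint32_t mant = x & 0x7FFFFF;
  if (exp <= 0)
    return sign;                                    /* underflow: to zero */
  if (exp >= 31)
    return (uint16_t) (sign | 0x7C00);              /* overflow: infinity */
  return (uint16_t) (sign | (exp << 10) | (mant >> 13)); /* truncate */
}
```

This is enough to see the 1/5/10 bit layout the V4HF mode carries; the vectorized intrinsic does four such conversions per instruction.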
Re: [PATCH, x86] Use vector moves in memmove expanding
@@ -2392,6 +2392,7 @@ static void ix86_set_current_function (tree); static unsigned int ix86_minimum_incoming_stack_boundary (bool); static enum calling_abi ix86_function_abi (const_tree); +static int smallest_pow2_greater_than (int); Perhaps it is easier to use the existing 1 << ceil_log2? #ifndef SUBTARGET32_DEFAULT_CPU @@ -21829,11 +21830,10 @@ expand_set_or_movmem_via_loop (rtx destmem, rtx srcmem, { rtx out_label, top_label, iter, tmp; enum machine_mode iter_mode = counter_mode (count); - rtx piece_size = GEN_INT (GET_MODE_SIZE (mode) * unroll); + int piece_size_n = GET_MODE_SIZE (mode) * unroll; + rtx piece_size = GEN_INT (piece_size_n); rtx piece_size_mask = GEN_INT (~((GET_MODE_SIZE (mode) * unroll) - 1)); rtx size; - rtx x_addr; - rtx y_addr; int i; top_label = gen_label_rtx (); @@ -21854,13 +21854,18 @@ expand_set_or_movmem_via_loop (rtx destmem, rtx srcmem, emit_label (top_label); tmp = convert_modes (Pmode, iter_mode, iter, true); - x_addr = gen_rtx_PLUS (Pmode, destptr, tmp); - destmem = change_address (destmem, mode, x_addr); + + /* This assert could be relaxed - in this case we'll need to compute + the smallest power of two containing PIECE_SIZE_N and pass it to + offset_address. */ + gcc_assert ((piece_size_n & (piece_size_n - 1)) == 0); + destmem = offset_address (destmem, tmp, piece_size_n); + destmem = adjust_address (destmem, mode, 0); if (srcmem) { - y_addr = gen_rtx_PLUS (Pmode, srcptr, copy_rtx (tmp)); - srcmem = change_address (srcmem, mode, y_addr); + srcmem = offset_address (srcmem, copy_rtx (tmp), piece_size_n); + srcmem = adjust_address (srcmem, mode, 0); /* When unrolling for chips that reorder memory reads and writes, we can save registers by using single temporary. @@ -22039,13 +22044,61 @@ expand_setmem_via_rep_stos (rtx destmem, rtx destptr, rtx value, emit_insn (gen_rep_stos (destptr, countreg, destmem, value, destexp)); } This change looks OK and can go into mainline independently.
Just please ensure that changing the way the address is computed is not making us preserve the alias set. Memmove cannot rely on the alias set of the src/destination objects. -static void -emit_strmov (rtx destmem, rtx srcmem, -rtx destptr, rtx srcptr, enum machine_mode mode, int offset) -{ - rtx src = adjust_automodify_address_nv (srcmem, mode, srcptr, offset); - rtx dest = adjust_automodify_address_nv (destmem, mode, destptr, offset); - emit_insn (gen_strmov (destptr, dest, srcptr, src)); +/* This function emits moves to copy SIZE_TO_MOVE bytes from SRCMEM to + DESTMEM. + SRC is passed by pointer to be updated on return. + Return value is updated DST. */ +static rtx +emit_memmov (rtx destmem, rtx *srcmem, rtx destptr, rtx srcptr, +HOST_WIDE_INT size_to_move) +{ + rtx dst = destmem, src = *srcmem, adjust, tempreg; + enum insn_code code; + enum machine_mode move_mode; + int piece_size, i; + + /* Find the widest mode in which we could perform moves. + Start with the biggest power of 2 less than SIZE_TO_MOVE and halve + it until a move of such size is supported. */ + piece_size = smallest_pow2_greater_than (size_to_move) >> 1; + move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0); I suppose this is a problem with SSE moves ending up in integer registers, since you get TImode rather than a vectorized mode, like V8QImode in here. Why not stick with the original mode parameter? + code = optab_handler (mov_optab, move_mode); + while (code == CODE_FOR_nothing && piece_size > 1) +{ + piece_size >>= 1; + move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0); + code = optab_handler (mov_optab, move_mode); +} + gcc_assert (code != CODE_FOR_nothing); + + dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0); + src = adjust_automodify_address_nv (src, move_mode, srcptr, 0); + + /* Emit moves. We'll need SIZE_TO_MOVE/PIECE_SIZES moves.
*/ + gcc_assert (size_to_move % piece_size == 0); + adjust = GEN_INT (piece_size); + for (i = 0; i < size_to_move; i += piece_size) +{ + /* We move from memory to memory, so we'll need to do it via +a temporary register. */ + tempreg = gen_reg_rtx (move_mode); + emit_insn (GEN_FCN (code) (tempreg, src)); + emit_insn (GEN_FCN (code) (dst, tempreg)); + + emit_move_insn (destptr, + gen_rtx_PLUS (Pmode, copy_rtx (destptr), adjust)); + emit_move_insn (srcptr, + gen_rtx_PLUS (Pmode, copy_rtx (srcptr), adjust)); + + dst = adjust_automodify_address_nv (dst, move_mode, destptr, + piece_size); + src = adjust_automodify_address_nv (src, move_mode, srcptr, + piece_size); +} + + /* Update DST and SRC rtx. */ + *srcmem = src; + return dst; Doesn't this effectively kill support for
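The piece-size selection in emit_memmov can be modelled in a few lines of plain C: start from the largest power of two not exceeding the copy size and halve it until the move is supported. Here "supported" is modelled by a maximum machine move size; in GCC it is the optab_handler check, so this is a sketch, not the patch's code:

```c
/* Sketch of emit_memmov's piece-size search: the biggest power of two
   not exceeding SIZE_TO_MOVE, halved until a move of that width exists.
   MAX_SUPPORTED stands in for the optab_handler availability check.  */
int
choose_piece_size (int size_to_move, int max_supported)
{
  int piece_size = 1;
  while (piece_size * 2 <= size_to_move)  /* smallest_pow2_greater_than >> 1 */
    piece_size *= 2;
  while (piece_size > 1 && piece_size > max_supported)
    piece_size >>= 1;                     /* halve until a move exists */
  return piece_size;
}
```

The copy loop then emits size_to_move / piece_size pairs of load/store moves through a temporary register, which is why the patch asserts the copy size is a multiple of the piece size.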
[PATCH, PR 42371] Remove references to functions from symbol table during inlining
Hi, the patch below is a fix for PR 42371 removing references to functions from the symbol table when we know that all their uses were inlined. This then allows us to remove out-of-line copies of functions when they are not needed. The patch adds the ability to count the uses of a parameter value that are all well described by jump functions and call graph indirect edges (or to mark that we do not have the uses under control). These counts are then combined or transferred to a new helper structure (ipa_cst_ref_desc) associated with constant jump functions when functions are inlined, and decremented when the reference is used to turn an indirect call graph edge into a direct one. When the count eventually drops to zero, the other information in the structure is used to identify and remove an item in the appropriate reference list in the symbol table. The data structures allow for non-trivial duplication when whole trees of inlined call graph nodes are duplicated. I currently allocate the new structures only when a constant pointer to a function is passed as a parameter. It is trivial to have one for all references, but I'm not sure what good that would be, at least not now. Perhaps once we also attempt to track references to virtual tables, but it will be much more difficult to prove they do not escape somewhere in the callee(s). Bootstrapped and tested on x86_64-linux without any problems. OK for trunk? Thanks, Martin 2013-03-22 Martin Jambor mjam...@suse.cz PR middle-end/42371 * ipa-prop.h (IPA_UNDESCRIBED_USE): New macro. (ipa_constant_data): New type. (ipa_jump_func): Use ipa_constant_data to hold information about constant jump functions. (ipa_get_jf_constant): Adjust to jump function type changes. (ipa_get_jf_constant_rdesc): New function. (ipa_param_descriptor): New field controlled_uses. (ipa_get_controlled_uses): New function. (ipa_set_controlled_uses): Likewise. * ipa-ref.h (ipa_find_reference): Declare. * ipa-prop.c (ipa_cst_ref_desc): New type.
(ipa_print_node_jump_functions_for_edge): Adjust for jump function type changes. (ipa_set_jf_constant): Likewise. Also create reference descriptions. New parameter cs. Adjust all callers. (ipa_analyze_params_uses): Detect uncontrolled and controlled uses. (remove_described_reference): New function. (jfunc_rdesc_usable): Likewise. (try_make_edge_direct_simple_call): Decrement controlled use count, attempt to remove reference if it hits zero. (combine_controlled_uses_counters): New function. (propagate_controlled_uses): Likewise. (ipa_propagate_indirect_call_infos): Call propagate_controlled_uses. (ipa_edge_duplication_hook): Duplicate reference descriptions. (ipa_print_node_params): Print described use counter. (ipa_write_jump_function): Adjust to jump function type changes. (ipa_read_jump_function): New parameter CS, pass it to ipa_set_jf_constant. Adjust caller. (ipa_write_node_info): Stream controlled use count (ipa_read_node_info): Likewise. * cgraph.c (cgraph_mark_address_taken_node): Bail out instead of asserting. * ipa-cp.c (ipcp_discover_new_direct_edges): Decrement controlled use count. Remove cloning-added reference if it reaches zero. * ipa-ref.c (ipa_find_reference): New function. testsuite/ * gcc.dg/ipa/remref-0.c: New test. * gcc.dg/ipa/remref-1a.c: Likewise. * gcc.dg/ipa/remref-1b.c: Likewise. * gcc.dg/ipa/remref-2a.c: Likewise. * gcc.dg/ipa/remref-2b.c: Likewise. Index: src/gcc/ipa-prop.c === *** src.orig/gcc/ipa-prop.c --- src/gcc/ipa-prop.c *** static struct cgraph_2edge_hook_list *ed *** 62,67 --- 62,83 static struct cgraph_2node_hook_list *node_duplication_hook_holder; static struct cgraph_node_hook_list *function_insertion_hook_holder; + /* Description of a reference to an IPA constant. */ + struct ipa_cst_ref_desc + { + /* Edge that corresponds to the statement which took the reference. */ + struct cgraph_edge *cs; + /* Linked list of duplicates created when call graph edges are cloned. 
*/ + struct ipa_cst_ref_desc *next_duplicate; + /* Number of references in IPA structures, IPA_UNDESCRIBED_USE if the value + is out of control. */ + int refcount; + }; + + /* Allocation pool for reference descriptions. */ + + static alloc_pool ipa_refdesc_pool; + /* Return index of the formal whose tree is PTREE in function which corresponds to INFO. */ *** ipa_print_node_jump_functions_for_edge ( *** 175,181 } else if (type == IPA_JF_CONST) { ! tree val = jump_func->value.constant; fprintf (f, CONST: );
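The bookkeeping that ipa_cst_ref_desc implements can be shown in miniature: a constant jump function carries a count of described uses; turning an indirect edge direct consumes one, and when the count hits zero the symbol-table reference can be removed. The structure and names below are illustrative stand-ins, not the GCC types:

```c
/* Illustrative model of ipa_cst_ref_desc refcounting.  UNDESCRIBED_USE
   mirrors IPA_UNDESCRIBED_USE: uses we do not have under control.  */
#define UNDESCRIBED_USE -1

struct ref_desc
{
  int refcount;           /* uses still described by IPA structures */
  int reference_removed;  /* stands in for removing the ipa_ref entry */
};

/* One described use was consumed (e.g. an indirect edge was made direct).
   Returns 1 if the underlying reference could be removed.  */
int
consume_described_use (struct ref_desc *rd)
{
  if (rd->refcount == UNDESCRIBED_USE)
    return 0;             /* uses out of control: keep the reference */
  if (--rd->refcount == 0)
    {
      rd->reference_removed = 1;
      return 1;
    }
  return 0;
}

/* Tiny driver exercising both the counted and the undescribed paths.  */
int
demo_refcount (void)
{
  struct ref_desc rd = { 2, 0 };
  struct ref_desc wild = { UNDESCRIBED_USE, 0 };
  return consume_described_use (&rd) == 0
         && consume_described_use (&rd) == 1
         && rd.reference_removed == 1
         && consume_described_use (&wild) == 0
         && wild.reference_removed == 0;
}
```

The real patch additionally chains duplicates through next_duplicate when call graph edges are cloned, so a descriptor can be found again after inlining duplicates whole subtrees.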
[PATCH, PR 10474] Shedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue
Hi, I have discovered that scheduling pass_cprop_hardreg before pass_thread_prologue_and_epilogue leads to significant increases in the number of performed shrink-wrappings. For one, it solves PR 10474 (at least on x86_64-linux), but it also boosts the number of shrink-wrappings performed during gcc bootstrap by nearly 80% (3165 -> 5692 functions). It is also necessary (although not sufficient) to perform shrink-wrapping in at least one function in the povray benchmark. The reason why it helps so much is that before register allocation there are instructions moving the values of actual arguments from the original hard registers (e.g. SI, DI, etc.) to pseudos at the beginning of each function. When an argument is live across a function call, the pseudo is likely to be assigned to a callee-saved register and then also accessed from that register, even in the first BB, making it require a prologue, though it could be fetched from the original one. When we convert all uses (at least in the first BB) to the original register, the preparatory stage of shrink-wrapping is often capable of moving the register moves to a later BB, thus creating fast paths which do not require a prologue and epilogue. We believe this change in the pipeline should not bring about any negative effects. During gcc bootstrap, the number of instructions changed by pass_cprop_hardreg dropped, but only by 1.2%. We have also run SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all run-time differences could be attributed to noise.
The changes in binary sizes were also small:

|----------------+----------------+-------------+--------|
|                | Trunk produced | New         |        |
| Benchmark      | binary size    | binary size | % diff |
|----------------+----------------+-------------+--------|
| 400.perlbench  |        6219603 |     6136803 |  -1.33 |
| 401.bzip2      |         359291 |      351659 |  -2.12 |
| 403.gcc        |       16249718 |    15915774 |  -2.06 |
| 410.bwaves     |         145249 |      145769 |   0.36 |
| 416.gamess     |       40269686 |    40270270 |   0.00 |
| 429.mcf        |          97142 |       97126 |  -0.02 |
| 433.milc       |         715444 |      713236 |  -0.31 |
| 434.zeusmp     |        1444596 |     1444676 |   0.01 |
| 435.gromacs    |        6609207 |     6470039 |  -2.11 |
| 436.cactusADM  |        4571319 |     4532607 |  -0.85 |
| 437.leslie3d   |         492197 |      492357 |   0.03 |
| 444.namd       |        1001921 |     1007001 |   0.51 |
| 445.gobmk      |        8193495 |     8163839 |  -0.36 |
| 450.soplex     |        5565070 |     5530734 |  -0.62 |
| 453.povray     |        7468446 |     7340142 |  -1.72 |
| 454.calculix   |        8474754 |     8464954 |  -0.12 |
| 456.hmmer      |        1662315 |     1650147 |  -0.73 |
| 458.sjeng      |         623065 |      620817 |  -0.36 |
| 459.GemsFDTD   |        1456669 |     1461573 |   0.34 |
| 462.libquantum |         249809 |      248401 |  -0.56 |
| 464.h264ref    |        2784806 |     2772806 |  -0.43 |
| 465.tonto      |       15511395 |    15480899 |  -0.20 |
| 470.lbm        |          64327 |       64215 |  -0.17 |
| 471.omnetpp    |        5325418 |     5293874 |  -0.59 |
| 473.astar      |         365853 |      363261 |  -0.71 |
| 481.wrf        |       22002287 |    21950783 |  -0.23 |
| 482.sphinx3    |        1153616 |     1145248 |  -0.73 |
| 483.xalancbmk  |       62458676 |    62001540 |  -0.73 |
|----------------+----------------+-------------+--------|
| TOTAL          |      221535374 |   220130550 |  -0.63 |

I have successfully bootstrapped and tested the patch on x86-64-linux. Is it OK for trunk? Or should I also examine some other aspect? Thanks, Martin 2013-03-28 Martin Jambor mjam...@suse.cz PR middle-end/10474 * passes.c (init_optimization_passes): Move pass_cprop_hardreg before pass_thread_prologue_and_epilogue. testsuite/ * gcc.dg/pr10474.c: New test.
Index: src/gcc/passes.c === --- src.orig/gcc/passes.c +++ src/gcc/passes.c @@ -1630,6 +1630,7 @@ init_optimization_passes (void) NEXT_PASS (pass_ree); NEXT_PASS (pass_compare_elim_after_reload); NEXT_PASS (pass_branch_target_load_optimize1); + NEXT_PASS (pass_cprop_hardreg); NEXT_PASS (pass_thread_prologue_and_epilogue); NEXT_PASS (pass_rtl_dse2); NEXT_PASS (pass_stack_adjustments); @@ -1637,7 +1638,6 @@ init_optimization_passes (void) NEXT_PASS (pass_peephole2); NEXT_PASS (pass_if_after_reload); NEXT_PASS (pass_regrename); - NEXT_PASS (pass_cprop_hardreg); NEXT_PASS (pass_fast_rtl_dce); NEXT_PASS (pass_reorder_blocks); NEXT_PASS (pass_branch_target_load_optimize2); Index: src/gcc/testsuite/gcc.dg/pr10474.c
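The kind of source function that benefits from this reordering looks like the sketch below: a fast path that uses an argument but makes no call, and a slow path where the argument is live across a call. Without hardreg copy propagation the argument is homed in a callee-saved register even on the fast path, forcing a prologue; with it, the fast path can read the incoming argument register directly and the prologue sinks into the slow path. The example is illustrative, not taken from the PR:

```c
/* Fast path: one load and a compare, no call, shrink-wrappable.
   Slow path: n is live across slow_compute, so a callee-saved register
   (and hence a prologue/epilogue) is needed only on that path.  */
static int cache[16];

static int
slow_compute (int n)   /* stands in for an arbitrary out-of-line call */
{
  return n * n;
}

int
cached_square (int n)
{
  int slot = n & 15;
  if (cache[slot] != 0)
    return cache[slot];              /* fast path, no prologue needed */
  cache[slot] = slow_compute (n);    /* slow path keeps n live across call */
  return cache[slot];
}
```

When the fast path dominates at run time, skipping the prologue/epilogue on it is where the measured shrink-wrapping wins come from.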
[PATCH, PR 56718] Middle-end intraprocedural type-based devirtualization
Hi, this patch implements type-based devirtualization done intra-procedurally. This is normally done by the front end, except in cases when opportunities for this transformation are created by early inlining. Because we handle this situation at the IPA level (especially in inlining but also in IPA-CP), this can currently mean that some code runs much faster when compiled without early inlining, e.g. the PR 56718 testcase. This patch piggy-backs on PRE OBJ_TYPE_REF folding and, when that fails, it calls into ipa-cp routines that simply use the infrastructure we have for construction of known-type jump functions to figure out the type of the object involved. If that is known, the appropriate method is looked up in the VMT obtained from its BINFO. From now on I'll concentrate on tracking of VMT pointers within objects rather than on type-based techniques in my devirtualization efforts. Nevertheless, I do believe we want to have this patch in, because (as opposed to VMT tracking) it can also work when constructors are in a different compilation unit. Even though these situations do not occur that often, the effects can be quite substantial when we miss an inlining opportunity; the testcase in PR 56718 is rather artificial, but 2.5 times quicker with the patch. A very similar patch has passed bootstrap and testing on x86_64-linux without any problems; I'm bootstrapping and testing this exact one now. OK for trunk if it passes? Thanks, Martin 2013-03-25 Martin Jambor mjam...@suse.cz PR tree-optimization/56718 * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, renamed and made public. Adjusted all callers. (ipa_intraprocedural_devirtualization): New function. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declare. (ipa_intraprocedural_devirtualization): Likewise. * Makefile.in (tree-ssa-pre.o): Add ipa-prop.h to dependencies. testsuite/ * g++.dg/ipa/imm-devirt-1.C: New test. * g++.dg/ipa/imm-devirt-2.C: Likewise.
*** /tmp/ITGyHx_Makefile.in 2013-04-16 00:02:39.0 +0200 --- gcc/Makefile.in 2013-04-15 15:02:53.079696533 +0200 *** tree-ssa-pre.o : tree-ssa-pre.c $(TREE_F *** 2369,2375 $(TM_H) coretypes.h $(TREE_PASS_H) $(FLAGS_H) langhooks.h \ $(CFGLOOP_H) alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) $(HASH_TABLE_H) \ $(GIMPLE_H) $(TREE_INLINE_H) tree-iterator.h tree-ssa-sccvn.h $(PARAMS_H) \ !$(DBGCNT_H) tree-scalar-evolution.h $(GIMPLE_PRETTY_PRINT_H) domwalk.h tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H) $(CONFIG_H) \ $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) \ $(TM_H) coretypes.h $(DUMPFILE_H) $(FLAGS_H) $(CFGLOOP_H) \ --- 2369,2376 $(TM_H) coretypes.h $(TREE_PASS_H) $(FLAGS_H) langhooks.h \ $(CFGLOOP_H) alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) $(HASH_TABLE_H) \ $(GIMPLE_H) $(TREE_INLINE_H) tree-iterator.h tree-ssa-sccvn.h $(PARAMS_H) \ !$(DBGCNT_H) tree-scalar-evolution.h $(GIMPLE_PRETTY_PRINT_H) domwalk.h \ !$(IPA_PROP_H) tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H) $(CONFIG_H) \ $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) \ $(TM_H) coretypes.h $(DUMPFILE_H) $(FLAGS_H) $(CFGLOOP_H) \ *** /tmp/CO3mFA_ipa-cp.c2013-04-16 00:02:39.0 +0200 --- gcc/ipa-cp.c2013-04-15 15:02:53.070696564 +0200 *** ipa_get_jf_ancestor_result (struct ipa_j *** 791,810 return NULL_TREE; } - /* Extract the acual BINFO being described by JFUNC which must be a known type -jump function. */ - - static tree - ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc) - { - tree base_binfo = TYPE_BINFO (ipa_get_jf_known_type_base_type (jfunc)); - if (!base_binfo) - return NULL_TREE; - return get_binfo_at_offset (base_binfo, - ipa_get_jf_known_type_offset (jfunc), - ipa_get_jf_known_type_component_type (jfunc)); - } - /* Determine whether JFUNC evaluates to a known value (that is either a constant or a binfo) and if so, return it. Otherwise return NULL. 
INFO describes the caller node so that pass-through jump functions can be --- 791,796 *** ipa_value_from_jfunc (struct ipa_node_pa *** 816,822 if (jfunc->type == IPA_JF_CONST) return ipa_get_jf_constant (jfunc); else if (jfunc->type == IPA_JF_KNOWN_TYPE) ! return ipa_value_from_known_type_jfunc (jfunc); else if (jfunc->type == IPA_JF_PASS_THROUGH || jfunc->type == IPA_JF_ANCESTOR) { --- 802,808 if (jfunc->type == IPA_JF_CONST) return ipa_get_jf_constant (jfunc); else if (jfunc->type == IPA_JF_KNOWN_TYPE) ! return ipa_binfo_from_known_type_jfunc (jfunc); else if (jfunc->type == IPA_JF_PASS_THROUGH || jfunc->type == IPA_JF_ANCESTOR) { *** propagate_scalar_accross_jump_function ( *** 1103,1109
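Devirtualization itself can be shown in miniature: an indirect call through a vtable slot is replaced by a direct call once the object's dynamic type is proven, which is what the OBJ_TYPE_REF folding does using the BINFO data. All names below are illustrative C stand-ins for the C++ virtual-dispatch machinery, not GCC internals:

```c
/* Manual vtable dispatch, and what devirtualization turns it into.  */
struct vtable
{
  int (*area) (int);
};

static int
square_area (int side)
{
  return side * side;
}

static const struct vtable square_vtbl = { square_area };

struct shape
{
  const struct vtable *vptr;  /* VMT pointer */
  int side;
};

/* Indirect dispatch, as emitted before devirtualization.  */
int
area_virtual (struct shape *s)
{
  return s->vptr->area (s->side);
}

/* What the optimizer may emit once it proves s points to a "square":
   a direct call, which can then also be inlined.  */
int
area_devirtualized (struct shape *s)
{
  return square_area (s->side);
}

/* Driver checking the two dispatch forms agree.  */
int
demo_dispatch (void)
{
  struct shape sq = { &square_vtbl, 4 };
  return area_virtual (&sq) == 16 && area_devirtualized (&sq) == 16;
}
```

The speedup in the PR comes from the second step: once the call is direct, the inliner can see through it, which the indirect form blocks.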
[google] Add libgcov interface for accessing profile directory (issue8726046)
Patch to add interface for querying the profile directory prefix specified to the -fprofile-generate= option. Google ref b/8629045. Tested with regression tests and internal test. Ok for google branches? 2013-04-17 Teresa Johnson tejohn...@google.com * libgcc/libgcov.c (__gcov_get_profile_prefix): New function. * gcc/tree-profile.c (tree_init_instrumentation): New function. * gcc/coverage.c (get_const_string_type): Make non-static. (coverage_init): Invoke tree_init_instrumentation(). * gcc/coverage.h (get_const_string_type): Declare. (tree_init_instrumentation): Ditto. Index: libgcc/libgcov.c === --- libgcc/libgcov.c(revision 197640) +++ libgcc/libgcov.c(working copy) @@ -183,6 +183,14 @@ unsigned int __gcov_sampling_enabled () return __gcov_has_sampling; } +/* Profile directory prefix specified to -fprofile-generate=. */ +extern char * __gcov_profile_prefix; + +char *__gcov_get_profile_prefix () +{ + return __gcov_profile_prefix; +} + /* Per thread sample counter. */ THREAD_PREFIX gcov_unsigned_t __gcov_sample_counter = 0; Index: gcc/tree-profile.c === --- gcc/tree-profile.c (revision 197640) +++ gcc/tree-profile.c (working copy) @@ -165,6 +165,9 @@ static struct pointer_set_t *instrumentation_to_be /* extern __thread gcov_unsigned_t __gcov_sample_counter */ static GTY(()) tree gcov_sample_counter_decl = NULL_TREE; +/* extern gcov_unsigned_t __gcov_profile_prefix */ +static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE; + /* extern gcov_unsigned_t __gcov_sampling_period */ static tree GTY(()) gcov_sampling_period_decl = NULL_TREE; @@ -407,6 +410,41 @@ cleanup_instrumentation_sampling (void) } } +/* Initialization function for FDO instrumentation. */ + +void +tree_init_instrumentation (void) +{ + if (!gcov_profile_prefix_decl) +{ + tree prefix_ptr; + int prefix_len; + tree prefix_string; + + /* Construct an initializer for __gcov_profile_prefix. 
*/ + gcov_profile_prefix_decl = +build_decl (UNKNOWN_LOCATION, VAR_DECL, +get_identifier ("__gcov_profile_prefix"), +get_const_string_type ()); + TREE_PUBLIC (gcov_profile_prefix_decl) = 1; + DECL_ARTIFICIAL (gcov_profile_prefix_decl) = 1; + make_decl_one_only (gcov_profile_prefix_decl, + DECL_ASSEMBLER_NAME (gcov_profile_prefix_decl)); + TREE_STATIC (gcov_profile_prefix_decl) = 1; + + prefix_len = strlen (profile_data_prefix); + prefix_string = build_string (prefix_len + 1, profile_data_prefix); + TREE_TYPE (prefix_string) = build_array_type + (char_type_node, build_index_type + (build_int_cst (NULL_TREE, prefix_len))); + prefix_ptr = build1 (ADDR_EXPR, get_const_string_type (), + prefix_string); + + DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr; + varpool_finalize_decl (gcov_profile_prefix_decl); +} +} + /* Initialization function for FDO sampling. */ void Index: gcc/coverage.c === --- gcc/coverage.c (revision 197640) +++ gcc/coverage.c (working copy) @@ -227,7 +227,7 @@ get_gcov_unsigned_t (void) /* Return the type node for const char *. */ -static tree +tree get_const_string_type (void) { return build_pointer_type @@ -2879,6 +2879,7 @@ coverage_init (const char *filename, const char* s /* Define variables which are referenced at runtime by libgcov. */ if (profiling_enabled_p ()) { + tree_init_instrumentation (); tree_init_dyn_ipa_parameters (); init_pmu_profiling (); tree_init_instrumentation_sampling (); Index: gcc/coverage.h === --- gcc/coverage.h (revision 197640) +++ gcc/coverage.h (working copy) @@ -82,6 +82,7 @@ extern bool pmu_data_present (void); extern tree get_gcov_type (void); extern tree get_gcov_unsigned_t (void); +extern tree get_const_string_type (void); /* Mark this module as containing asm statements. */ extern void coverage_has_asm_stmt (void); @@ -89,5 +90,6 @@ extern void coverage_has_asm_stmt (void); /* Defined in tree-profile.c. 
*/ extern void tree_init_instrumentation_sampling (void); extern void tree_init_dyn_ipa_parameters (void); +extern void tree_init_instrumentation (void); #endif -- This patch is available for review at http://codereview.appspot.com/8726046
Re: RFA: enable LRA for rs6000 [patch for WRF]
On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote: Mike, thanks for the patch and all the SPEC2006 data (which are very useful as I have no access to a power machine that can be used for benchmarking). I guess that maybe some benchmark scores are lower because LRA lacks some micro-optimizations which reload implements through many power hooks (e.g. LRA does not use push reload). Although sometimes that is not a bad thing (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for other useful things). SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7, where we need a larger stack slot to hold spilled SDmode values (power6 did not have the LFIWZX instruction that is needed to load SDmode values into floating point registers). In general I got the impression that power7 is the most difficult port for LRA. If we manage to port it, LRA ports for other targets will be easier. I dunno, has LRA been ported to the SH yet? I also reproduced the bootstrap failure with --with-cpu=power7 and I am going to work on this and after that on the SPEC2006 issues you wrote about. Ok. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
here is a patch for a recent regression with procedure pointers. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8? Looks rather obvious. OK - and thanks for the patch. Thanks for the review. Committed as r198032. PS: If you have time, could you review my C_LOC patch at http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ? Will try to have a look soon ... Cheers, Janus 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * interface.c (check_result_characteristics): Get result from interface if present. 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * gfortran.dg/proc_ptr_42.f90: New.
Re: [PATCH] backport darwin12 fixes to gcc-4_7-branch
On Jun 18, 2012, at 9:36 AM, Jack Howarth howa...@bromo.med.uc.edu wrote: The attached patch backports... http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01710.html http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01707.html which implement changes for darwin12 and later. Bootstrap and regression tested on current x86_64-apple-darwin12. http://gcc.gnu.org/ml/gcc-testresults/2012-06/msg01547.html Okay for gcc-4_7-branch? Committed 198005. Committed 198012. Committed 198031. http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01181.html is the original request, since it's been a while since people saw it last.
Re: [google] Add libgcov interface for accessing profile directory (issue8726046)
looks ok. David On Wed, Apr 17, 2013 at 9:00 AM, Teresa Johnson tejohn...@google.com wrote: Patch to add interface for querying the profile directory prefix specified to the -fprofile-generate= option. Google ref b/8629045. Tested with regression tests and internal test. Ok for google branches?
Re: [patch, fortran] Really fix PR 56782
Hi Mikael, Attached is what I had in mind. And a testcase (the '|| expr->expr_type == EXPR_FUNCTION' in copy_walk_reduction_arg appeared wrong to me, and seemingly it was). Yes, you can also do it this way. OK if accompanied by a ChangeLog entry and my test cases, as well :-) Thomas
Re: [GOOGLE] Add AutoFDO based indirect call value profiling to GCC
https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c File gcc/auto-profile.c (right): https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode161 gcc/auto-profile.c:161: }; Why not share the same hist_type enum in value-prof.h? https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode171 gcc/auto-profile.c:171: int count; int --> gcov_type? https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode519 gcc/auto-profile.c:519: static void New line above. https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode550 gcc/auto-profile.c:550: if (actual_count > 2) Are the targets sorted? https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode570 gcc/auto-profile.c:570: static void afdo_vpt (gimple stmt, struct gcov_hist *v, int hist_size) New line. https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode914 gcc/auto-profile.c:914: for the given POS_STACK. */ Document new parameters. https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode1193 gcc/auto-profile.c:1193: gcov_functions[i].stacks[j].hist_size = gcov_read_unsigned (); Why is the data not optional? https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode1203 gcc/auto-profile.c:1203: file_names[gcov_read_counter ()]; missing white space (many other instances) https://codereview.appspot.com/8814043/diff/1/gcc/tree-inline.c File gcc/tree-inline.c (right): https://codereview.appspot.com/8814043/diff/1/gcc/tree-inline.c#newcode1833 gcc/tree-inline.c:1833: gcov_type count = afdo_get_bb_count (copy_basic_block, false); comment about the 'false' parameter. 
https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c File gcc/value-prof.c (right): https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c#newcode1223 gcc/value-prof.c:1223: if (!strcmp(fname, name)) You can use assembler_name_hash to find the cgraph_node https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c#newcode1626 gcc/value-prof.c:1626: direct_call1 = find_func_by_global_id (val1); Maybe add a parameter to find_func_by_global_id (val, is_auto_fdo)? https://codereview.appspot.com/8814043/ On Tue, Apr 16, 2013 at 8:41 PM, Dehao Chen de...@google.com wrote: This patch implements indirect call promotion for AutoFDO. Bootstrapped and passed gcc regression tests. Is it okay for the gcc-4_7 branch? Thanks, Dehao http://codereview.appspot.com/8814043 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index d17b250..c217846 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3039,7 +3039,7 @@ coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(DIAGNOSTIC_CORE_H) intl.h gt-coverage.h $(TARGET_H) $(AUTO_PROFILE_H) auto-profile.o : auto-profile.c $(CONFIG_H) $(SYSTEM_H) $(FLAGS_H) \ $(BASIC_BLOCK_H) $(DIAGNOSTIC_CORE_H) $(GCOV_IO_H) $(INPUT_H) $(PROFILE_H) \ - $(LANGHOOKS_H) $(OPTS_H) $(TREE_PASS_H) $(CGRAPH_H) $(GIMPLE_H) \ + $(LANGHOOKS_H) $(OPTS_H) $(TREE_PASS_H) $(CGRAPH_H) $(GIMPLE_H) value-prof.h \ $(AUTO_PROFILE_H) cselib.o : cselib.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(RECOG_H) \ diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index cf17370..38f4209 100644 --- a/gcc/auto-profile.c +++ b/gcc/auto-profile.c @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple.h" #include "cgraph.h" #include "tree-flow.h" +#include "value-prof.h" #include "auto-profile.h" /* The following routines implements AutoFDO optimization. 
@@ -109,6 +110,8 @@ struct gcov_stack const char *callee_name; struct gcov_callsite_pos *stack; gcov_unsigned_t size; + struct gcov_hist *hist; + gcov_unsigned_t hist_size; gcov_type num_inst; gcov_type count; gcov_type max_count; @@ -150,6 +153,24 @@ struct afdo_module char **strings; }; +enum afdo_histogram_type +{ + CALL_HIST = 1, + STRINGOP_HIST, + DIVMOD_HIST +}; + +struct gcov_hist +{ + enum afdo_histogram_type type; + union +{ + const char *func_name; + unsigned long long value; +} value; + int count; +}; + /* Store the file name strings read from the profile data file. */ static const char **file_names; @@ -493,6 +514,64 @@ read_aux_modules (void) } } +/* From AutoFDO profiles, find values inside STMT for that we want to measure + histograms for indirect-call optimization. */ +static void +afdo_indirect_call (gimple stmt, struct gcov_hist *values, int hist_size) +{ + tree callee; + int i, total = 0; + int actual_count = 0; + histogram_value hist; + + if (gimple_code (stmt) != GIMPLE_CALL + || gimple_call_fndecl (stmt) != NULL_TREE) +return; + + callee = gimple_call_fn (stmt); + + for (i = 0; i < hist_size; i++) +if (values[i].type == CALL_HIST) + break; + + if (i == hist_size) +return; +
Re: [PATCH] Add a new option -fstack-protector-strong
Thanks. On Wed, Apr 17, 2013 at 2:26 AM, Florian Weimer fwei...@redhat.com wrote: On 04/17/2013 02:49 AM, Han Shen wrote: Indentation is off (unless both mail clients I tried are clobbering your patch). I think the GNU coding style prohibits the braces around the single-statement body of the outer 'for'. Done with indentation properly on and removed the braces. (GMail composing window drops all the tabs when pasting... I have to use Thunderbird to paste the patch. Hope it is right this time) Thunderbird mangles patches as well, but I was able to repair the damage. When using Thunderbird, please send the patch as a text file attachment. You can put the changelog snippets at the beginning of the file as well. This way, everything is sent out unchanged. Patch sent as attachment. Can you make the conditional more similar to the comment, perhaps using a switch statement on the value of the flag_stack_protect variable? That's going to be much easier to read. Re-coded. Now using 'switch-case'. Thanks. I think the comment is now redundant because it matches the code almost word-for-word. 8-) Comment deleted. No for 'struct-returning' functions. But I regard this as not an issue --- at the programming level, there is no way to get one's hand on the address of a returned structure --- struct Node foo(); struct Node *p = &foo(); // compiler error - lvalue required as unary '&' operand. C++ const references can bind to rvalues. 2013-04-11 Jakub Jelinek ja...@redhat.com But I'm more worried about the interaction with the return value optimization. 
Consider this C++ code: struct S { S(); int a; int b; int c; int d; int e; }; void f1(int *); S f2() { S s; f1(&s.a); return s; } S g2(); void g3() { S s = g2(); } void g3b(const S &); void g3b() { g3b(g2()); } With your patch and -O2 -fstack-protector-strong, this generates the following assembly: .globl _Z2f2v .type _Z2f2v, @function _Z2f2v: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq %rdi, %rbx call _ZN1SC1Ev movq %rbx, %rdi call _Z2f1Pi movq %rbx, %rax popq %rbx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE0: .size _Z2f2v, .-_Z2f2v .p2align 4,,15 .globl _Z2g3v .type _Z2g3v, @function _Z2g3v: .LFB1: .cfi_startproc subq $40, %rsp .cfi_def_cfa_offset 48 movq %rsp, %rdi call _Z2g2v addq $40, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE1: .size _Z2g3v, .-_Z2g3v .p2align 4,,15 .globl _Z3g3bv .type _Z3g3bv, @function _Z3g3bv: .LFB2: .cfi_startproc subq $40, %rsp .cfi_def_cfa_offset 48 movq %rsp, %rdi movq %fs:40, %rax movq %rax, 24(%rsp) xorl %eax, %eax call _Z2g2v movq %rsp, %rdi call _Z3g3bRK1S movq 24(%rsp), %rax xorq %fs:40, %rax jne .L9 addq $40, %rsp .cfi_remember_state .cfi_def_cfa_offset 8 ret .L9: .cfi_restore_state .p2align 4,,6 call __stack_chk_fail .cfi_endproc .LFE2: .size _Z3g3bv, .-_Z3g3bv Here, g3b() is correctly instrumented, and f2() does not need instrumentation (because space for the returned object is not part of the local frame). But an address on the stack escapes in g3() and is used for the return value of the call to g2(). This requires instrumentation, which is missing in this example. I suppose this can be handled in a follow-up patch if necessary. Thanks for the case. 
Yeah, I see the problem here - in the cfgexpand phase, where functions are scanned for stack protection, the tree representation gives no indication that a local address is computed, just as listed below - void g3() () { struct S s; bb 2: s = g2 (); [return slot optimization] s ={v} {CLOBBER}; return; } One solution might be to scan for the [return slot optimization] tag in the tree. I'll post another patch later to address this. ChangeLog and patch below -- gcc/ChangeLog 2013-04-16 Han Shen shen...@google.com * cfgexpand.c (record_or_union_type_has_array_p): Helper function to check if a record or union contains an array field. I think the GNU convention is to write only this: * cfgexpand.c (record_or_union_type_has_array_p): New function. Done. (expand_used_vars): Add logic handling '-fstack-protector-strong'. * common.opt (fstack-protector-all): New option. Should be fstack-protector-strong. Done. -- Florian Weimer / Red Hat Product Security Team Patch attached as plain txt. Thanks,
Re: [PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue
On 04/17/2013 09:49 AM, Martin Jambor wrote: The reason why it helps so much is that before register allocation there are instructions moving the value of actual arguments from originally hard register (e.g. SI, DI, etc.) to a pseudo at the beginning of each function. When the argument is live across a function call, the pseudo is likely to be assigned to a callee-saved register and then also accessed from that register, even in the first BB, making it require prologue, though it could be fetched from the original one. When we convert all uses (at least in the first BB) to the original register, the preparatory stage of shrink wrapping is often capable of moving the register moves to a later BB, thus creating fast paths which do not require prologue and epilogue. I noticed similar effects when looking at range splitting. Being able to move those calls into a deeper control level in the CFG would definitely be an improvement. We believe this change in the pipeline should not bring about any negative effects. During gcc bootstrap, the number of instructions changed by pass_cprop_hardreg dropped, but only by 1.2%. We have also run SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all run time differences could be attributed to noise. The changes in binary sizes were also small: Did anyone ponder just doing the hard register propagation on argument registers prior to the prologue/epilogue handling, then the full blown propagation pass in its current location in the pipeline? That would get you the benefit you're seeking and minimize other effects. Of course if you try that and get effectively the same results as moving the full propagation pass before prologue/epilogue handling then the complexity of only propagating argument registers early is clearly not needed and we'd probably want to go with your patch as-is. jeff
Re: Fix std::pair std::is_copy_assignable behavior
Hi Here is another proposal to fix std::is_copy_assignable<std::pair>. This is not perfect because I have adapted it to current compiler behavior but it is still better than the current behavior and enough to commit the unordered C++11 allocator adaptation afterward. It will give me more time to work on the Standard modification proposal to avoid the partial template specialization used for the moment. Thanks to the current resolution of DR 1402 we can already define the move assignment operator as default; if deleted, the template move assignment operator will be considered and potentially used instead. 2013-04-17 François Dumont fdum...@gcc.gnu.org * include/bits/stl_pair.h (operator=(const pair&)): Add noexcept qualification. (operator=(pair&&)): Use default implementation. (template operator=(const pair&)): Add noexcept qualification. Enable if is_assignable<T&, const U&> is true for both parameter types. (template operator=(pair&&)): Add noexcept qualification. Enable if is_assignable<T&, U&&> is true for both parameter types. (std::is_copy_assignable, std::is_move_assignable): Add partial specialization. * testsuite/20_util/pair/is_move_assignable.cc: New. * testsuite/20_util/pair/is_copy_assignable.cc: Likewise. * testsuite/20_util/pair/is_assignable.cc: Likewise. * testsuite/20_util/pair/is_nothrow_move_assignable.cc: Likewise. * testsuite/20_util/pair/assign_neg.cc: Likewise. * testsuite/20_util/pair/is_nothrow_copy_assignable.cc: Likewise. * testsuite/20_util/pair/assign.cc: Likewise. Tested under Linux x86_64. Ok to commit ? 
François Index: include/bits/stl_pair.h === --- include/bits/stl_pair.h (revision 197829) +++ include/bits/stl_pair.h (working copy) @@ -156,6 +156,8 @@ pair& operator=(const pair& __p) + noexcept(__and_<is_nothrow_copy_assignable<_T1>, + is_nothrow_copy_assignable<_T2>>::value) { first = __p.first; second = __p.second; @@ -163,18 +165,15 @@ } pair& - operator=(pair&& __p) - noexcept(__and_<is_nothrow_move_assignable<_T1>, - is_nothrow_move_assignable<_T2>>::value) - { - first = std::forward<first_type>(__p.first); - second = std::forward<second_type>(__p.second); - return *this; - } + operator=(pair&&) = default; template<class _U1, class _U2> - pair& + typename enable_if<__and_<is_assignable<_T1&, const _U1&>, + is_assignable<_T2&, const _U2&>>::value, + pair&>::type operator=(const pair<_U1, _U2>& __p) + noexcept(__and_<is_nothrow_assignable<_T1&, const _U1&>, + is_nothrow_assignable<_T2&, const _U2&>>::value) { first = __p.first; second = __p.second; @@ -182,8 +181,12 @@ } template<class _U1, class _U2> - pair& + typename enable_if<__and_<is_assignable<_T1&, _U1&&>, + is_assignable<_T2&, _U2&&>>::value, + pair&>::type operator=(pair<_U1, _U2>&& __p) + noexcept(__and_<is_nothrow_assignable<_T1&, _U1&&>, + is_nothrow_assignable<_T2&, _U2&&>>::value) { first = std::forward<_U1>(__p.first); second = std::forward<_U2>(__p.second); @@ -287,6 +290,19 @@ { return pair<_T1, _T2>(__x, __y); } #endif +#if __cplusplus >= 201103L + template<class _T1, class _T2> +struct is_copy_assignable<std::pair<_T1, _T2>> + : public __and_<std::is_copy_assignable<_T1>, + std::is_copy_assignable<_T2>>::type +{ }; + + template<class _T1, class _T2> +struct is_move_assignable<std::pair<_T1, _T2>> + : public __and_<std::is_move_assignable<_T1>, + std::is_move_assignable<_T2>>::type +{ }; +#endif /// @} _GLIBCXX_END_NAMESPACE_VERSION Index: testsuite/20_util/pair/is_assignable.cc === --- testsuite/20_util/pair/is_assignable.cc (revision 0) +++ testsuite/20_util/pair/is_assignable.cc (revision 0) @@ -0,0 +1,48 @@ +// { dg-do compile } +// { dg-options "-std=c++11" } + +// Copyright (C) 2013 Free Software 
Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +#include <type_traits> +#include <utility> + +struct T1 +{ +}; + +struct T2 +{ + T2& + operator=(const T1&); +}; + +typedef std::pair<int, int> pt1; +typedef std::pair<short, short> pt2; +typedef std::pair<int, double> pt3; +typedef std::pair<int, T1> pt4; +typedef
Re: Fix std::pair std::is_copy_assignable behavior
On 4/17/13 8:10 PM, François Dumont wrote: Hi Here is an other proposal to fix std::is_copy_assignable<std::pair>. Sorry, I'm still missing something very, very basic: which behavior is conforming, the current one or what we would get instead? If the former, is there a DR arguing for the latter? Paolo.
Re: [patch] simplify emit_delay_sequence
On Wed, Apr 17, 2013 at 10:39 AM, Eric Botcazou ebotca...@adacore.com wrote: @@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt INSN_LOCATION (seq_insn) = INSN_LOCATION (tem); INSN_LOCATION (tem) = 0; + /* Remove any REG_DEAD notes because we can't rely on them now +that the insn has been moved. */ for (note = REG_NOTES (tem); note; note = next) { next = XEXP (note, 1); Did you mean to move the comment instead of duplicating it? No, it's a merge mistake. My copy of this function in sched-dbr doesn't have the switch (it doesn't delete insns and re-inserts them, it simply moves them, so updating label notes is not necessary) and I just copied the comment before the loop. Then I merged back some of the changes to reorg.c and messed up :-) Thanks for the review. Committed. Ciao! Steven
Re: Fix std::pair std::is_copy_assignable behavior
On 04/17/2013 09:18 PM, Paolo Carlini wrote: On 4/17/13 8:10 PM, François Dumont wrote: Hi Here is an other proposal to fix std::is_copy_assignablestd::pair. Sorry, I'm still missing something very, very basic: which behavior is conforming, the current one or what we would get instead? If the former, is there a DR arguing for the latter? Paolo. The behavior I am targeting is std::is_copy_asignablestd::pairconst int, int to be std::false_type for instance. I have added test for many other use cases. More generally I need that when std::is_copy_assignableT is std::true_type then writing a = b, with a and b being T, does compile. Otherwise this patch just make std::pair match the Standard requirements like at 20.3.2.17. Do you want me to add a bug report in Bugzilla first ? François
Re: Fix std::pair std::is_copy_assignable behavior
Hi, On 4/17/13 8:43 PM, François Dumont wrote: On 04/17/2013 09:18 PM, Paolo Carlini wrote: On 4/17/13 8:10 PM, François Dumont wrote: Hi Here is an other proposal to fix std::is_copy_assignable<std::pair>. Sorry, I'm still missing something very, very basic: which behavior is conforming, the current one or what we would get instead? If the former, is there a DR arguing for the latter? Paolo. The behavior I am targeting is std::is_copy_assignable<std::pair<const int, int>> to be std::false_type for instance. I have added tests for many other use cases. More generally I need that when std::is_copy_assignable<T> is std::true_type then writing a = b, with a and b being T, does compile. Otherwise this patch just makes std::pair match the Standard requirements like at 20.3.2.17. Do you want me to add a bug report in Bugzilla first ? I'm not talking about GCC's Bugzilla, I meant an ISO DR: if we don't have a DR and preferably a vague support of LWG people, I think we should not change the behavior of our std::is_copy_assignable only because it helps our implementation of other facilities. Paolo.
Re: [patch] PR56729
On Fri, Mar 29, 2013 at 1:05 PM, Steven Bosscher wrote: It looks like there are places in the middle end that use remove_insn on insns that are not actually emitted. This breaks the assert I added in df_insn_delete. The patch disables the assert for now. The comment before the assert is now even messier than before but I think it's better to explain why the assert cannot work than to remove the comment and the assert altogether. This is no longer necessary, now that remove_insn doesn't use df_insn_delete. Only a small patch to lower-subreg.c is needed to restore the check. Bootstrapped & tested (unix{,-m32}) on powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven * lower-subreg.c (resolve_simple_move): If called self-recursive, do not delete_insn insns that have not yet been emitted, only unlink them with remove_insn. * df-scan.c (df_insn_delete): Revert r197492. Index: lower-subreg.c === --- lower-subreg.c (revision 198002) +++ lower-subreg.c (working copy) @@ -1069,7 +1069,13 @@ resolve_simple_move (rtx set, rtx insn) emit_insn_before (insns, insn); - delete_insn (insn); + /* If we get here via self-recursion, then INSN is not yet in the insns + chain and delete_insn will fail. We only want to remove INSN from the + current sequence. See PR56738. */ + if (in_sequence_p ()) +remove_insn (insn); + else +delete_insn (insn); return insns; } Index: df-scan.c === --- df-scan.c (revision 198002) +++ df-scan.c (working copy) @@ -1158,17 +1158,7 @@ df_insn_delete (rtx insn) In any case, we expect BB to be non-NULL at least up to register allocation, so disallow a non-NULL BB up to there. Not perfect but better than nothing... */ - /* ??? bb can also be NULL if lower-subreg.c:resolve_simple_mov emits - an insn into a sequence and then does delete_insn on it. Not sure - if that makes sense, but for now it means this assert cannot work. - See PR56738. - Disable for now but revisit before the end of GCC 4.9 stage1. 
*/ -#if 0 gcc_checking_assert (bb != NULL || reload_completed); -#else - if (bb == NULL) -return; -#endif df_grow_bb_info (df_scan); df_grow_reg_info ();
Re: [patch] PR56729
On 04/17/2013 02:12 PM, Steven Bosscher wrote: * lower-subreg.c (resolve_simple_move): If called self-recursive, do not delete_insn insns that have not yet been emitted, only unlink them with remove_insn. * df-scan.c (df_insn_delete): Revert r197492. OK. Jeff
Re: Fix std::pair std::is_copy_assignable behavior
On 04/17/2013 10:02 PM, Paolo Carlini wrote: Hi, On 4/17/13 8:43 PM, François Dumont wrote: On 04/17/2013 09:18 PM, Paolo Carlini wrote: On 4/17/13 8:10 PM, François Dumont wrote: Hi Here is an other proposal to fix std::is_copy_assignable<std::pair>. Sorry, I'm still missing something very, very basic: which behavior is conforming, the current one or what we would get instead? If the former, is there a DR arguing for the latter? Paolo. The behavior I am targeting is std::is_copy_assignable<std::pair<const int, int>> to be std::false_type for instance. I have added tests for many other use cases. More generally I need that when std::is_copy_assignable<T> is std::true_type then writing a = b, with a and b being T, does compile. Otherwise this patch just makes std::pair match the Standard requirements like at 20.3.2.17. Do you want me to add a bug report in Bugzilla first ? I'm not talking about GCC's Bugzilla, I meant an ISO DR: if we don't have a DR and preferably a vague support of LWG people, I think we should not change the behavior of our std::is_copy_assignable only because it helps our implementation of other facilities. Paolo. I really don't get it. Is the current behavior really Standard compliant ? I don't think so and would like to fix it. The Standard says that pair& operator=(const pair&) requires that both is_copy_assignable<first_type> and is_copy_assignable<second_type> be true. With std::pair<const int, int>, is_copy_assignable<const int> is false and then the operator is ill-formed. That shows up if you try to call it, but if you check is_copy_assignable<pair<const int, int>> it says true. Do you see that as correct behavior ? Do you really think that it requires an ISO DR ? François
Re: [PATCH, tree-ssa] Avoid -Wuninitialized warning in try_unroll_loop_completely()
On 04/15/2013 08:35 PM, Chung-Ju Wu wrote: You are right. After doing a survey on http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings and reading the related discussion thread, I realized it is a complicated detection and this is a false-positive case. I was using gcc-4.6.3, which is provided by Ubuntu 12.04, and the warning is displayed during the compilation process. As I tried to build another native gcc myself from current main trunk and used it to compile tree-ssa-loop-ivcanon.c again, there is no such warning at all. (See attachment for my console output.) So I am wondering if my patch is still valuable, since such a false-positive warning is already fixed on trunk. Thanks for checking on this stuff. My preference would be to not add the initialization unless we're currently seeing false-positive warnings with the trunk. Any time we add a dummy initialization like this to avoid a false positive, we run the risk of missing real bugs later if the nearby code is changed in such a way as to expose an uninitialized use, which we'd like to catch, but can't if we've inserted a dummy initialization. Jeff
Re: [Patch, Fortran] PR39505 - add support for !GCC$ attributes NO_ARG_CHECK
Hi Tobias, Is my -f(no-)directives patch okay? Or do you envision something else? In principle, it is OK; the only question is what the default should be :-) For OpenMP, we require an option to change the semantics of a program based on special comments. Currently, we do not do so for directives which do the same thing. So, what should we do? Does anybody else have an opinion here? I'm willing to go with the majority here. Thomas
Re: [Patch, Fortran] PR39505 - add support for !GCC$ attributes NO_ARG_CHECK
Thomas Koenig wrote: Is my -f(no-)directives patch okay? Or do you envision something else? In principle, it is OK; the only question is what the default should be :-) I am in favor of on. For OpenMP, we require an option to change the semantics of a program based on special comments. Currently, we do not do so for directives which do the same thing. Well, I see a difference here: (Nearly) all programs using OpenMP also work as serial programs (-fno-openmp). On the other hand, without the !GCC$ attributes directives (or the C equivalent: __attribute__((...))), the program does not work properly. The attributes are used to be able to express some feature which is not available in the standard but is still in some way required: stdcall, fastcall, dllimport, dllexport, unused arguments, weak bindings (not yet for Fortran), disabling argument checking (only Fortran), etc. Tobias
Re: [patch] fix verify_rtl_sharing handling of SEQUENCEs
On Tue, Apr 16, 2013 at 6:39 PM, Steven Bosscher wrote: My new delay branch scheduler uses TODO_verify_rtl_sharing, but it turns out that verify_rtl_sharing doesn't handle SEQUENCEs correctly: clearing the used-flags is done correctly, but verifying insns in the SEQUENCE fails. The problem is that every insn in the SEQUENCE is marked used via PATTERN (SEQUENCE) and also via PREV_INSN/NEXT_INSN of the insns in the SEQUENCE. Fixed with the attached patch. Bootstrapped & tested on sparc64-unknown-linux-gnu, and cross-built & tested on mips-elf, both with TODO_verify_rtl_sharing added to the passes in reorg.c. Will commit as obvious. Andreas Krebbel's patch (http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00714.html) makes the problem go away for me, so I'm not going to commit this patch after all. Hmm, no it hasn't. What happens is this: reset_all_used_flags resets the used flags via mark_used_flags, which doesn't mark or unmark insns:

      case DEBUG_INSN:
      case INSN:
      case JUMP_INSN:
      case CALL_INSN:
      case NOTE:
      case LABEL_REF:
      case BARRIER:
	/* The chain of insns is not being copied.  */
	return;

But verify_rtx_sharing sets the used flag on insns if they are reached via a SEQUENCE. So the first verify_rtl_sharing call with SEQUENCEs in the insn chain passes, but a second call will fail because the used flag on insns in the SEQUENCE isn't cleared. So, updated patch attached, will commit after testing on sparc64-linux. Ciao! Steven

	* emit-rtl.c (reset_insn_used_flags): New function.
	(reset_all_used_flags): Use it.
	(verify_insn_sharing): New function.
	(verify_rtl_sharing): Fix verification for SEQUENCEs.

Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 198036)
+++ emit-rtl.c	(working copy)
@@ -2596,6 +2596,18 @@ verify_rtx_sharing (rtx orig, rtx insn)
       return;
     }
 
+/* Reset used-flags for INSN.  */
+
+static void
+reset_insn_used_flags (rtx insn)
+{
+  gcc_assert (INSN_P (insn));
+  reset_used_flags (PATTERN (insn));
+  reset_used_flags (REG_NOTES (insn));
+  if (CALL_P (insn))
+    reset_used_flags (CALL_INSN_FUNCTION_USAGE (insn));
+}
+
 /* Go through all the RTL insn bodies and clear all the USED bits.  */
 
 static void
@@ -2606,28 +2618,30 @@ reset_all_used_flags (void)
   for (p = get_insns (); p; p = NEXT_INSN (p))
     if (INSN_P (p))
       {
-	reset_used_flags (PATTERN (p));
-	reset_used_flags (REG_NOTES (p));
-	if (CALL_P (p))
-	  reset_used_flags (CALL_INSN_FUNCTION_USAGE (p));
-	if (GET_CODE (PATTERN (p)) == SEQUENCE)
-	  {
-	    int i;
-	    rtx q, sequence = PATTERN (p);
-
-	    for (i = 0; i < XVECLEN (sequence, 0); i++)
-	      {
-		q = XVECEXP (sequence, 0, i);
-		gcc_assert (INSN_P (q));
-		reset_used_flags (PATTERN (q));
-		reset_used_flags (REG_NOTES (q));
-		if (CALL_P (q))
-		  reset_used_flags (CALL_INSN_FUNCTION_USAGE (q));
-	      }
-	  }
+	rtx pat = PATTERN (p);
+	if (GET_CODE (pat) != SEQUENCE)
+	  reset_insn_used_flags (p);
+	else
+	  {
+	    gcc_assert (REG_NOTES (p) == NULL);
+	    for (int i = 0; i < XVECLEN (pat, 0); i++)
+	      reset_insn_used_flags (XVECEXP (pat, 0, i));
+	  }
       }
 }
 
+/* Verify sharing in INSN.  */
+
+static void
+verify_insn_sharing (rtx insn)
+{
+  gcc_assert (INSN_P (insn));
+  verify_rtx_sharing (PATTERN (insn), insn);
+  verify_rtx_sharing (REG_NOTES (insn), insn);
+  if (CALL_P (insn))
+    verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (insn), insn);
+}
+
 /* Go through all the RTL insn bodies and check that there is no unexpected
    sharing in between the subexpressions.  */
 
@@ -2643,10 +2657,12 @@ verify_rtl_sharing (void)
   for (p = get_insns (); p; p = NEXT_INSN (p))
     if (INSN_P (p))
       {
-	verify_rtx_sharing (PATTERN (p), p);
-	verify_rtx_sharing (REG_NOTES (p), p);
-	if (CALL_P (p))
-	  verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (p), p);
+	rtx pat = PATTERN (p);
+	if (GET_CODE (pat) != SEQUENCE)
+	  verify_insn_sharing (p);
+	else
+	  for (i = 0; i < XVECLEN (pat, 0); i++)
+	    verify_insn_sharing (XVECEXP (pat, 0, i));
       }
 
   reset_all_used_flags ();
Re: GCC does not support *mmintrin.h with function specific opts
+HJ On Tue, Apr 16, 2013 at 1:54 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, I have attached an updated patch that addresses all the comments raised. On Fri, Apr 12, 2013 at 1:58 AM, Jakub Jelinek ja...@redhat.com wrote: On Thu, Apr 11, 2013 at 12:05:41PM -0700, Sriraman Tallam wrote: I have attached a patch that fixes this. I have added an option -mgenerate-builtins that will do two things. It will define a macro __ALL_ISA__ which will expose the *intrin.h functions. It will also expose all the target specific builtins. -mgenerate-builtins will not affect code generation. 1) this shouldn't be an option, either it can be made to work reliably, then it should be done always, or it can't, then it shouldn't be done Ok, it is on by default now. There is a way to turn it off, with -mno-generate-builtins. 2) have you verified that if you always generate all builtins, that the builtins not supported by the ISA selected from the command line are created with the right vector modes? This issue does not arise. When the target builtin is expanded, it is checked if the ISA support is there, either via function specific target opts or global target opts. If not, an error is raised. Test case added for this, please see intrinsic_4.c in patch. 3) the *intrin.h headers in the case where the guarding macro isn't defined should be surrounded by something like #ifndef __FMA4__ #pragma GCC push options #pragma GCC target(fma4) #endif ... #ifndef __FMA4__ #pragma GCC pop options #endif so that everything that is in the headers is compiled with the ISA in question I do not think this should be done because it will break the inlining ability of the header function and cause issues if the caller does not specify the required ISA. The fact that the header functions are marked extern __inline, with gnu_inline guarantees that a body will not be generated and they will be inlined. If the caller does not have the required ISA, appropriate errors will be raised. 
Test cases added, see intrinsics_1.c, intrinsics_2.c. 4) what happens if you use the various vector types typedefed in the *intrin.h headers in code that doesn't support those ISAs? As TYPE_MODE for VECTOR_TYPE is a function call, perhaps it will just be handled as generic BLKmode vectors, which is desirable I think. I checked some tests here. With -mno-sse, for instance, vector types are not permitted in function arguments and return values, and gcc raises a warning/error in each case. With return values, gcc always gives an error if an SSE register is required in a return value. I even fixed this message to not trigger for functions marked extern inline with the gnu_inline keyword, as a body for them will not be generated. 5) what happens if you use a target builtin in a function not supporting the corresponding ISA, do you get a proper error explaining what you are doing wrong? Yes, please see the intrinsic_4.c test in the patch. 6) what happens if you use some intrinsics in a function not supporting the corresponding ISA? Dunno if the inliner chooses not to inline it and errors out because it is always_inline, or what exactly will happen then. Same deal here. The intrinsic function is guaranteed to be inlined into the caller, leaving a corresponding builtin call. That builtin call will trigger an error if the ISA is not supported. Thanks Sri For all this you certainly need testcases. Jakub
[PATCH ARM] Extend thumb1_reorg to save more comparison instructions
Hi, Before thumb1_reorg, the ARM backend used a peephole to save comparison instructions when a flag-setting move was found immediately before a branch instruction. Since we are using thumb1_reorg now, it can be extended to catch more opportunities by searching for the flag-setting move instruction before the branch, rather than only the one exactly before the branch. For example:

	mov	r0, r1
	// other insns that do not kill r0
	branch if (r0 == 0)
	// other insns

Tested on thumb1, is it OK?

2013-04-18  Bin Cheng  bin.ch...@arm.com

	* config/arm/arm.c (thumb1_reorg): Search for flag setting insn before
	branch in same basic block.  Check both src and dest of the move insn.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 197562)
+++ gcc/config/arm/arm.c	(working copy)
@@ -14026,6 +14026,7 @@ thumb1_reorg (void)
       rtx set, dest, src;
       rtx pat, op0;
       rtx prev, insn = BB_END (bb);
+      bool insn_clobbered = false;
 
       while (insn != BB_HEAD (bb) && DEBUG_INSN_P (insn))
	insn = PREV_INSN (insn);
@@ -14034,12 +14035,29 @@
       if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn)
	continue;
 
-      /* Find the first non-note insn before INSN in basic block BB. */
+      /* Get the register with which we are comparing.  */
+      pat = PATTERN (insn);
+      op0 = XEXP (XEXP (SET_SRC (pat), 0), 0);
+
+      /* Find the first flag setting insn before INSN in basic block BB.  */
       gcc_assert (insn != BB_HEAD (bb));
-      prev = PREV_INSN (insn);
-      while (prev != BB_HEAD (bb) && (NOTE_P (prev) || DEBUG_INSN_P (prev)))
-	prev = PREV_INSN (prev);
+      for (prev = PREV_INSN (insn);
+	   (!insn_clobbered
+	    && prev != BB_HEAD (bb)
+	    && (NOTE_P (prev)
+		|| DEBUG_INSN_P (prev)
+		|| (GET_CODE (prev) == SET
+		    && get_attr_conds (prev) == CONDS_NOCOND)));
+	   prev = PREV_INSN (prev))
+	{
+	  if (reg_set_p (op0, prev))
+	    insn_clobbered = true;
+	}
+
+      /* Skip if op0 is clobbered by insn other than prev.  */
+      if (insn_clobbered)
+	continue;
+
       set = single_set (prev);
       if (!set)
	continue;
@@ -14050,12 +14068,9 @@
	  || !low_register_operand (src, SImode))
	continue;
 
-      pat = PATTERN (insn);
-      op0 = XEXP (XEXP (SET_SRC (pat), 0), 0);
       /* Rewrite move into subtract of 0 if its operand is compared with ZERO
-	 in INSN.  Don't need to check dest since cprop_hardreg pass propagates
-	 src into INSN.  */
-      if (REGNO (op0) == REGNO (src))
+	 in INSN.  Both src and dest of the move insn are checked.  */
+      if (REGNO (op0) == REGNO (src) || REGNO (op0) == REGNO (dest))
	{
	  dest = copy_rtx (dest);
	  src = copy_rtx (src);