Re: Propagate profile counts after switch case expansion (issue5896043)
Ping. On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman wrote: > On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman wrote: >> On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: This patch propagates execution count of thee case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk? >>> >>> Hi, >>> while this is resonable thing to do, I belive it would make most sense >>> to make switch lowering at gimple level: i.e. reorganize the existing >>> code to produce gimple statement and change expansion code to produce >>> tablejump from every gimple switch statement that was left in the >>> code. >>> >>> This would be both cleaner and would allow gimple optimizers to improve the >>> generated code. Incrementally it would also allow us to improve switch >>> exansion >>> that is quite baroque and not realy producing very good code in some common >>> cases. >>> >>> If you would be interested working on this (i.e. reorganizing the expansion >>> code to produce gimple), I would be very happy. If not, I can review the >>> profile updating part for mainline, since in any case this is desriable >>> thing >>> to do. >> >> I am planning to explore improvements to switch expansion (peeling >> some cases and using jump tables for the rest, for example) and I >> think the reorganization you suggest is the right way to do such >> improvements. But until I can spend time on it and get it done, I >> would like this patch to get reviewed and checked in. >> >> Thanks, >> Easwaran > > Ping.
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman wrote: > On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: >>> This patch propagates execution count of thee case labels of a >>> switch-case statement after its expansion. Bootstraps and all >>> tests pass. OK for trunk? >> >> Hi, >> while this is resonable thing to do, I belive it would make most sense >> to make switch lowering at gimple level: i.e. reorganize the existing >> code to produce gimple statement and change expansion code to produce >> tablejump from every gimple switch statement that was left in the >> code. >> >> This would be both cleaner and would allow gimple optimizers to improve the >> generated code. Incrementally it would also allow us to improve switch >> exansion >> that is quite baroque and not realy producing very good code in some common >> cases. >> >> If you would be interested working on this (i.e. reorganizing the expansion >> code to produce gimple), I would be very happy. If not, I can review the >> profile updating part for mainline, since in any case this is desriable thing >> to do. > > I am planning to explore improvements to switch expansion (peeling > some cases and using jump tables for the rest, for example) and I > think the reorganization you suggest is the right way to do such > improvements. But until I can spend time on it and get it done, I > would like this patch to get reviewed and checked in. > > Thanks, > Easwaran Ping.
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: >> This patch propagates execution count of thee case labels of a >> switch-case statement after its expansion. Bootstraps and all >> tests pass. OK for trunk? > > Hi, > while this is resonable thing to do, I belive it would make most sense > to make switch lowering at gimple level: i.e. reorganize the existing > code to produce gimple statement and change expansion code to produce > tablejump from every gimple switch statement that was left in the > code. > > This would be both cleaner and would allow gimple optimizers to improve the > generated code. Incrementally it would also allow us to improve switch > exansion > that is quite baroque and not realy producing very good code in some common > cases. > > If you would be interested working on this (i.e. reorganizing the expansion > code to produce gimple), I would be very happy. If not, I can review the > profile updating part for mainline, since in any case this is desriable thing > to do. I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in. Thanks, Easwaran > Honza
Re: Propagate profile counts after switch case expansion (issue5896043)
> This patch propagates execution count of thee case labels of a > switch-case statement after its expansion. Bootstraps and all > tests pass. OK for trunk? Hi, while this is resonable thing to do, I belive it would make most sense to make switch lowering at gimple level: i.e. reorganize the existing code to produce gimple statement and change expansion code to produce tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch exansion that is quite baroque and not realy producing very good code in some common cases. If you would be interested working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is desriable thing to do. Honza
Re: Propagate profile counts after switch case expansion (issue5896043)
> Do you mean use the weights to decide the shape of the binary tree Yes. > (similar to COST_TABLE heuristic)? COST_TABLE should die I hope. > I am planning to send a separate patch for that. Great. -Andi
Re: Propagate profile counts after switch case expansion (issue5896043)
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen wrote: > Easwaran Raman writes: > >> Some more background on this patch: Right now, while the execution >> counts of different case labels of a switch statement are obtained >> during profile collection, they are not propagated to RTL. Instead, >> counts are regenerated at the RTL level using static heuristics that >> tend to weigh branches equally which can cause poor optimization of >> hot code. This patch ensures that the counts collected during profile >> collection are correctly propagated allowing hot code to be better >> optimized by RTL optimizations. Patch tested on x86_64. > > I think your patch doesn't use the probably to weight the decision > tree for non tablejump, right? I looked at this some time ago, > but the patch always had problems. Do you mean use the weights to decide the shape of the binary tree (similar to COST_TABLE heuristic)? I am planning to send a separate patch for that. This one just makes sure that the profile counts are propagated correctly. So you will still have a situation where a branch corresponding to an infrequently executed case dominates a frequently executed case, but the BB of the cases gets the right profile weight. - Easwaran > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only
Re: Propagate profile counts after switch case expansion (issue5896043)
Easwaran Raman writes: > Some more background on this patch: Right now, while the execution > counts of different case labels of a switch statement are obtained > during profile collection, they are not propagated to RTL. Instead, > counts are regenerated at the RTL level using static heuristics that > tend to weigh branches equally which can cause poor optimization of > hot code. This patch ensures that the counts collected during profile > collection are correctly propagated allowing hot code to be better > optimized by RTL optimizations. Patch tested on x86_64. I think your patch doesn't use the probably to weight the decision tree for non tablejump, right? I looked at this some time ago, but the patch always had problems. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: Propagate profile counts after switch case expansion (issue5896043)
Some more background on this patch: Right now, while the execution counts of different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated allowing hot code to be better optimized by RTL optimizations. Patch tested on x86_64. - Easwaran On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman wrote: > This patch propagates execution count of thee case labels of a > switch-case statement after its expansion. Bootstraps and all > tests pass. OK for trunk? > > 2012-03-23 Easwaran Raman > > * cfgbuild.c (non_zero_profile_counts): New function. > (compute_outgoing_frequencies): If at least one successor of a > BB has non-zero profile count, retain the counts. > * expr.c (do_tablejump): Add a REG_BR_PROB note on the > jump to default label. > (try_tablejump): Add a parameter to specify the probability > of jumping to the default label. > * expr.h (try_tablejump): Add a new parameter. > * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. > (add_case_node): Pass execution count of the case node and use > it to initialize COUNT field. > (case_probability): New macro. > (expand_case): Propagate execution counts to generated > branches using REG_BR_PROB notes. > (emit_case_nodes): Likewise. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (compute_subtree_counts): New function to compute > SUBTREE_COUNT fields of case nodes. > (add_prob_note_to_last_insn): Add a REG_BR_PROB note with the > given probability to the last generated instruction. > > gcc/testsuite/ChangeLog: > 2012-03-23 Easwaran Raman > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. > > diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c > index 692fea8..d75fbda 100644 > --- a/gcc/cfgbuild.c > +++ b/gcc/cfgbuild.c > @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb) > purge_dead_tablejump_edges (bb, table); > } > > +/* Check if there is at least one edge in EDGES with a non-zero count > + field. */ > + > +static bool > +non_zero_profile_counts ( VEC(edge,gc) *edges) { > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE(e, ei, edges) > + { > + if (e->count > 0) > + return true; > + } > + return false; > +} > + > /* Assume that frequency of basic block B is known. Compute frequencies > and probabilities of outgoing edges. */ > > @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b) > e->count = b->count; > return; > } > + else if (non_zero_profile_counts (b->succs)){ > + /*Profile counts already set, but REG_NOTE missing. Retain the counts. > */ > + return; > + } > guess_outgoing_edge_probabilities (b); > if (b->count) > FOR_EACH_EDGE (e, ei, b->succs) > diff --git a/gcc/expr.c b/gcc/expr.c > index f9de908..fb8eef9 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode); > #ifdef PUSH_ROUNDING > static void emit_single_push_insn (enum machine_mode, rtx, tree); > #endif > -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); > +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); > static rtx const_vector_from_tree (tree); > static void write_complex_part (rtx, rtx, bool); > > @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree > minval, tree range, > TABLE_LABEL is a CODE_LABEL rtx for the table itself. > > DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the > - index value is out of range. */ > + index value is out of range. > + DEFAULT_PROBABILITY is the probability of jumping to the > + DEFAULT_LABEL. */ > > static void > do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, > - rtx default_label) > + rtx default_label, int default_probability) > { > rtx temp, vector; > > @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx > range, rtx table_label, > the maximum value of the range. */ > > if (default_label) > - emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, > - default_label); > + { > + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, > + default_label); > + if (default_probability != -1) > + { > + rtx jump_insn = get_last_insn(); > + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT > (default_probability)); > + } > + } > + > > /* If index is in range, it must fit in Pmode.
Propagate profile counts after switch case expansion (issue5896043)
This patch propagates execution count of thee case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk? 2012-03-23 Easwaran Raman * cfgbuild.c (non_zero_profile_counts): New function. (compute_outgoing_frequencies): If at least one successor of a BB has non-zero profile count, retain the counts. * expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to default label. (try_tablejump): Add a parameter to specify the probability of jumping to the default label. * expr.h (try_tablejump): Add a new parameter. * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. (add_case_node): Pass execution count of the case node and use it to initialize COUNT field. (case_probability): New macro. (expand_case): Propagate execution counts to generated branches using REG_BR_PROB notes. (emit_case_nodes): Likewise. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (compute_subtree_counts): New function to compute SUBTREE_COUNT fields of case nodes. (add_prob_note_to_last_insn): Add a REG_BR_PROB note with the given probability to the last generated instruction. gcc/testsuite/ChangeLog: 2012-03-23 Easwaran Raman * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c index 692fea8..d75fbda 100644 --- a/gcc/cfgbuild.c +++ b/gcc/cfgbuild.c @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb) purge_dead_tablejump_edges (bb, table); } +/* Check if there is at least one edge in EDGES with a non-zero count + field. */ + +static bool +non_zero_profile_counts ( VEC(edge,gc) *edges) { + edge e; + edge_iterator ei; + FOR_EACH_EDGE(e, ei, edges) +{ + if (e->count > 0) +return true; +} + return false; +} + /* Assume that frequency of basic block B is known. Compute frequencies and probabilities of outgoing edges. */ @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b) e->count = b->count; return; } + else if (non_zero_profile_counts (b->succs)){ +/*Profile counts already set, but REG_NOTE missing. Retain the counts. */ +return; + } guess_outgoing_edge_probabilities (b); if (b->count) FOR_EACH_EDGE (e, ei, b->succs) diff --git a/gcc/expr.c b/gcc/expr.c index f9de908..fb8eef9 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode); #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range, TABLE_LABEL is a CODE_LABEL rtx for the table itself. DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the - index value is out of range. */ + index value is out of range. + DEFAULT_PROBABILITY is the probability of jumping to the + DEFAULT_LABEL. */ static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, the maximum value of the range. */ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, -default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_insn(); + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability)); +} +} + /* If index is in range, it must fit in Pmode. Convert to Pmode so we can index with it. */ @@ -10758,7 +10768,7 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, int try_tablejump (tree index_type, tree index_expr, tree minval, tree range, - rtx table_label, rtx default_label) + rtx table_label, rtx default_label, int default_probability) { rtx index; @@ -10776,7 +10786,7 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range, TYPE_MODE (TREE_TYPE (range)), expand_normal (range), TYPE_UNSIGNED (TREE_TYPE (range))), - table_label, default_label); + table_label, default_label, default_probability); return 1;