Re: Propagate profile counts after switch case expansion (issue5896043)

2012-04-16 Thread Easwaran Raman
Ping.

On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman era...@google.com wrote:
 On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:
 On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 This patch propagates execution count of thee case labels of a
 switch-case statement after its expansion. Bootstraps and all
 tests pass. OK for trunk?

 Hi,
 while this is resonable thing to do, I belive it would make most sense
 to make switch lowering at gimple level: i.e. reorganize the existing
 code to produce gimple statement and change expansion code to produce
 tablejump from every gimple switch statement that was left in the
 code.

 This would be both cleaner and would allow gimple optimizers to improve the
 generated code. Incrementally it would also allow us to improve switch 
 exansion
 that is quite baroque and not realy producing very good code in some common
 cases.

 If you would be interested working on this (i.e. reorganizing the expansion
 code to produce gimple), I would be very happy. If not, I can review the
 profile updating part for mainline, since in any case this is desriable 
 thing
 to do.

 I am planning to explore improvements to switch expansion (peeling
 some cases and using jump tables for the rest, for example) and I
 think the reorganization you suggest is the right way to do such
 improvements. But until I can spend time on it and get it done, I
 would like this patch to get reviewed and checked in.

 Thanks,
 Easwaran

 Ping.


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-04-09 Thread Easwaran Raman
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:
 On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 This patch propagates execution count of thee case labels of a
 switch-case statement after its expansion. Bootstraps and all
 tests pass. OK for trunk?

 Hi,
 while this is resonable thing to do, I belive it would make most sense
 to make switch lowering at gimple level: i.e. reorganize the existing
 code to produce gimple statement and change expansion code to produce
 tablejump from every gimple switch statement that was left in the
 code.

 This would be both cleaner and would allow gimple optimizers to improve the
 generated code. Incrementally it would also allow us to improve switch 
 exansion
 that is quite baroque and not realy producing very good code in some common
 cases.

 If you would be interested working on this (i.e. reorganizing the expansion
 code to produce gimple), I would be very happy. If not, I can review the
 profile updating part for mainline, since in any case this is desriable thing
 to do.

 I am planning to explore improvements to switch expansion (peeling
 some cases and using jump tables for the rest, for example) and I
 think the reorganization you suggest is the right way to do such
 improvements. But until I can spend time on it and get it done, I
 would like this patch to get reviewed and checked in.

 Thanks,
 Easwaran

Ping.


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-25 Thread Jan Hubicka
 This patch propagates execution count of thee case labels of a
 switch-case statement after its expansion. Bootstraps and all
 tests pass. OK for trunk?

Hi,
while this is resonable thing to do, I belive it would make most sense
to make switch lowering at gimple level: i.e. reorganize the existing
code to produce gimple statement and change expansion code to produce
tablejump from every gimple switch statement that was left in the
code.

This would be both cleaner and would allow gimple optimizers to improve the
generated code. Incrementally it would also allow us to improve switch exansion
that is quite baroque and not realy producing very good code in some common
cases.

If you would be interested working on this (i.e. reorganizing the expansion
code to produce gimple), I would be very happy. If not, I can review the
profile updating part for mainline, since in any case this is desriable thing
to do.

Honza


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-25 Thread Easwaran Raman
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 This patch propagates execution count of thee case labels of a
 switch-case statement after its expansion. Bootstraps and all
 tests pass. OK for trunk?

 Hi,
 while this is resonable thing to do, I belive it would make most sense
 to make switch lowering at gimple level: i.e. reorganize the existing
 code to produce gimple statement and change expansion code to produce
 tablejump from every gimple switch statement that was left in the
 code.

 This would be both cleaner and would allow gimple optimizers to improve the
 generated code. Incrementally it would also allow us to improve switch 
 exansion
 that is quite baroque and not realy producing very good code in some common
 cases.

 If you would be interested working on this (i.e. reorganizing the expansion
 code to produce gimple), I would be very happy. If not, I can review the
 profile updating part for mainline, since in any case this is desriable thing
 to do.

I am planning to explore improvements to switch expansion (peeling
some cases and using jump tables for the rest, for example) and I
think the reorganization you suggest is the right way to do such
improvements. But until I can spend time on it and get it done, I
would like this patch to get reviewed and checked in.

Thanks,
Easwaran

 Honza


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Easwaran Raman
Some more background on this patch: Right now, while the execution
counts of different case labels of a switch statement are obtained
during profile collection, they are not propagated to RTL. Instead,
counts are regenerated at the RTL level using static heuristics that
tend to weigh branches equally which can cause poor optimization of
hot code. This patch ensures that the counts collected during profile
collection are correctly propagated allowing hot code to be better
optimized by RTL optimizations.  Patch tested on x86_64.

- Easwaran

On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman era...@google.com wrote:
 This patch propagates execution count of thee case labels of a
 switch-case statement after its expansion. Bootstraps and all
 tests pass. OK for trunk?

 2012-03-23   Easwaran Raman  era...@google.com

        * cfgbuild.c (non_zero_profile_counts): New function.
        (compute_outgoing_frequencies): If at least one successor of a
        BB has non-zero profile count, retain the counts.
        * expr.c (do_tablejump): Add a REG_BR_PROB note on the
        jump to default label.
        (try_tablejump): Add a parameter to specify the probability
        of jumping to the default label.
        * expr.h (try_tablejump): Add a new parameter.
        * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
        (add_case_node): Pass execution count of the case node and use
        it to initialize COUNT field.
        (case_probability): New macro.
        (expand_case): Propagate execution counts to generated
        branches using REG_BR_PROB notes.
        (emit_case_nodes): Likewise.
        (do_jump_if_equal): Pass probability for REG_BR_PROB note.
        (compute_subtree_counts): New function to compute
        SUBTREE_COUNT fields of case nodes.
        (add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
        given probability to the last generated instruction.

 gcc/testsuite/ChangeLog:
 2012-03-23   Easwaran Raman  era...@google.com
        * gcc.dg/tree-prof/switch-case-1.c: New test case.
        * gcc.dg/tree-prof/switch-case-2.c: New test case.

 diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
 index 692fea8..d75fbda 100644
 --- a/gcc/cfgbuild.c
 +++ b/gcc/cfgbuild.c
 @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
     purge_dead_tablejump_edges (bb, table);
  }

 +/* Check if there is at least one edge in EDGES with a non-zero count
 +   field.  */
 +
 +static bool
 +non_zero_profile_counts ( VEC(edge,gc) *edges) {
 +  edge e;
 +  edge_iterator ei;
 +  FOR_EACH_EDGE(e, ei, edges)
 +    {
 +      if (e-count  0)
 +        return true;
 +    }
 +  return false;
 +}
 +
  /*  Assume that frequency of basic block B is known.  Compute frequencies
     and probabilities of outgoing edges.  */

 @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
       e-count = b-count;
       return;
     }
 +  else if (non_zero_profile_counts (b-succs)){
 +    /*Profile counts already set, but REG_NOTE missing. Retain the counts.  
 */
 +    return;
 +  }
   guess_outgoing_edge_probabilities (b);
   if (b-count)
     FOR_EACH_EDGE (e, ei, b-succs)
 diff --git a/gcc/expr.c b/gcc/expr.c
 index f9de908..fb8eef9 100644
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
  #ifdef PUSH_ROUNDING
  static void emit_single_push_insn (enum machine_mode, rtx, tree);
  #endif
 -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
 +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
  static rtx const_vector_from_tree (tree);
  static void write_complex_part (rtx, rtx, bool);

 @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree 
 minval, tree range,
    TABLE_LABEL is a CODE_LABEL rtx for the table itself.

    DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
 -   index value is out of range.  */
 +   index value is out of range.
 +   DEFAULT_PROBABILITY is the probability of jumping to the
 +   DEFAULT_LABEL.  */

  static void
  do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
 -             rtx default_label)
 +             rtx default_label, int default_probability)
  {
   rtx temp, vector;

 @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx 
 range, rtx table_label,
      the maximum value of the range.  */

   if (default_label)
 -    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 -                            default_label);
 +    {
 +      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 +                              default_label);
 +      if (default_probability != -1)
 +        {
 +          rtx jump_insn = get_last_insn();
 +          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT 
 (default_probability));
 +        }
 +    }
 +

   /* If index is in range, it must fit in Pmode.
      Convert to Pmode so we can index with it.  */
 @@ -10758,7 +10768,7 

Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Andi Kleen
Easwaran Raman era...@google.com writes:

 Some more background on this patch: Right now, while the execution
 counts of different case labels of a switch statement are obtained
 during profile collection, they are not propagated to RTL. Instead,
 counts are regenerated at the RTL level using static heuristics that
 tend to weigh branches equally which can cause poor optimization of
 hot code. This patch ensures that the counts collected during profile
 collection are correctly propagated allowing hot code to be better
 optimized by RTL optimizations.  Patch tested on x86_64.

I think your patch doesn't use the probably to weight the decision 
tree for non tablejump, right? I looked at this some time ago,
but the patch always had problems.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Easwaran Raman
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen a...@firstfloor.org wrote:
 Easwaran Raman era...@google.com writes:

 Some more background on this patch: Right now, while the execution
 counts of different case labels of a switch statement are obtained
 during profile collection, they are not propagated to RTL. Instead,
 counts are regenerated at the RTL level using static heuristics that
 tend to weigh branches equally which can cause poor optimization of
 hot code. This patch ensures that the counts collected during profile
 collection are correctly propagated allowing hot code to be better
 optimized by RTL optimizations.  Patch tested on x86_64.

 I think your patch doesn't use the probably to weight the decision
 tree for non tablejump, right? I looked at this some time ago,
 but the patch always had problems.

Do you mean use the weights to decide the shape of the binary tree
(similar to COST_TABLE heuristic)? I am planning to send a separate
patch for that. This one just makes sure that the profile counts are
propagated correctly. So you will still have a situation where a
branch corresponding to an infrequently executed case dominates a
frequently executed case, but the BB of the cases gets the right
profile weight.

- Easwaran

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Andi Kleen
 Do you mean use the weights to decide the shape of the binary tree

Yes.

 (similar to COST_TABLE heuristic)?

COST_TABLE should die I hope.

 I am planning to send a separate patch for that. 

Great.

-Andi