Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-06 Thread Ilya Enkovich
Hello,

2011/9/2 Uros Bizjak ubiz...@gmail.com:
 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

 Thanks,
 Uros.


Here is patch with added comment for hook implementation. Is it OK?

Thanks,
Ilya
---
gcc/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

gcc/testsuite/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-06 Thread Uros Bizjak
On Tue, Sep 6, 2011 at 2:39 PM, Ilya Enkovich enkovich@gmail.com wrote:

 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

 Thanks,
 Uros.


 Here is patch with added comment for hook implementation. Is it OK?

 Thanks,
 Ilya
 ---
 gcc/

 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

 gcc/testsuite/

 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.


OK.

Thanks,
Uros.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-06 Thread Ilya Enkovich
2011/9/6 Uros Bizjak ubiz...@gmail.com:

 OK.

 Thanks,
 Uros.


Could please someone check it in for me?

Thanks,
Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-06 Thread H.J. Lu
On Tue, Sep 6, 2011 at 9:07 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/9/6 Uros Bizjak ubiz...@gmail.com:

 OK.

 Thanks,
 Uros.


 Could please someone check it in for me?


I checked it in for you.

-- 
H.J.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-02 Thread Richard Guenther
On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 12:27 PM, Ilya Enkovich enkovich@gmail.com wrote:

 this seems to not allow cycles_best to drop with lower width, but
 that it can't should be an implementation detail of get_required_cycles.
 To make it not so, can you add a comment before the loop, like

  /* get_required_cycles is monotonically increasing with lower width
     so we can perform a binary search for the minimal width that still
     results in the optimal cycle count.  */


 Fixed. Thanks!


 With the above change the non-x86 specifc parts are ok.  Please get
 approval for them from a x86 maintainer.


 Could please someone review x86 part?

 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

Btw, I would expect integer add and integer multiply to have different
settings for some targets which would mean splitting this up even
further ...

Richard.

 Thanks,
 Uros.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-02 Thread Ilya Enkovich
2011/9/2 Richard Guenther richard.guent...@gmail.com:
On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

 Btw, I would expect integer add and integer multiply to have different
 settings for some targets which would mean splitting this up even
 further ...

Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags
or new command line param?

Thanks,
Ilya

 Richard.

 Thanks,
 Uros.




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-02 Thread Richard Guenther
On Fri, Sep 2, 2011 at 3:45 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/9/2 Richard Guenther richard.guent...@gmail.com:
On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

 Btw, I would expect integer add and integer multiply to have different
 settings for some targets which would mean splitting this up even
 further ...

 Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags
 or new command line param?

The X86_TUNE_REASSOC_* flags.  The setting surely depends on the
number of available execution units and/or whether the instructions
are pipelined or not.

Richard.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-02 Thread Ilya Enkovich
2011/9/2 Richard Guenther richard.guent...@gmail.com:
 On Fri, Sep 2, 2011 at 3:45 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/9/2 Richard Guenther richard.guent...@gmail.com:
On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I assume that you need to split tune attribute to int and FP part to
 handle reassociation for other targets, since Atom handles both in the
 same way.

 Please also describe function return value in the comment (and perhaps
 in documentation, too).

 OK with this addition.

 Btw, I would expect integer add and integer multiply to have different
 settings for some targets which would mean splitting this up even
 further ...

 Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags
 or new command line param?

 The X86_TUNE_REASSOC_* flags.  The setting surely depends on the
 number of available execution units and/or whether the instructions
 are pipelined or not.

Now I'm not sure it is a good idea to use these tune flags at all
because we may need a plenty of them. We have a lot of types including
vector ones and a variety of opcodes. Any combination may require
delicate tuning. Some sort of separate table would suit better here.
Though I doubt we should introduce it right now filled with 1s.

Ilya


 Richard.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-09-01 Thread Ilya Enkovich

 this seems to not allow cycles_best to drop with lower width, but
 that it can't should be an implementation detail of get_required_cycles.
 To make it not so, can you add a comment before the loop, like

  /* get_required_cycles is monotonically increasing with lower width
     so we can perform a binary search for the minimal width that still
     results in the optimal cycle count.  */


Fixed. Thanks!


 With the above change the non-x86 specifc parts are ok.  Please get
 approval for them from a x86 maintainer.


Could please someone review x86 part?

Thanks,
Ilya
---
gcc/

2011-09-01  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

gcc/testsuite/

2011-09-01  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-31 Thread Ilya Enkovich
Hello Richard,

Thanks for the review!

 The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic
 to me.  Can you investigate the problem report, look at the geneated
 code with the atom default of the param and see whether it's still
 reasonably fixed (maybe you'd already done this)?
This test was created as a regression test to the problem in
linearize_expr_tree which moves all statements down to the first
modified one during reassociation increasing registers pressure. Test
has a lot of definitions which are all ORed like this:
  def r1
  def r2
  s = r1 or r2
  def r3
  s = s or r3
  def r4
  s = s or r4
  ...
and it checks that register pressure is not increased after
reassociation by simple scan of two sequential defs. If we use
reassociation width higher than 1 then test will fails because we need
to increase register pressure to get parallelism and finally get code
like this:
  def r1
  def r2
  def r3
  t1 = r1 or r2
  s = s or r3
  def r4
  def r5
  s = s or t1
  t1 = r4 or r5
  ...
So, I had to fix a test.

 +  /* Check if we may use less width and still compute sequence for
 +     the same time.  It will allow us to reduce registers usage.  */
 +  while (width  1
 +         get_required_cycles (ops_num, width - 1) == cycles_best)
 +    width--;

 I suppose get_required_cycles () is monotonic in width?  Would it
 make sense to binary search the best width then (I realize the
 default is 1 and the only target differing has 2, but ...)?  Maybe at
 least add a comment to this effect?  Or not decrement by one
 but instead divide by two on each iteration (which is the same for 1 and 2 
 ...)?
 It's also all a mapping that is constant - we should be able to
 pre-compute it for all values, eventually compressing it to a
 much simpler formula width = f (cpu_width, ops_num)?
I replaced sequential search with a binary search. I did not pay much
attention to this function because do not think it is really time
consuming compared to other parts of reassociation phase. Am I too
optimistic? If you think it might significantly affect compile time I
can introduce a table of pre-computed values (or make it later as a
separate fix).

I made all other fixes as you suggested.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
---
gcc/

2011-08-31  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

gcc/testsuite/

2011-08-31  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-30 Thread Richard Guenther
On Wed, Aug 10, 2011 at 4:49 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello,

 Here is a new version of the patch. Changes from the previous version
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
  - updated to trunk
  - TODO_remove_unused_locals flag was removed from todo_flags_finish
 of reassoc pass

 Bootstrapped and checked on x86_64-linux.

The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic
to me.  Can you investigate the problem report, look at the geneated
code with the atom default of the param and see whether it's still
reasonably fixed (maybe you'd already done this)?

+  /* Check if we may use less width and still compute sequence for
+ the same time.  It will allow us to reduce registers usage.  */
+  while (width  1
+ get_required_cycles (ops_num, width - 1) == cycles_best)
+width--;

I suppose get_required_cycles () is monotonic in width?  Would it
make sense to binary search the best width then (I realize the
default is 1 and the only target differing has 2, but ...)?  Maybe at
least add a comment to this effect?  Or not decrement by one
but instead divide by two on each iteration (which is the same for 1 and 2 ...)?
It's also all a mapping that is constant - we should be able to
pre-compute it for all values, eventually compressing it to a
much simpler formula width = f (cpu_width, ops_num)?

+static void
+rewrite_expr_tree_parallel (gimple stmt, int width,
+   VEC(operand_entry_t, heap) * ops)
+{
+  enum tree_code opcode = gimple_assign_rhs_code (stmt);
+  int op_num = VEC_length (operand_entry_t, ops);
+  int stmt_num = op_num - 1;
+  gimple *stmts = XNEWVEC (gimple, stmt_num);

use XALLOCAVEC here and remove the free call.

+  if (ready_stmts_end == 0 
+ (i - stmt_index = width || op_index  1))

s go to the next line, not at the end

+  else
+   {
+ tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc);
+ add_referenced_var (var);

please re-use a common variable for the whole chain, they will
all necessarily have compatible type.  Creating and maintaining
decls is expensive (also avoid giving them names, thus just
pass NULL instead of reassoc).

+   {
+ enum machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+ int width = get_reassociation_width (ops, rhs_code, mode);

as you are passing in the mode here, consider passing the number
of operands in ops as well instead of ops.  This way we might consider
moving this whole function to the target or to generic code.  Similar
move the dump printing in get_reassociation_width

+  if (dump_file  (dump_flags  TDF_DETAILS))
+fprintf (dump_file,
+Width = %d was chosen for reassociation\n, width);

here to the caller.

Sorry for taking so long to review this again.  I'll promise to be quick
once you post an update.

Thanks,
Richard.

 Thanks,
 Ilya
 ---
 gcc/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

 gcc/testsuite/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-26 Thread Ilya Enkovich
Double ping.

2011/8/19 Ilya Enkovich enkovich@gmail.com:
 Ping.

 2011/8/10 Ilya Enkovich enkovich@gmail.com:
 Hello,

 Here is a new version of the patch. Changes from the previous version
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
  - updated to trunk
  - TODO_remove_unused_locals flag was removed from todo_flags_finish
 of reassoc pass

 Bootstrapped and checked on x86_64-linux.

 Thanks,
 Ilya
 ---
 gcc/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

 gcc/testsuite/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-19 Thread Ilya Enkovich
Ping.

2011/8/10 Ilya Enkovich enkovich@gmail.com:
 Hello,

 Here is a new version of the patch. Changes from the previous version
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
  - updated to trunk
  - TODO_remove_unused_locals flag was removed from todo_flags_finish
 of reassoc pass

 Bootstrapped and checked on x86_64-linux.

 Thanks,
 Ilya
 ---
 gcc/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

 gcc/testsuite/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-10 Thread Ilya Enkovich
Hello,

Here is a new version of the patch. Changes from the previous version
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
 - updated to trunk
 - TODO_remove_unused_locals flag was removed from todo_flags_finish
of reassoc pass

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
---
gcc/

2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

gcc/testsuite/

2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-05 Thread Richard Guenther
On Tue, Jul 19, 2011 at 4:46 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello Richard,

 Thanks a lot for the review!

 it's not easy to follow the flow of this function, esp. I wonder

 +      else
 +       {
 +         tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc);
 +         add_referenced_var (var);
 +         stmts[i] = build_and_add_sum (var, op1, op2, opcode);
 +       }

 what you need to create new stmts for, and if you possibly create
 multiple temporary variables here.

 -  TODO_verify_ssa
 +  TODO_remove_unused_locals
 +    | TODO_verify_ssa

 why?


 Current rewrite_expr_tree uses original statements and  just change
 order of operands (tree leafs) usage. To keep data flow correct it
 moves all statements down to the last one. I tried such approach but
 it did not work well because of increased registers pressure in some
 cases (all operands are live at the same time right before the first
 statement of our sequence). Then I tried to move existing statements
 to the appropriate places (just after the last operand definition) but
 it appeared to be not so easy. It is easy to guarantee consistence in
 the final result but not easy to keep data flow consistent after each
 statement move. And if we do not keep code consistent after each move
 then we may get wrong debug statement generated. After few tries I
 decided that the most easy for both understanding and implementation
 way is to recreate statements and remove the old chain. So, I used
 build_and_add_sum to create all statements except the last one which
 never needs recreation. I added TODO_remove_unused_locals since temps
 used in original statements should be removed. Is this approach OK?

Well, it's enough to delay that to later passes that do this, so I'd prefer
to not change this in this patch.

 You don't seem to handle the special-cases of rewrite_expr_tree
 for the leafs of your tree, especially the PHI node special-casing.
 I think you may run into vectorization issues here.

 I do not see any reason why these cases are checked in
 rewrite_expr_tree. It is optimization of operands order and may be it
 will be good to move it somewhere. Is it OK if I move related code to
 the end of optimize_ops_list method?

I suppose yes.  Maybe the cases are also obsoleted by the improved
loop PHI handling.

Richard.

 Thanks
 Ilya



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-03 Thread Ilya Enkovich
Ping.

2011/7/26 Ilya Enkovich enkovich@gmail.com:
 Hello,

 Here is updated patch for tree reassoc phase. Update includes coding
 style fixes, comments update, target hook arguments change. I also
 moved a part of code from rewrite_expr_tree into separate function to
 reuse it in rewrite_expr_tree_parallel for grouping operands with the
 same rank.

 Bootstrapped and checked on x86_64-linux.

 Ilya
 --
 gcc/

 2011-07-26  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

        (pass_reassoc): TODO_remove_unused_locals flag added.

 gcc/testsuite/

 2011-07-26  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-26 Thread Ilya Enkovich
Hello,

Here is updated patch for tree reassoc phase. Update includes coding
style fixes, comments update, target hook arguments change. I also
moved a part of code from rewrite_expr_tree into separate function to
reuse it in rewrite_expr_tree_parallel for grouping operands with the
same rank.

Bootstrapped and checked on x86_64-linux.

Ilya
--
gcc/

2011-07-26  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-26  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-21 Thread Joseph S. Myers
On Thu, 14 Jul 2011, Ilya Enkovich wrote:

   * doc/tm.texi.in (reassociation_width): New hook documentation.

Unless the documentation for a new hook is based on pre-existing GFDL-only 
text, it should go in target.def not tm.texi.in, with the only thing added 
to tm.texi.in being a single @hook line.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-19 Thread Richard Guenther
On Thu, Jul 14, 2011 at 5:00 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/7/14 Richard Guenther richard.guent...@gmail.com:

 But then how comes the option to override it is useful?  It isn't dependent
 on the particular case.  At least the option should be a --param.

 Richard.


 Option is still useful if you want to try feature on platform with no
 hook implemented and for other performance experiments. I agree
 --param usage should be better here. I'll fix it.

 Ilya


 Here is a patch with new param instead of new option. Bootstrapped and
 checked on x86_64-linux.

 }

+static int
+ix86_reassociation_width (const_gimple stmt)

all functions need comments describing what they do and what parameters
they get.

+@deftypefn {Target Hook} int TARGET_SCHED_REASSOCIATION_WIDTH
(const_gimple @var{stmt})
+This hook is called by tree reassociator to determine a level of
+parallelism required in output calculations chain.

I don't think we should get an actual statement here, but an
enum tree_code and a machine mode.

+/* Find out how many cycles we need to compute whole statements
+   chain holding ops_num operands if may execute up to cpu_width
+   statements per cycle.  */

I have a hard time parsing this sentence, maybe a native english
speaker can help here.

+static int
+get_reassociation_width (VEC(operand_entry_t, heap) * ops, gimple stmt)
+{
+  int param_width = PARAM_VALUE (PARAM_TREE_REASSOC_WIDTH);
+  int ops_num = VEC_length (operand_entry_t, ops);
+  int width;
+  int cycles_best;
+
+  if (param_width  0)
+width = param_width;
+  else
+width = targetm.sched.reassociation_width (stmt);

this is the only place you need stmt, with tree-code and mode args
you'd not need it (and it's odd anyway).

+  while (width  1 
+get_required_cycles (ops_num, width - 1) == cycles_best)

s go on the next line, similar cases in your patch as well

+  if (dump_file  (dump_flags  TDF_DETAILS))
+{
+  fprintf (dump_file,
+  Width = %d was chosen for reassociation\n, width);
+}

no {}s around single-stmts

+static void
+rewrite_expr_tree_parallel (gimple stmt, int width,
+   VEC(operand_entry_t, heap) * ops)
+{

it's not easy to follow the flow of this function, esp. I wonder

+  else
+   {
+ tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc);
+ add_referenced_var (var);
+ stmts[i] = build_and_add_sum (var, op1, op2, opcode);
+   }

what you need to create new stmts for, and if you possibly create
multiple temporary variables here.

You don't seem to handle the special-cases of rewrite_expr_tree
for the leafs of your tree, especially the PHI node special-casing.
I think you may run into vectorization issues here.

-  TODO_verify_ssa
+  TODO_remove_unused_locals
+| TODO_verify_ssa

why?

I think the patch looks reasonable overall, please update it with the
above comments and re-post it.

Thanks,
Richard.

 Ilya
 --
 gcc/

 2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): New hook documentation.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_const_gimple_1): New default hook.

        * hooks.c (hook_int_const_gimple_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width) implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (reassociate_bb): Now checks reassociation width to be used
        and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

        (pass_reassoc): TODO_remove_unused_locals flag added.

 gcc/testsuite/

 2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-19 Thread Ilya Enkovich
Hello Richard,

Thanks a lot for the review!

 it's not easy to follow the flow of this function, esp. I wonder

 +      else
 +       {
 +         tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc);
 +         add_referenced_var (var);
 +         stmts[i] = build_and_add_sum (var, op1, op2, opcode);
 +       }

 what you need to create new stmts for, and if you possibly create
 multiple temporary variables here.

 -  TODO_verify_ssa
 +  TODO_remove_unused_locals
 +    | TODO_verify_ssa

 why?


Current rewrite_expr_tree uses original statements and  just change
order of operands (tree leafs) usage. To keep data flow correct it
moves all statements down to the last one. I tried such approach but
it did not work well because of increased registers pressure in some
cases (all operands are live at the same time right before the first
statement of our sequence). Then I tried to move existing statements
to the appropriate places (just after the last operand definition) but
it appeared to be not so easy. It is easy to guarantee consistence in
the final result but not easy to keep data flow consistent after each
statement move. And if we do not keep code consistent after each move
then we may get wrong debug statement generated. After few tries I
decided that the most easy for both understanding and implementation
way is to recreate statements and remove the old chain. So, I used
build_and_add_sum to create all statements except the last one which
never needs recreation. I added TODO_remove_unused_locals since temps
used in original statements should be removed. Is this approach OK?

 You don't seem to handle the special-cases of rewrite_expr_tree
 for the leafs of your tree, especially the PHI node special-casing.
 I think you may run into vectorization issues here.

I do not see any reason why these cases are checked in
rewrite_expr_tree. It is optimization of operands order and may be it
will be good to move it somewhere. Is it OK if I move related code to
the end of optimize_ops_list method?

Thanks
Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

You also need approval and somebody to review the patch before checkin.

Richard.

 --
 Michael Meissner, IBM
 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
 meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

 You also need approval and somebody to review the patch before checkin.

Btw, rather than a new target hook and an option I suggest to use
a --param which default you can modify in the backend.

Richard.

 Richard.

 --
 Michael Meissner, IBM
 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
 meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
2011/7/14 Richard Guenther richard.guent...@gmail.com:

 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

 Richard.


Introduced target hook does not just define default value for option.
It is meant to make decision in each particular case. For now it
returns a constant value but march dependent heuristics will be added
later.

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/7/14 Richard Guenther richard.guent...@gmail.com:

 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

 Richard.


 Introduced target hook does not just define default value for option.
 It is meant to make decision in each particular case. For now it
 returns a constant value but march dependent heuristics will be added
 later.

But then how comes the option to override it is useful?  It isn't dependent
on the particular case.  At least the option should be a --param.

Richard.

 Ilya



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

 --
 Michael Meissner, IBM

Thanks for the note! Here is fixed patch.

Ilya
--
gcc/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): New hook documentation.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (ftree-reassoc-width): New option documented.

* hooks.h (hook_int_const_gimple_1): New default hook.

* hooks.c (hook_int_const_gimple_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* common.opt (ftree-reassoc-width): New option added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
-ftree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
2011/7/14 Richard Guenther richard.guent...@gmail.com:

 But then how comes the option to override it is useful?  It isn't dependent
 on the particular case.  At least the option should be a --param.

 Richard.


Option is still useful if you want to try feature on platform with no
hook implemented and for other performance experiments. I agree
--param usage should be better here. I'll fix it.

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
 2011/7/14 Richard Guenther richard.guent...@gmail.com:

 But then how comes the option to override it is useful?  It isn't dependent
 on the particular case.  At least the option should be a --param.

 Richard.


 Option is still useful if you want to try feature on platform with no
 hook implemented and for other performance experiments. I agree
 --param usage should be better here. I'll fix it.

 Ilya


Here is a patch with new param instead of new option. Bootstrapped and
checked on x86_64-linux.

Ilya
--
gcc/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): New hook documentation.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_const_gimple_1): New default hook.

* hooks.c (hook_int_const_gimple_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Michael Meissner
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote:
 On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
  2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
  One minor note, you will need to update doc/invoke.texi to document the new
  switch before checkin: -ftree-reassoc-width=n
 
  You also need approval and somebody to review the patch before checkin.
 
 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

You need the target hook that tells how big the reassociation based on the
type.  Machines have different numbers of functional units, for example, maybe
3 integer units and 2 floating point units.  For example, in the PowerPC, I
would set the basic integer and binary floating point types to 2 to match the
dispatch rules, but for decimal floating point, I would set it to one, since
the machines currently only have 1 decimal unit.

With the hook, I could see eliminating the switch and/or --param altogether,
and doing only in the hook.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich

 You need the target hook that tells how big the reassociation based on the
 type.  Machines have different numbers of functional units, for example, maybe
 3 integer units and 2 floating point units.  For example, in the PowerPC, I
 would set the basic integer and binary floating point types to 2 to match the
 dispatch rules, but for decimal floating point, I would set it to one, since
 the machines currently only have 1 decimal unit.

 With the hook, I could see eliminating the switch and/or --param altogether,
 and doing only in the hook.


Hook introduced by this patch meets these requirements. But I think it
is useful to have option to override the hook because you cannot tune
it perfectly for each case.

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
Hello William,

 However, it does not fix http://gcc.gnu.org/PR45671, which surprises me
 as it was marked as a duplicate of this one.  Any thoughts on why this
 isn't sufficient to reassociate the linear chain of adds?

 Test case:

 int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
 {
  int ret;

  ret = a + b + c + d + e + f + g + h;
  return ret;

 }




Reassociation does not work for signed integers because signed integer
is not wrap-around type in C. You can change it by passing -fwrapv
option but it will disable other useful optimization. Reassociation of
signed integers without this option is not a trivial question because
in that case you may introduce overflows and therefore undefined
behavior.

BR
Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
 Ilya, please mention PR middle-end/44382
 in ChangeLog.


Thanks for notice. Here is corrected ChangeLog:

gcc/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

   PR middle-end/44382
   * target.def (reassociation_width): New hook.

   * doc/tm.texi.in (reassociation_width): New hook documentation.

   * doc/tm.texi (reassociation_width): Likewise.

   * hooks.h (hook_int_const_gimple_1): New default hook.

   * hooks.c (hook_int_const_gimple_1): Likewise.

   * config/i386/i386.h (ix86_tune_indices): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.

   (TARGET_REASSOC_INT_TO_PARALLEL): New.
   (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

   * config/i386/i386.c (initial_ix86_tune_features): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.

   (ix86_reassociation_width) implementation of
   new hook for i386 target.

   * common.opt (ftree-reassoc-width): New option added.

   * tree-ssa-reassoc.c (get_required_cycles): New function.
   (get_reassociation_width): Likewise.
   (rewrite_expr_tree_parallel): Likewise.

   (reassociate_bb): Now checks reassociation width to be used
   and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
   if needed.

   (pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

   * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
   -ftree-reassoc-width=1.

   * gcc.dg/tree-ssa/reassoc-24.c: New test.
   * gcc.dg/tree-ssa/reassoc-25.c: Likewise.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 11:52:25AM +0400, Ilya Enkovich wrote:
  However, it does not fix http://gcc.gnu.org/PR45671, which surprises me
  as it was marked as a duplicate of this one.  Any thoughts on why this
  isn't sufficient to reassociate the linear chain of adds?
 
  Test case:
 
  int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
  {
   int ret;
 
   ret = a + b + c + d + e + f + g + h;
   return ret;
 
  }
 
 
 
 
 Reassociation does not work for signed integers because signed integer
 is not wrap-around type in C. You can change it by passing -fwrapv
 option but it will disable other useful optimization. Reassociation of
 signed integers without this option is not a trivial question because
 in that case you may introduce overflows and therefore undefined
 behavior.

Well, if it is clearly a win to reassociate, you can always reassociate
them by doing arithmetics in corresponding unsigned type and afterwards
converting back to the signed type.

Jakub


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich

 Well, if it is clearly a win to reassociate, you can always reassociate
 them by doing arithmetics in corresponding unsigned type and afterwards
 converting back to the signed type.

        Jakub


You are right. But in this case we again make all operands have
wrap-around type and thus disable some other optimization. It would be
nice to have opportunity to reassociate and still have undefined
behavior on overflow for optimizations. One way to do it for add/sub
is to use wider type (long long instead of int).

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 01:01:59PM +0400, Ilya Enkovich wrote:
  Well, if it is clearly a win to reassociate, you can always reassociate
  them by doing arithmetics in corresponding unsigned type and afterwards
  converting back to the signed type.
 
 You are right. But in this case we again make all operands have
 wrap-around type and thus disable some other optimization. It would be
 nice to have opportunity to reassociate and still have undefined
 behavior on overflow for optimizations. One way to do it for add/sub
 is to use wider type (long long instead of int).

I disagree.  Widening would result in worse code in most cases, as you need
to sign extend all the operands.  On the other side, I doubt you can
actually usefully use the undefinedness of signed overflow for a series of
3 or more operands of the associative operation.

Jakub


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
2011/7/13 Jakub Jelinek ja...@redhat.com:

 I disagree.  Widening would result in worse code in most cases, as you need
 to sign extend all the operands.  On the other side, I doubt you can
 actually usefully use the undefinedness of signed overflow for a series of
 3 or more operands of the associative operation.

        Jakub


Sounds reasonable. Type casting to unsigned should be a better solution here.

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 11:18 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/7/13 Jakub Jelinek ja...@redhat.com:

 I disagree.  Widening would result in worse code in most cases, as you need
 to sign extend all the operands.  On the other side, I doubt you can
 actually usefully use the undefinedness of signed overflow for a series of
 3 or more operands of the associative operation.

        Jakub


 Sounds reasonable. Type casting to unsigned should be a better solution here.

Well, the solution of course lies in the no-undefined-overflow branch
where we have separate tree codes for arithmetic with/without
undefined overflow.

Richard.

 Ilya



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread William J. Schmidt
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote:
 Ilya, thanks for posting this!  This patch is useful also on powerpc64.
 Applying it solved a performance degradation with bwaves due to loss of
 reassociation somewhere between 4.5 and 4.6 (still tracking it down).
 When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code
 generation returns.

On further investigation, this is improving the code generation but not
reverting all of the performance loss.  We'll open a bug on this one
once we have it narrowed down a little further.

Bill
 
 Bill
 




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Michael Meissner
I just ran a spec 2006 run on the powerpc (32-bit) last night setting the
reassociation to 2.  I do see a win in bwaves, but unfortunately it is not
enough of a win, and it is still a regression to GCC 4.5.  However, I see some
regressions in 3 other benchmarks (I tend to omit differences of less than 2%):

401.bzip2  97.99%
410.bwaves113.88%
436.cactusADM  93.96%
444.namd   93.74%

The profile differences are as follows.  Unfortunately, I'm not sure I can post
sample counts under Spec rules:

Bzip2:

GCC 4.7 GCC 4.7 with patchesFunction
=== 
28.96%  28.39%  mainSort
15.94%  15.49%  BZ2_decompress
12.56%  12.35%  mainGtU.part.0
11.59%  11.54%  generateMTFValues
8.89%   9.04%   fallbackSort
6.60%   8.28%   BZ2_compressBlock
7.48%   7.21%   handle_compress.isra.2
6.24%   5.95%   BZ2_bzDecompress
0.55%   0.58%   add_pair_to_block
0.54%   0.54%   BZ2_hbMakeCodeLengths

Bwaves:

GCC 4.7 GCC 4.7 with patchesFunction
=== 
78.70%  74.73%  mat_times_vec_
11.68%  13.21%  bi_cgstab_block_
 6.72%   8.47%  shell_
 2.11%   2.62%  jacobian_
 0.79%   0.96%  flux_

CactusADM:

GCC 4.7 GCC 4.7 with patchesFunction
=== 
99.67%  99.69%  bench_staggeredleapfrog2_

Namd:

GCC 4.7 GCC 4.7 with patchesFunction
=== 
15.43%  14.71%  
_ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded.part.39
11.94%  11.80%  
_ZN20ComputeNonbondedUtil19calc_pair_fullelectEP9nonbonded.part.40
10.18%  11.52%  
_ZN20ComputeNonbondedUtil32calc_pair_energy_merge_fullelectEP9nonbonded.part.37
9.87%   9.02%   
_ZN20ComputeNonbondedUtil16calc_pair_energyEP9nonbonded.part.41
9.55%   8.85%   
_ZN20ComputeNonbondedUtil9calc_pairEP9nonbonded.part.42
9.52%   9.05%   
_ZN20ComputeNonbondedUtil25calc_pair_merge_fullelectEP9nonbonded.part.38
7.24%   8.72%   
_ZN20ComputeNonbondedUtil26calc_self_energy_fullelectEP9nonbonded.part.31
6.28%   6.42%   
_ZN20ComputeNonbondedUtil19calc_self_fullelectEP9nonbonded.part.32
5.23%   6.18%   
_ZN20ComputeNonbondedUtil32calc_self_energy_merge_fullelectEP9nonbonded.part.29
5.13%   4.66%   
_ZN20ComputeNonbondedUtil16calc_self_energyEP9nonbonded.part.33
4.72%   4.43%   
_ZN20ComputeNonbondedUtil25calc_self_merge_fullelectEP9nonbonded.part.30
4.60%   4.37%   
_ZN20ComputeNonbondedUtil9calc_selfEP9nonbonded.part.34

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
2011/7/13 Michael Meissner meiss...@linux.vnet.ibm.com:
 I just ran a spec 2006 run on the powerpc (32-bit) last night setting the
 reassociation to 2.  I do see a win in bwaves, but unfortunately it is not
 enough of a win, and it is still a regression to GCC 4.5.  However, I see some
 regressions in 3 other benchmarks (I tend to omit differences of less than 
 2%):

 401.bzip2          97.99%
 410.bwaves        113.88%
 436.cactusADM      93.96%
 444.namd           93.74%


I tried it on AMD, Core i5, ARM (Cortex A9) and Atom using SPEC2000
and EEMBC suites. All platforms showed both gains and regressions. The
only platform whose gains outweighted regressions enough without
additional tuning was Atom. Surely we should avoid tree height
reduction in some cases and may tune it but I think it will be nice to
have feature checked in first since at least one platform benefits
from it.

Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Michael Meissner
One minor note, you will need to update doc/invoke.texi to document the new
switch before checkin: -ftree-reassoc-width=n

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


[PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-12 Thread Илья Энкович
Hello,

Here is a patch related to missed optimization opportunity in tree
reassoc phase.

Currently tree reassoc phase always generates a linear form which
requires the minimum registers but has the highest tree height and
does not allow computation to be performed in parallel. It may be
critical for performance if required operations have high latency but
can be pipelined (i.e. few execution units or low throughput). This
problem becomes important on current Atom processors which are
in-order and have many such instructions: IMUL and scalar SSE FP
instructions.

This patch introduces a new feature to tree reassoc phase to generate
computation tree with reduced height allowing to perform few
long-latency instructions in parallel. It changes only one part of
reassociation - rewrite_expr_tree. A level of parallelism is
controlled via target hook and/or command line option.

New feature is enabled for Atom only by default. Patch boosts mostly
CFP2000 geomean on Atom: +3.04% for 32 bit and +0.32% for 64 bit.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
--
gcc/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): New hook documentation.

* doc/tm.texi (reassociation_width): Likewise.

* hooks.h (hook_int_const_gimple_1): New default hook.

* hooks.c (hook_int_const_gimple_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* common.opt (ftree-reassoc-width): New option added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
-ftree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-12 Thread William J. Schmidt
Ilya, thanks for posting this!  This patch is useful also on powerpc64.
Applying it solved a performance degradation with bwaves due to loss of
reassociation somewhere between 4.5 and 4.6 (still tracking it down).
When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code
generation returns.

Bill

On Tue, 2011-07-12 at 16:30 +0400, Илья Энкович wrote:
 Hello,
 
 Here is a patch related to missed optimization opportunity in tree
 reassoc phase.
 
 Currently tree reassoc phase always generates a linear form which
 requires the minimum registers but has the highest tree height and
 does not allow computation to be performed in parallel. It may be
 critical for performance if required operations have high latency but
 can be pipelined (i.e. few execution units or low throughput). This
 problem becomes important on current Atom processors which are
 in-order and have many such instructions: IMUL and scalar SSE FP
 instructions.
 
 This patch introduces a new feature to tree reassoc phase to generate
 computation tree with reduced height allowing to perform few
 long-latency instructions in parallel. It changes only one part of
 reassociation - rewrite_expr_tree. A level of parallelism is
 controlled via target hook and/or command line option.
 
 New feature is enabled for Atom only by default. Patch boosts mostly
 CFP2000 geomean on Atom: +3.04% for 32 bit and +0.32% for 64 bit.
 
 Bootstrapped and checked on x86_64-linux.
 
 Thanks,
 Ilya
 --
 gcc/
 
 2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com
 
   * target.def (reassociation_width): New hook.
 
   * doc/tm.texi.in (reassociation_width): New hook documentation.
 
   * doc/tm.texi (reassociation_width): Likewise.
 
   * hooks.h (hook_int_const_gimple_1): New default hook.
 
   * hooks.c (hook_int_const_gimple_1): Likewise.
 
   * config/i386/i386.h (ix86_tune_indices): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.
 
   (TARGET_REASSOC_INT_TO_PARALLEL): New.
   (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
 
   * config/i386/i386.c (initial_ix86_tune_features): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.
 
   (ix86_reassociation_width) implementation of
   new hook for i386 target.
 
   * common.opt (ftree-reassoc-width): New option added.
 
   * tree-ssa-reassoc.c (get_required_cycles): New function.
   (get_reassociation_width): Likewise.
   (rewrite_expr_tree_parallel): Likewise.
 
   (reassociate_bb): Now checks reassociation width to be used
   and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
   if needed.
 
   (pass_reassoc): TODO_remove_unused_locals flag added.
 
 gcc/testsuite/
 
 2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com
 
   * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
   -ftree-reassoc-width=1.
 
   * gcc.dg/tree-ssa/reassoc-24.c: New test.
   * gcc.dg/tree-ssa/reassoc-25.c: Likewise.




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-12 Thread H.J. Lu
On Tue, Jul 12, 2011 at 9:50 AM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Ilya, thanks for posting this!  This patch is useful also on powerpc64.
 Applying it solved a performance degradation with bwaves due to loss of
 reassociation somewhere between 4.5 and 4.6 (still tracking it down).
 When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code
 generation returns.


It is good to know.  Ilya, please mention PR middle-end/44382
in ChangeLog.

Thanks.

H.J.
--


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-12 Thread William J. Schmidt
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote:
 Ilya, thanks for posting this!  This patch is useful also on powerpc64.
 Applying it solved a performance degradation with bwaves due to loss of
 reassociation somewhere between 4.5 and 4.6 (still tracking it down).
 When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code
 generation returns.
 
 Bill
 

However, it does not fix http://gcc.gnu.org/PR45671, which surprises me
as it was marked as a duplicate of this one.  Any thoughts on why this
isn't sufficient to reassociate the linear chain of adds?

Test case:

int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
{
  int ret;

  ret = a + b + c + d + e + f + g + h;
  return ret;

}