Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello, 2011/9/2 Uros Bizjak ubiz...@gmail.com: I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Thanks, Uros. Here is patch with added comment for hook implementation. Is it OK? Thanks, Ilya --- gcc/ 2011-09-06 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-09-06 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, Sep 6, 2011 at 2:39 PM, Ilya Enkovich enkovich@gmail.com wrote: I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Thanks, Uros. Here is patch with added comment for hook implementation. Is it OK? Thanks, Ilya --- gcc/ 2011-09-06 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-09-06 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. OK. Thanks, Uros.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/9/6 Uros Bizjak ubiz...@gmail.com: OK. Thanks, Uros. Could please someone check it in for me? Thanks, Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, Sep 6, 2011 at 9:07 AM, Ilya Enkovich enkovich@gmail.com wrote: 2011/9/6 Uros Bizjak ubiz...@gmail.com: OK. Thanks, Uros. Could please someone check it in for me? I checked it in for you. -- H.J.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Sep 1, 2011 at 12:27 PM, Ilya Enkovich enkovich@gmail.com wrote: this seems to not allow cycles_best to drop with lower width, but that it can't should be an implementation detail of get_required_cycles. To make it not so, can you add a comment before the loop, like /* get_required_cycles is monotonically increasing with lower width so we can perform a binary search for the minimal width that still results in the optimal cycle count. */ Fixed. Thanks! With the above change the non-x86 specifc parts are ok. Please get approval for them from a x86 maintainer. Could please someone review x86 part? I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Btw, I would expect integer add and integer multiply to have different settings for some targets which would mean splitting this up even further ... Richard. Thanks, Uros.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/9/2 Richard Guenther richard.guent...@gmail.com: On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote: I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Btw, I would expect integer add and integer multiply to have different settings for some targets which would mean splitting this up even further ... Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags or new command line param? Thanks, Ilya Richard. Thanks, Uros.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Fri, Sep 2, 2011 at 3:45 PM, Ilya Enkovich enkovich@gmail.com wrote: 2011/9/2 Richard Guenther richard.guent...@gmail.com: On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote: I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Btw, I would expect integer add and integer multiply to have different settings for some targets which would mean splitting this up even further ... Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags or new command line param? The X86_TUNE_REASSOC_* flags. The setting surely depends on the number of available execution units and/or whether the instructions are pipelined or not. Richard.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/9/2 Richard Guenther richard.guent...@gmail.com: On Fri, Sep 2, 2011 at 3:45 PM, Ilya Enkovich enkovich@gmail.com wrote: 2011/9/2 Richard Guenther richard.guent...@gmail.com: On Fri, Sep 2, 2011 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote: I assume that you need to split tune attribute to int and FP part to handle reassociation for other targets, since Atom handles both in the same way. Please also describe function return value in the comment (and perhaps in documentation, too). OK with this addition. Btw, I would expect integer add and integer multiply to have different settings for some targets which would mean splitting this up even further ... Which tune attributes are meant here? Is it X86_TUNE_REASSOC_* flags or new command line param? The X86_TUNE_REASSOC_* flags. The setting surely depends on the number of available execution units and/or whether the instructions are pipelined or not. Now I'm not sure it is a good idea to use these tune flags at all because we may need a plenty of them. We have a lot of types including vector ones and a variety of opcodes. Any combination may require delicate tuning. Some sort of separate table would suit better here. Though I doubt we should introduce it right now filled with 1s. Ilya Richard.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
this seems to not allow cycles_best to drop with lower width, but that it can't should be an implementation detail of get_required_cycles. To make it not so, can you add a comment before the loop, like /* get_required_cycles is monotonically increasing with lower width so we can perform a binary search for the minimal width that still results in the optimal cycle count. */ Fixed. Thanks! With the above change the non-x86 specifc parts are ok. Please get approval for them from a x86 maintainer. Could please someone review x86 part? Thanks, Ilya --- gcc/ 2011-09-01 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-09-01 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello Richard, Thanks for the review! The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic to me. Can you investigate the problem report, look at the geneated code with the atom default of the param and see whether it's still reasonably fixed (maybe you'd already done this)? This test was created as a regression test to the problem in linearize_expr_tree which moves all statements down to the first modified one during reassociation increasing registers pressure. Test has a lot of definitions which are all ORed like this: def r1 def r2 s = r1 or r2 def r3 s = s or r3 def r4 s = s or r4 ... and it checks that register pressure is not increased after reassociation by simple scan of two sequential defs. If we use reassociation width higher than 1 then test will fails because we need to increase register pressure to get parallelism and finally get code like this: def r1 def r2 def r3 t1 = r1 or r2 s = s or r3 def r4 def r5 s = s or t1 t1 = r4 or r5 ... So, I had to fix a test. + /* Check if we may use less width and still compute sequence for + the same time. It will allow us to reduce registers usage. */ + while (width 1 + get_required_cycles (ops_num, width - 1) == cycles_best) + width--; I suppose get_required_cycles () is monotonic in width? Would it make sense to binary search the best width then (I realize the default is 1 and the only target differing has 2, but ...)? Maybe at least add a comment to this effect? Or not decrement by one but instead divide by two on each iteration (which is the same for 1 and 2 ...)? It's also all a mapping that is constant - we should be able to pre-compute it for all values, eventually compressing it to a much simpler formula width = f (cpu_width, ops_num)? I replaced sequential search with a binary search. I did not pay much attention to this function because do not think it is really time consuming compared to other parts of reassociation phase. Am I too optimistic? If you think it might significantly affect compile time I can introduce a table of pre-computed values (or make it later as a separate fix). I made all other fixes as you suggested. Bootstrapped and checked on x86_64-linux. Thanks, Ilya --- gcc/ 2011-08-31 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-08-31 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Aug 10, 2011 at 4:49 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello, Here is a new version of the patch. Changes from the previous version (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html): - updated to trunk - TODO_remove_unused_locals flag was removed from todo_flags_finish of reassoc pass Bootstrapped and checked on x86_64-linux. The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic to me. Can you investigate the problem report, look at the geneated code with the atom default of the param and see whether it's still reasonably fixed (maybe you'd already done this)? + /* Check if we may use less width and still compute sequence for + the same time. It will allow us to reduce registers usage. */ + while (width 1 + get_required_cycles (ops_num, width - 1) == cycles_best) +width--; I suppose get_required_cycles () is monotonic in width? Would it make sense to binary search the best width then (I realize the default is 1 and the only target differing has 2, but ...)? Maybe at least add a comment to this effect? Or not decrement by one but instead divide by two on each iteration (which is the same for 1 and 2 ...)? It's also all a mapping that is constant - we should be able to pre-compute it for all values, eventually compressing it to a much simpler formula width = f (cpu_width, ops_num)? +static void +rewrite_expr_tree_parallel (gimple stmt, int width, + VEC(operand_entry_t, heap) * ops) +{ + enum tree_code opcode = gimple_assign_rhs_code (stmt); + int op_num = VEC_length (operand_entry_t, ops); + int stmt_num = op_num - 1; + gimple *stmts = XNEWVEC (gimple, stmt_num); use XALLOCAVEC here and remove the free call. + if (ready_stmts_end == 0 + (i - stmt_index = width || op_index 1)) s go to the next line, not at the end + else + { + tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc); + add_referenced_var (var); please re-use a common variable for the whole chain, they will all necessarily have compatible type. Creating and maintaining decls is expensive (also avoid giving them names, thus just pass NULL instead of reassoc). + { + enum machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + int width = get_reassociation_width (ops, rhs_code, mode); as you are passing in the mode here, consider passing the number of operands in ops as well instead of ops. This way we might consider moving this whole function to the target or to generic code. Similar move the dump printing in get_reassociation_width + if (dump_file (dump_flags TDF_DETAILS)) +fprintf (dump_file, +Width = %d was chosen for reassociation\n, width); here to the caller. Sorry for taking so long to review this again. I'll promise to be quick once you post an update. Thanks, Richard. Thanks, Ilya --- gcc/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Double ping. 2011/8/19 Ilya Enkovich enkovich@gmail.com: Ping. 2011/8/10 Ilya Enkovich enkovich@gmail.com: Hello, Here is a new version of the patch. Changes from the previous version (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html): - updated to trunk - TODO_remove_unused_locals flag was removed from todo_flags_finish of reassoc pass Bootstrapped and checked on x86_64-linux. Thanks, Ilya --- gcc/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Ping. 2011/8/10 Ilya Enkovich enkovich@gmail.com: Hello, Here is a new version of the patch. Changes from the previous version (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html): - updated to trunk - TODO_remove_unused_locals flag was removed from todo_flags_finish of reassoc pass Bootstrapped and checked on x86_64-linux. Thanks, Ilya --- gcc/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello, Here is a new version of the patch. Changes from the previous version (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html): - updated to trunk - TODO_remove_unused_locals flag was removed from todo_flags_finish of reassoc pass Bootstrapped and checked on x86_64-linux. Thanks, Ilya --- gcc/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. gcc/testsuite/ 2011-08-10 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, Jul 19, 2011 at 4:46 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello Richard, Thanks a lot for the review! it's not easy to follow the flow of this function, esp. I wonder + else + { + tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc); + add_referenced_var (var); + stmts[i] = build_and_add_sum (var, op1, op2, opcode); + } what you need to create new stmts for, and if you possibly create multiple temporary variables here. - TODO_verify_ssa + TODO_remove_unused_locals + | TODO_verify_ssa why? Current rewrite_expr_tree uses original statements and just change order of operands (tree leafs) usage. To keep data flow correct it moves all statements down to the last one. I tried such approach but it did not work well because of increased registers pressure in some cases (all operands are live at the same time right before the first statement of our sequence). Then I tried to move existing statements to the appropriate places (just after the last operand definition) but it appeared to be not so easy. It is easy to guarantee consistence in the final result but not easy to keep data flow consistent after each statement move. And if we do not keep code consistent after each move then we may get wrong debug statement generated. After few tries I decided that the most easy for both understanding and implementation way is to recreate statements and remove the old chain. So, I used build_and_add_sum to create all statements except the last one which never needs recreation. I added TODO_remove_unused_locals since temps used in original statements should be removed. Is this approach OK? Well, it's enough to delay that to later passes that do this, so I'd prefer to not change this in this patch. You don't seem to handle the special-cases of rewrite_expr_tree for the leafs of your tree, especially the PHI node special-casing. I think you may run into vectorization issues here. I do not see any reason why these cases are checked in rewrite_expr_tree. It is optimization of operands order and may be it will be good to move it somewhere. Is it OK if I move related code to the end of optimize_ops_list method? I suppose yes. Maybe the cases are also obsoleted by the improved loop PHI handling. Richard. Thanks Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Ping. 2011/7/26 Ilya Enkovich enkovich@gmail.com: Hello, Here is updated patch for tree reassoc phase. Update includes coding style fixes, comments update, target hook arguments change. I also moved a part of code from rewrite_expr_tree into separate function to reuse it in rewrite_expr_tree_parallel for grouping operands with the same rank. Bootstrapped and checked on x86_64-linux. Ilya -- gcc/ 2011-07-26 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-26 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello, Here is updated patch for tree reassoc phase. Update includes coding style fixes, comments update, target hook arguments change. I also moved a part of code from rewrite_expr_tree into separate function to reuse it in rewrite_expr_tree_parallel for grouping operands with the same rank. Bootstrapped and checked on x86_64-linux. Ilya -- gcc/ 2011-07-26 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): Likewise. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_uint_mode_1): New default hook. * hooks.c (hook_int_uint_mode_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (swap_ops_for_binary_stmt): Likewise. (rewrite_expr_tree_parallel): Likewise. (rewrite_expr_tree): Refactored. Part of code moved into swap_ops_for_binary_stmt. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-26 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, 14 Jul 2011, Ilya Enkovich wrote: * doc/tm.texi.in (reassociation_width): New hook documentation. Unless the documentation for a new hook is based on pre-existing GFDL-only text, it should go in target.def not tm.texi.in, with the only thing added to tm.texi.in being a single @hook line. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 5:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2011/7/14 Richard Guenther richard.guent...@gmail.com: But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Option is still useful if you want to try feature on platform with no hook implemented and for other performance experiments. I agree --param usage should be better here. I'll fix it. Ilya Here is a patch with new param instead of new option. Bootstrapped and checked on x86_64-linux. } +static int +ix86_reassociation_width (const_gimple stmt) all functions need comments describing what they do and what parameters they get. +@deftypefn {Target Hook} int TARGET_SCHED_REASSOCIATION_WIDTH (const_gimple @var{stmt}) +This hook is called by tree reassociator to determine a level of +parallelism required in output calculations chain. I don't think we should get an actual statement here, but an enum tree_code and a machine mode. +/* Find out how many cycles we need to compute whole statements + chain holding ops_num operands if may execute up to cpu_width + statements per cycle. */ I have a hard time parsing this sentence, maybe a native english speaker can help here. +static int +get_reassociation_width (VEC(operand_entry_t, heap) * ops, gimple stmt) +{ + int param_width = PARAM_VALUE (PARAM_TREE_REASSOC_WIDTH); + int ops_num = VEC_length (operand_entry_t, ops); + int width; + int cycles_best; + + if (param_width 0) +width = param_width; + else +width = targetm.sched.reassociation_width (stmt); this is the only place you need stmt, with tree-code and mode args you'd not need it (and it's odd anyway). + while (width 1 +get_required_cycles (ops_num, width - 1) == cycles_best) s go on the next line, similar cases in your patch as well + if (dump_file (dump_flags TDF_DETAILS)) +{ + fprintf (dump_file, + Width = %d was chosen for reassociation\n, width); +} no {}s around single-stmts +static void +rewrite_expr_tree_parallel (gimple stmt, int width, + VEC(operand_entry_t, heap) * ops) +{ it's not easy to follow the flow of this function, esp. I wonder + else + { + tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc); + add_referenced_var (var); + stmts[i] = build_and_add_sum (var, op1, op2, opcode); + } what you need to create new stmts for, and if you possibly create multiple temporary variables here. You don't seem to handle the special-cases of rewrite_expr_tree for the leafs of your tree, especially the PHI node special-casing. I think you may run into vectorization issues here. - TODO_verify_ssa + TODO_remove_unused_locals +| TODO_verify_ssa why? I think the patch looks reasonable overall, please update it with the above comments and re-post it. Thanks, Richard. Ilya -- gcc/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello Richard, Thanks a lot for the review! it's not easy to follow the flow of this function, esp. I wonder + else + { + tree var = create_tmp_reg (TREE_TYPE (last_rhs1), reassoc); + add_referenced_var (var); + stmts[i] = build_and_add_sum (var, op1, op2, opcode); + } what you need to create new stmts for, and if you possibly create multiple temporary variables here. - TODO_verify_ssa + TODO_remove_unused_locals + | TODO_verify_ssa why? Current rewrite_expr_tree uses original statements and just change order of operands (tree leafs) usage. To keep data flow correct it moves all statements down to the last one. I tried such approach but it did not work well because of increased registers pressure in some cases (all operands are live at the same time right before the first statement of our sequence). Then I tried to move existing statements to the appropriate places (just after the last operand definition) but it appeared to be not so easy. It is easy to guarantee consistence in the final result but not easy to keep data flow consistent after each statement move. And if we do not keep code consistent after each move then we may get wrong debug statement generated. After few tries I decided that the most easy for both understanding and implementation way is to recreate statements and remove the old chain. So, I used build_and_add_sum to create all statements except the last one which never needs recreation. I added TODO_remove_unused_locals since temps used in original statements should be removed. Is this approach OK? You don't seem to handle the special-cases of rewrite_expr_tree for the leafs of your tree, especially the PHI node special-casing. I think you may run into vectorization issues here. I do not see any reason why these cases are checked in rewrite_expr_tree. It is optimization of operands order and may be it will be good to move it somewhere. Is it OK if I move related code to the end of optimize_ops_list method? Thanks Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Richard. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther richard.guent...@gmail.com wrote: 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. Richard. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. Introduced target hook does not just define default value for option. It is meant to make decision in each particular case. For now it returns a constant value but march dependent heuristics will be added later. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich enkovich@gmail.com wrote: 2011/7/14 Richard Guenther richard.guent...@gmail.com: Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. Introduced target hook does not just define default value for option. It is meant to make decision in each particular case. For now it returns a constant value but march dependent heuristics will be added later. But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n -- Michael Meissner, IBM Thanks for the note! Here is fixed patch. Ilya -- gcc/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (ftree-reassoc-width): New option documented. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * common.opt (ftree-reassoc-width): New option added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option -ftree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Option is still useful if you want to try feature on platform with no hook implemented and for other performance experiments. I agree --param usage should be better here. I'll fix it. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Option is still useful if you want to try feature on platform with no hook implemented and for other performance experiments. I agree --param usage should be better here. I'll fix it. Ilya Here is a patch with new param instead of new option. Bootstrapped and checked on x86_64-linux. Ilya -- gcc/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote: On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther richard.guent...@gmail.com wrote: 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. You need the target hook that tells how big the reassociation based on the type. Machines have different numbers of functional units, for example, maybe 3 integer units and 2 floating point units. For example, in the PowerPC, I would set the basic integer and binary floating point types to 2 to match the dispatch rules, but for decimal floating point, I would set it to one, since the machines currently only have 1 decimal unit. With the hook, I could see eliminating the switch and/or --param altogether, and doing only in the hook. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
You need the target hook that tells how big the reassociation based on the type. Machines have different numbers of functional units, for example, maybe 3 integer units and 2 floating point units. For example, in the PowerPC, I would set the basic integer and binary floating point types to 2 to match the dispatch rules, but for decimal floating point, I would set it to one, since the machines currently only have 1 decimal unit. With the hook, I could see eliminating the switch and/or --param altogether, and doing only in the hook. Hook introduced by this patch meets these requirements. But I think it is useful to have option to override the hook because you cannot tune it perfectly for each case. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello William, However, it does not fix http://gcc.gnu.org/PR45671, which surprises me as it was marked as a duplicate of this one. Any thoughts on why this isn't sufficient to reassociate the linear chain of adds? Test case: int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) { int ret; ret = a + b + c + d + e + f + g + h; return ret; } Reassociation does not work for signed integers because signed integer is not wrap-around type in C. You can change it by passing -fwrapv option but it will disable other useful optimization. Reassociation of signed integers without this option is not a trivial question because in that case you may introduce overflows and therefore undefined behavior. BR Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Ilya, please mention PR middle-end/44382 in ChangeLog. Thanks for notice. Here is corrected ChangeLog: gcc/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * common.opt (ftree-reassoc-width): New option added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option -ftree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 11:52:25AM +0400, Ilya Enkovich wrote: However, it does not fix http://gcc.gnu.org/PR45671, which surprises me as it was marked as a duplicate of this one. Any thoughts on why this isn't sufficient to reassociate the linear chain of adds? Test case: int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) { int ret; ret = a + b + c + d + e + f + g + h; return ret; } Reassociation does not work for signed integers because signed integer is not wrap-around type in C. You can change it by passing -fwrapv option but it will disable other useful optimization. Reassociation of signed integers without this option is not a trivial question because in that case you may introduce overflows and therefore undefined behavior. Well, if it is clearly a win to reassociate, you can always reassociate them by doing arithmetics in corresponding unsigned type and afterwards converting back to the signed type. Jakub
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Well, if it is clearly a win to reassociate, you can always reassociate them by doing arithmetics in corresponding unsigned type and afterwards converting back to the signed type. Jakub You are right. But in this case we again make all operands have wrap-around type and thus disable some other optimization. It would be nice to have opportunity to reassociate and still have undefined behavior on overflow for optimizations. One way to do it for add/sub is to use wider type (long long instead of int). Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 01:01:59PM +0400, Ilya Enkovich wrote: Well, if it is clearly a win to reassociate, you can always reassociate them by doing arithmetics in corresponding unsigned type and afterwards converting back to the signed type. You are right. But in this case we again make all operands have wrap-around type and thus disable some other optimization. It would be nice to have opportunity to reassociate and still have undefined behavior on overflow for optimizations. One way to do it for add/sub is to use wider type (long long instead of int). I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/13 Jakub Jelinek ja...@redhat.com: I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub Sounds reasonable. Type casting to unsigned should be a better solution here. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 11:18 AM, Ilya Enkovich enkovich@gmail.com wrote: 2011/7/13 Jakub Jelinek ja...@redhat.com: I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub Sounds reasonable. Type casting to unsigned should be a better solution here. Well, the solution of course lies in the no-undefined-overflow branch where we have separate tree codes for arithmetic with/without undefined overflow. Richard. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote: Ilya, thanks for posting this! This patch is useful also on powerpc64. Applying it solved a performance degradation with bwaves due to loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code generation returns. On further investigation, this is improving the code generation but not reverting all of the performance loss. We'll open a bug on this one once we have it narrowed down a little further. Bill Bill
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
I just ran a spec 2006 run on the powerpc (32-bit) last night setting the reassociation to 2. I do see a win in bwaves, but unfortunately it is not enough of a win, and it is still a regression to GCC 4.5. However, I see some regressions in 3 other benchmarks (I tend to omit differences of less than 2%): 401.bzip2 97.99% 410.bwaves113.88% 436.cactusADM 93.96% 444.namd 93.74% The profile differences are as follows. Unfortunately, I'm not sure I can post sample counts under Spec rules: Bzip2: GCC 4.7 GCC 4.7 with patchesFunction === 28.96% 28.39% mainSort 15.94% 15.49% BZ2_decompress 12.56% 12.35% mainGtU.part.0 11.59% 11.54% generateMTFValues 8.89% 9.04% fallbackSort 6.60% 8.28% BZ2_compressBlock 7.48% 7.21% handle_compress.isra.2 6.24% 5.95% BZ2_bzDecompress 0.55% 0.58% add_pair_to_block 0.54% 0.54% BZ2_hbMakeCodeLengths Bwaves: GCC 4.7 GCC 4.7 with patchesFunction === 78.70% 74.73% mat_times_vec_ 11.68% 13.21% bi_cgstab_block_ 6.72% 8.47% shell_ 2.11% 2.62% jacobian_ 0.79% 0.96% flux_ CactusADM: GCC 4.7 GCC 4.7 with patchesFunction === 99.67% 99.69% bench_staggeredleapfrog2_ Namd: GCC 4.7 GCC 4.7 with patchesFunction === 15.43% 14.71% _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded.part.39 11.94% 11.80% _ZN20ComputeNonbondedUtil19calc_pair_fullelectEP9nonbonded.part.40 10.18% 11.52% _ZN20ComputeNonbondedUtil32calc_pair_energy_merge_fullelectEP9nonbonded.part.37 9.87% 9.02% _ZN20ComputeNonbondedUtil16calc_pair_energyEP9nonbonded.part.41 9.55% 8.85% _ZN20ComputeNonbondedUtil9calc_pairEP9nonbonded.part.42 9.52% 9.05% _ZN20ComputeNonbondedUtil25calc_pair_merge_fullelectEP9nonbonded.part.38 7.24% 8.72% _ZN20ComputeNonbondedUtil26calc_self_energy_fullelectEP9nonbonded.part.31 6.28% 6.42% _ZN20ComputeNonbondedUtil19calc_self_fullelectEP9nonbonded.part.32 5.23% 6.18% _ZN20ComputeNonbondedUtil32calc_self_energy_merge_fullelectEP9nonbonded.part.29 5.13% 4.66% _ZN20ComputeNonbondedUtil16calc_self_energyEP9nonbonded.part.33 4.72% 4.43% _ZN20ComputeNonbondedUtil25calc_self_merge_fullelectEP9nonbonded.part.30 4.60% 4.37% _ZN20ComputeNonbondedUtil9calc_selfEP9nonbonded.part.34 -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/13 Michael Meissner meiss...@linux.vnet.ibm.com: I just ran a spec 2006 run on the powerpc (32-bit) last night setting the reassociation to 2. I do see a win in bwaves, but unfortunately it is not enough of a win, and it is still a regression to GCC 4.5. However, I see some regressions in 3 other benchmarks (I tend to omit differences of less than 2%): 401.bzip2 97.99% 410.bwaves 113.88% 436.cactusADM 93.96% 444.namd 93.74% I tried it on AMD, Core i5, ARM (Cortex A9) and Atom using SPEC2000 and EEMBC suites. All platforms showed both gains and regressions. The only platform whose gains outweighted regressions enough without additional tuning was Atom. Surely we should avoid tree height reduction in some cases and may tune it but I think it will be nice to have feature checked in first since at least one platform benefits from it. Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
[PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello, Here is a patch related to missed optimization opportunity in tree reassoc phase. Currently tree reassoc phase always generates a linear form which requires the minimum registers but has the highest tree height and does not allow computation to be performed in parallel. It may be critical for performance if required operations have high latency but can be pipelined (i.e. few execution units or low throughput). This problem becomes important on current Atom processors which are in-order and have many such instructions: IMUL and scalar SSE FP instructions. This patch introduces a new feature to tree reassoc phase to generate computation tree with reduced height allowing to perform few long-latency instructions in parallel. It changes only one part of reassociation - rewrite_expr_tree. A level of parallelism is controlled via target hook and/or command line option. New feature is enabled for Atom only by default. Patch boosts mostly CFP2000 geomean on Atom: +3.04% for 32 bit and +0.32% for 64 bit. Bootstrapped and checked on x86_64-linux. Thanks, Ilya -- gcc/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * common.opt (ftree-reassoc-width): New option added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option -ftree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Ilya, thanks for posting this! This patch is useful also on powerpc64. Applying it solved a performance degradation with bwaves due to loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code generation returns. Bill On Tue, 2011-07-12 at 16:30 +0400, Илья Энкович wrote: Hello, Here is a patch related to missed optimization opportunity in tree reassoc phase. Currently tree reassoc phase always generates a linear form which requires the minimum registers but has the highest tree height and does not allow computation to be performed in parallel. It may be critical for performance if required operations have high latency but can be pipelined (i.e. few execution units or low throughput). This problem becomes important on current Atom processors which are in-order and have many such instructions: IMUL and scalar SSE FP instructions. This patch introduces a new feature to tree reassoc phase to generate computation tree with reduced height allowing to perform few long-latency instructions in parallel. It changes only one part of reassociation - rewrite_expr_tree. A level of parallelism is controlled via target hook and/or command line option. New feature is enabled for Atom only by default. Patch boosts mostly CFP2000 geomean on Atom: +3.04% for 32 bit and +0.32% for 64 bit. Bootstrapped and checked on x86_64-linux. Thanks, Ilya -- gcc/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * common.opt (ftree-reassoc-width): New option added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option -ftree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, Jul 12, 2011 at 9:50 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Ilya, thanks for posting this! This patch is useful also on powerpc64. Applying it solved a performance degradation with bwaves due to loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code generation returns. It is good to know. Ilya, please mention PR middle-end/44382 in ChangeLog. Thanks. H.J. --
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote: Ilya, thanks for posting this! This patch is useful also on powerpc64. Applying it solved a performance degradation with bwaves due to loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code generation returns. Bill However, it does not fix http://gcc.gnu.org/PR45671, which surprises me as it was marked as a duplicate of this one. Any thoughts on why this isn't sufficient to reassociate the linear chain of adds? Test case: int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) { int ret; ret = a + b + c + d + e + f + g + h; return ret; }