[Bug target/43902] suboptimal MIPS widening multiply accumulate
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED Target Milestone|--- |4.6.0 --- Comment #15 from Andrew Pinski pinskia at gcc dot gnu.org 2012-07-27 21:33:16 UTC --- Fixed.
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #14 from bernds at gcc dot gnu dot org 2010-06-29 13:44 --- Subject: Bug 43902 Author: bernds Date: Tue Jun 29 13:43:57 2010 New Revision: 161533 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=161533 Log: PR target/43902 * config/arm/arm.md (maddsidi4, umaddsidi4): New expanders. (maddhisi4): Renamed from mulhisi3addsi. Operands renumbered. (maddhidi4): Likewise. testsuite/ PR target/43902 * gcc.target/arm/wmul-1.c: Test for smlabb instead of smulbb. * gcc.target/arm/wmul-3.c: New test. * gcc.target/arm/wmul-4.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/wmul-3.c trunk/gcc/testsuite/gcc.target/arm/wmul-4.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/arm/wmul-1.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #13 from bernds at gcc dot gnu dot org 2010-06-25 08:56 --- Subject: Bug 43902 Author: bernds Date: Fri Jun 25 08:56:24 2010 New Revision: 161366 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=161366 Log: With large parts from Jim Wilson: PR target/43902 * tree-pretty-print.c (dump_generic_node, op_code_prio): Add WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR. * optabs.c (optab_for_tree_code): Likewise. (expand_widen_pattern_expr): Likewise. * tree-ssa-math-opts.c (convert_mult_to_widen): New function, broken out of execute_optimize_widening_mul. (convert_plusminus_to_widen): New function. (execute_optimize_widening_mul): Use the two new functions. * expr.c (expand_expr_real_2): Add support for GIMPLE_TERNARY_RHS. Remove code to generate widening multiply-accumulate. Add support for WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR. * gimple-pretty-print.c (dump_ternary_rhs): New function. (dump_gimple_assign): Call it when appropriate. * tree.def (WIDEN_MULT_PLUS_EXPR, WIDEN_MULT_MINUS_EXPR): New codes. * cfgexpand.c (gimple_assign_rhs_to_tree): Likewise. (expand_gimple_stmt_1): Likewise. (expand_debug_expr): Support WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * gimple.c (extract_ops_from_tree_1): Renamed from extract_ops_from_tree. Add new arg for a third operand; fill it. (gimple_build_assign_stat): Support operations with three operands. (gimple_build_assign_with_ops_stat): Likewise. (gimple_assign_set_rhs_from_tree): Likewise. (gimple_assign_set_rhs_with_ops_1): Renamed from gimple_assign_set_rhs_with_ops. Add new arg for a third operand. (get_gimple_rhs_num_ops): Support GIMPLE_TERNARY_RHS. (get_gimple_rhs_num_ops): Handle WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR. * gimple.h (enum gimple_rhs_class): Add GIMPLE_TERNARY_RHS. (extract_ops_from_tree_1): Adjust declaration. (gimple_assign_set_rhs_with_ops_1): Likewise. (gimple_build_assign_with_ops): Pass NULL for last operand. (gimple_build_assign_with_ops3): New macro. (gimple_assign_rhs3, gimple_assign_rhs3_ptr, gimple_assign_set_rhs3, gimple_assign_set_rhs_with_ops, extract_ops_from_tree): New inline functions. * tree-cfg.c (verify_gimple_assign_ternary): New static function. (verify_gimple_assign): Call it. * doc/gimple.texi (Manipulating operands): Document GIMPLE_TERNARY_RHS. (Tuple specific accessors, subsection GIMPLE_ASSIGN): Document new functions for dealing with three-operand statements. * tree.c (commutative_ternary_tree_code): New function. * tree.h (commutative_ternary_tree_code): Declare it. * tree-vrp.c (gimple_assign_nonnegative_warnv_p): Return false for ternary statements. (gimple_assign_nonzero_warnv_p): Likewise. * tree-ssa-sccvn.c (stmt_has_constants): Handle GIMPLE_TERNARY_RHS. * tree-ssa-ccp.c (get_rhs_assign_op_for_ccp): New static function. (ccp_fold): Use it. Handle GIMPLE_TERNARY_RHS. * tree-ssa-dom.c (enum expr_kind): Add EXPR_TERNARY. (struct hashtable_expr): New member ternary in the union. (initialize_hash_element): Handle GIMPLE_TERNARY_RHS. (hashable_expr_equal_p): Fix indentation. Handle EXPR_TERNARY. (iterative_hash_hashable_expr): Likewise. (print_expr_hash_elt): Handle EXPR_TERNARY. * gimple-fold.c (fold_gimple_assign): Handle GIMPLE_TERNARY_RHS. * tree-ssa-threadedge.c (fold_assignment_stmt): Remove useless break statements. Handle GIMPLE_TERNARY_RHS. testsuite/ PR target/43902 * gcc.target/mips/madd-9.c: New test. Added: trunk/gcc/testsuite/gcc.target/mips/madd-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/doc/gimple.texi trunk/gcc/expr.c trunk/gcc/gimple-fold.c trunk/gcc/gimple-pretty-print.c trunk/gcc/gimple.c trunk/gcc/gimple.h trunk/gcc/optabs.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-cfg.c trunk/gcc/tree-inline.c trunk/gcc/tree-pretty-print.c trunk/gcc/tree-ssa-ccp.c trunk/gcc/tree-ssa-dom.c trunk/gcc/tree-ssa-math-opts.c trunk/gcc/tree-ssa-operands.c trunk/gcc/tree-ssa-sccvn.c trunk/gcc/tree-ssa-threadedge.c trunk/gcc/tree-vrp.c trunk/gcc/tree.c trunk/gcc/tree.def trunk/gcc/tree.h -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #10 from wilson at codesourcery dot com 2010-06-16 06:30 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Wed, 2010-06-09 at 20:21 +, bernds at gcc dot gnu dot org wrote: What do you think? Please let me know what your MIPS tests turned up. I'm looking at this again. My MIPS tests showed that my patch fixed 17 gcc.target/mips multiply-accumulate testcases that were accidentally broken by the new tree level widening multiply optimization pass. My new testcase madd-9.c failed, but it turned out that I accidentally double-patched the file. Fixing the file, it now passes. I forgot to include this testcase in my second patch, though it was there in the first one. The bad news is that there are two new failures: dpaq_sa_l_w.c and dpsq_sa_l_w.c. These are MIPS DSP fixed-point multiply-accumulate testcases, which is something I definitely didn't bother to check earlier. Overall I would say it is a win, since it fixes many int/long/long long examples, and only breaks a few obscure cases that should be fixable with a little more work. I'm looking at your patch too. There is a testcase that doesn't appear to belong, gcc.target/arm/pr42172-1.c. I'm not sure why convert_plusminus_to_widen needs to check for MULT and call convert_mult_to_widen. If we are scanning basic blocks from top to bottom, it seems that the multiplies would have already been checked. But maybe this has something to do with looking at operands computed in other basic blocks that haven't been scanned yet, in which case it would make sense. I'm not sure if that is possible. Otherwise, it looks like you have completed and cleanup up a bunch of stuff that I didn't get far enough to notice. It all looks good to me. I can try rerunning MIPS testcases to see how it works. I see that the failure of the DSP fixed-point tests is because we have checks for + if (TREE_CODE (type) != INTEGER_TYPE) +return false; and the old code in expr.c that we are replacing does - if ((TREE_CODE (type) == INTEGER_TYPE - || TREE_CODE (type) == FIXED_POINT_TYPE) So that looks like an easy fix (assuming no other complications), but will require a rebuild and retest. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #11 from bernds at gcc dot gnu dot org 2010-06-16 13:29 --- Yes, the check for MULT is for cases where the definition is after the use in basic-block order; I'd expect this can happen with crazy gotos and maybe in other cases as well. Could you retest the MIPS fixed-point testcases with the obvious fix? You probably have the MIPS toolchain set up already and it would probably take me more time. After that, I'm happy to approve your parts of the patch if you'll approve mine :-) so I can then check it in if you like. We could post it for review by the gimple crowd first though. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #12 from wilson at codesourcery dot com 2010-06-17 04:29 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Wed, 2010-06-16 at 13:29 +, bernds at gcc dot gnu dot org wrote: Could you retest the MIPS fixed-point testcases with the obvious fix? You probably have the MIPS toolchain set up already and it would probably take me more time. Unfortunately, the fix is not quite that easy. We need to handle saturating types in the optab check. I already wrote the multiply add check that way, but the widening-multiply check needs to be fixed. It turns out that there isn't any obtab for saturating widening multiply, so we have to add one. Also, there is also no pattern for it in the mips-fixed.md file. There is only the saturating widening multiply accumulate pattern. I believe I can fix this by adding a multiply accumulate pattern that adds zero, but now I'm starting to get cascading patches here. This could take a few days. Since I was looking at adding zero, I couldn't help but notice that the MIPS fixed point support seems incomplete. _Sat long _Fract f1 (void) { return 0; } gives me j $31 lw $2,.LC0 .LC0: .word 0x0 But this can be split off into a separate bug report. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #9 from bernds at gcc dot gnu dot org 2010-06-09 20:20 --- Created an attachment (id=20880) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20880action=view) A new version of Jim's patch Here's what I've done with it so far. I've changed the new tree code to be a proper gimple operation, which meant extending all sorts of gimple code to handle ternary operations. tree-ssa-math-opts runs late enough that I don't think we need to handle them in most passes, so I've turned some existing functions into wrappers that continue to present a two-operand interface. The other thing I've done is to restructure the pass a little to avoid the second loop over all the insns. I've changed the ARM backend to take advantage of it. Regression tests on ARM look fine (although I've had to change one of the testcases which didn't expect widen-macc to be generated). What do you think? Please let me know what your MIPS tests turned up. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #7 from bernds at gcc dot gnu dot org 2010-06-07 21:34 --- Jim, are you still working on this or should I pick it up? -- bernds at gcc dot gnu dot org changed: What|Removed |Added CC||bernds at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #8 from wilson at codesourcery dot com 2010-06-07 22:18 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Mon, 2010-06-07 at 21:34 +, bernds at gcc dot gnu dot org wrote: Jim, are you still working on this or should I pick it up? I'm working on it, but only a few hours a week, so my progress will be slow. If you have more time that would help. I put a patch in the PR that seems to be working well, but there were a few minor issues that turned up during a MIPS testsuite run that I wanted to look at before formally submitting the patch. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #5 from wilson at gcc dot gnu dot org 2010-05-25 06:30 --- Richard Guenther suggested using DOT_PROD_EXPR. I ran into several problems with that. DOT_PROD_EXPR expands to the sdot_prodM pattern. The mips port is using maddMN. We essentially have two named patterns that are doing the exact same thing, except that one is only used with vector types and one is only used with integer types. The name DOT_PROD_EXPR makes sense for the vector type case, but not so much for the integer type case. sdot_prodM gets installed into the optabs table with mode M. maddMN gets installed into the optabs table with mode N, where N is twice the size of mode M. This complicates lookup, since we need to use different modes for the different operators. All widening integer operations use mode N here, so it seems wrong to change one. sdot_prodM is available in two flavors, signed/unsigned. maddMN is available in 8 flavors, signed/unsigned saturating/non-saturating multiply add/subtract. The subtract part is the hard one, since I can't see any way to get a subtract from a dot product operator. Dot product very strongly implies that you are doing addition, and I know of no equivalent that uses subtraction. dot_prodM patterns are used in 3 md files (i386, ia64, rs6000). maddMN patterns are used in 1 md file (mips). Rather than mess with this, I ended up just adding some new tree operators, WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #6 from wilson at gcc dot gnu dot org 2010-05-25 06:35 --- Created an attachment (id=20739) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20739action=view) second patch attempt, modifying widen_mult tree pass This removes about 100 lines from expr.c, and adds about 90 lines to tree-ssa-math, plus unfortunately infrastructure for the new tree codes WIDEN_MULT_{PLUS,MINUS}_EXPR which is about 70 lines. It works for simple testcases. Needs a more thorough test. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #3 from wilson at codesourcery dot com 2010-05-03 22:28 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Tue, 2010-04-27 at 09:33 +, rguenth at gcc dot gnu dot org wrote: For more general optimization you might want to move all this code to the tree pass before expansion that detects widening multiplication. The DOT_PROD_EXPR tree code can be used to carry the information to the expander. I didn't understand the point here at first, but after updating my tree and rebuilding I now see Bernd's patch to add the new widening_mul optimization pass which I hadn't noticed before. The MIPS macc/madd support is now even more broken than before. I can generate them at -O but not at -O2 now, as the code in expand_expr_real_2 doesn't handle WIDEN_MULT_EXPR which is created at -O2 by the new code. So as Richard Guenther mentioned, the solution seems to be to remove the old code in expand_expr_real_2 and add something new in tree-ssa-math-opt.c to replace it. This means we won't be able to generate widening multiply accumulate with -O, but we will at -O2. This is probably acceptable, though it makes the solution more complicated than I had originally expected. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #4 from wilson at gcc dot gnu dot org 2010-05-03 22:36 --- Created an attachment (id=20552) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20552action=view) trivial solution for original problem This fixes the original problem, but does not fix the new breakage caused by Bernd's patch to add the new optimize_widening_mul pass. So this works at -O but not at -O2. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #2 from rguenth at gcc dot gnu dot org 2010-04-27 09:33 --- (In reply to comment #1) Some further investigation shows that there is code in expand_expr_real_2 that is supposed to be able to generate multiply-accumulate instructions, but it isn't general enough. In my gimple, I have D.1999_10 = D.1998_9 * D.1996_7; total_11 = total_19 + D.1999_10; The code in expr.c does - if ((TREE_CODE (type) == INTEGER_TYPE - || TREE_CODE (type) == FIXED_POINT_TYPE) - (subexp0_def = get_def_for_expr (treeop0, - MULT_EXPR))) which fails because the multiply operand is treeop1 not treeop0. We need to check both operands for the multiply here. I have an initial patch that needs testing. For more general optimization you might want to move all this code to the tree pass before expansion that detects widening multiplication. The DOT_PROD_EXPR tree code can be used to carry the information to the expander. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-04-27 09:33:19 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #1 from wilson at gcc dot gnu dot org 2010-04-27 01:17 --- Some further investigation shows that there is code in expand_expr_real_2 that is supposed to be able to generate multiply-accumulate instructions, but it isn't general enough. In my gimple, I have D.1999_10 = D.1998_9 * D.1996_7; total_11 = total_19 + D.1999_10; The code in expr.c does - if ((TREE_CODE (type) == INTEGER_TYPE - || TREE_CODE (type) == FIXED_POINT_TYPE) - (subexp0_def = get_def_for_expr (treeop0, - MULT_EXPR))) which fails because the multiply operand is treeop1 not treeop0. We need to check both operands for the multiply here. I have an initial patch that needs testing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902