Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
On 07/04/2017 05:13 AM, Hurugalawadi, Naveen wrote: > Hi, > > Thanks for the review and comments on the patch. > >>> The proposed patch handled both the same. This means the pattern >>> shouldn't use range-info but instead match a more complex > > The patch handles as per the discussion by matching the pattern > in match.pd. > > Bootstrapped and Regression tested on AArch64 and X86_64. > Please review the patch and let us know if its okay? > > Thanks, > Naveen > > 2017-07-04 Naveen H.S> > gcc > * match.pd (((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0) New > pattern. > > gcc/testsuite > * gcc.dg/tree-ssa/vrp116.c: New Test. > OK for the trunk. Sorry for the delay. Presumably EQ/NE are handled by a different pattern? Jeff
Re: [PING} [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Hi, Please consider this as a personal reminder to review the patch at following link and let me know your comments on the same. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00178.html Thanks, Naveen
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Hi, Thanks for the review and comments on the patch. >> The proposed patch handled both the same. This means the pattern >> shouldn't use range-info but instead match a more complex The patch handles as per the discussion by matching the pattern in match.pd. Bootstrapped and Regression tested on AArch64 and X86_64. Please review the patch and let us know if its okay? Thanks, Naveen 2017-07-04 Naveen H.Sgcc * match.pd (((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0) New pattern. gcc/testsuite * gcc.dg/tree-ssa/vrp116.c: New Test.diff --git a/gcc/match.pd b/gcc/match.pd index 4c64b21..d914db1 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1088,6 +1088,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && tree_nop_conversion_p (type, TREE_TYPE (@1))) (convert (bit_and (bit_not @1) @0 +/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0 */ +(for cmp (gt lt ge le) +(simplify + (mult (convert (cmp @0 @1)) @2) + (cond (cmp @0 @1) @2 { build_zero_cst (type); }))) + /* For integral types with undefined overflow and C != 0 fold x * C EQ/NE y * C into x EQ/NE y. */ (for cmp (eq ne) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c new file mode 100644 index 000..d9d7b23 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1" } */ + +int +f (int m1, int m2, int c) +{ + int d = m1 > m2; + int e = d * c; + return e ? m1 : m2; +} + +/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
On Thu, Jun 29, 2017 at 5:10 PM, Wilco Dijkstrawrote: > Richard Biener wrote: > >> int f (int m, int c) >> { >> return (m & 1) * c; >> } > > This case (integer[0,1] rather than boolean input) should be transformed into > c & -(m & 1). The proposed patch handled both the same. This means the pattern shouldn't use range-info but instead match a more complex (mult (convert (cmp @0 @1)) @2) ? Note that from a gimple perspective c & -(m & 1) is more complex than (m & 1) * c and thus non-canonical. > Wilco
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Richard Biener wrote: > int f (int m, int c) > { > return (m & 1) * c; > } This case (integer[0,1] rather than boolean input) should be transformed into c & -(m & 1). Wilco
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
On 06/29/2017 05:20 AM, Wilco Dijkstra wrote: > Richard Biener wrote: >> Hurugalawadi, Naveen wrote: >>> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0. > >> What's the reason of this transform? I expect that the HW multiplier >> is quite fast given one operand is either zero or one and a multiplication >> is a gimple operation that's better handled in optimizations than >> COND_EXPRs which eventually expand to conditional code which >> would be much slower. > > Even really fast multipliers have several cycles latency, and this is > generally > fixed irrespectively of the inputs. Maybe you were thinking about division? And on some targets, just getting the arguments into the right register bank is many cycles. Think HPPA where integer multiply occurs in the floating point unit. Though I don't think that oddity should drive this discussion. > > Additionally integer multiply typically has much lower throughput than other > ALU operations like conditional move - a modern CPU may have 4 ALUs > but only 1 multiplier, so removing redundant integer multiplies is always > good. I'd tend to agree in general. jeff
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstrawrote: > Richard Biener wrote: >> Hurugalawadi, Naveen wrote: >> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0. > >> What's the reason of this transform? I expect that the HW multiplier >> is quite fast given one operand is either zero or one and a multiplication >> is a gimple operation that's better handled in optimizations than >> COND_EXPRs which eventually expand to conditional code which >> would be much slower. > > Even really fast multipliers have several cycles latency, and this is > generally > fixed irrespectively of the inputs. Maybe you were thinking about division? > > Additionally integer multiply typically has much lower throughput than other > ALU operations like conditional move - a modern CPU may have 4 ALUs > but only 1 multiplier, so removing redundant integer multiplies is always > good. > > Note (m1 > m2) is also a conditional expression which will result in branches > for floating point expressions and on some targets even for integers. Moving > the multiply into the conditional expression generates the best code: > > Integer version: > f1: > cmpw0, 100 > csel w0, w1, wzr, gt > ret > f2: > cmpw0, 100 > cset w0, gt > mulw0, w0, w1 > ret > > Float version: > f3: > movi v1.2s, #0 > cmpw0, 100 > fcsel s0, s0, s1, gt > ret > f4: > cmpw0, 100 > bgt.L8 > movi v1.2s, #0 > fmul s0, s0, s1 // eh??? > .L8: > ret But then int f (int m, int c) { return (m & 1) * c; } int g (int m, int c) { if (m & 1 != 0) return c; return 0; } f: .LFB0: .cfi_startproc andl$1, %edi movl%edi, %eax imull %esi, %eax ret g: .LFB1: .cfi_startproc movl%edi, %eax andl$1, %eax cmovne %esi, %eax ret anyway. As a general comment to the patch please do it as a pattern in match.pd (match boolean_value_range_p @0 (if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1))) (match boolean_value_range_p INTEGER_CST (if (integer_zerop (t) || integer_onep (t (match boolean_value_range_p SSA_NAME (if (INTEGRAL_TYPE_P (type) && ~get_nonzero_bits (t) == 1))) (simplify (mult:c boolean_value_range_p@0 @1) (cond @0 @1 @0)) or something like that. Richard. > Wilco
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Richard Biener wrote: > Hurugalawadi, Naveen wrote: > > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0. > What's the reason of this transform? I expect that the HW multiplier > is quite fast given one operand is either zero or one and a multiplication > is a gimple operation that's better handled in optimizations than > COND_EXPRs which eventually expand to conditional code which > would be much slower. Even really fast multipliers have several cycles latency, and this is generally fixed irrespectively of the inputs. Maybe you were thinking about division? Additionally integer multiply typically has much lower throughput than other ALU operations like conditional move - a modern CPU may have 4 ALUs but only 1 multiplier, so removing redundant integer multiplies is always good. Note (m1 > m2) is also a conditional expression which will result in branches for floating point expressions and on some targets even for integers. Moving the multiply into the conditional expression generates the best code: Integer version: f1: cmpw0, 100 csel w0, w1, wzr, gt ret f2: cmpw0, 100 cset w0, gt mulw0, w0, w1 ret Float version: f3: movi v1.2s, #0 cmpw0, 100 fcsel s0, s0, s1, gt ret f4: cmpw0, 100 bgt.L8 movi v1.2s, #0 fmul s0, s0, s1 // eh??? .L8: ret Wilco
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
On Thu, Jun 29, 2017 at 7:06 AM, Hurugalawadi, Naveenwrote: > Hi, > > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0. > > The patch optimizes it inside tree-vrp.c when simplifying with the range > inside simplify_stmt_using_ranges. If a multiply is found and either side > has a range [0,1], then transform it. > > Ex:- d * c where d has a range of [0,1] transform it to be > COND_EXPR(d != 0, c, 0). > The other optimization passes should prop m1 > m2. > > Bootstrapped and Regression tested on AArch64 and X86_64. > Please review the patch and let us know if its okay? What's the reason of this transform? I expect that the HW multiplier is quite fast given one operand is either zero or one and a multiplication is a gimple operation that's better handled in optimizations than COND_EXPRs which eventually expand to conditional code which would be much slower. Thanks, Richard. > Thanks, > Naveen > > 2017-06-28 Naveen H.S > > gcc > * tree-vrp.c (simplify_stmt_using_ranges): Add case for > optimizing a case of multiplication. > (simplify_mult_ops_using_ranges): New. > > gcc/testsuite > * gcc.dg/tree-ssa/vrp116.c: New Test. > >
[PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Hi, The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0. The patch optimizes it inside tree-vrp.c when simplifying with the range inside simplify_stmt_using_ranges. If a multiply is found and either side has a range [0,1], then transform it. Ex:- d * c where d has a range of [0,1] transform it to be COND_EXPR(d != 0, c, 0). The other optimization passes should prop m1 > m2. Bootstrapped and Regression tested on AArch64 and X86_64. Please review the patch and let us know if its okay? Thanks, Naveen 2017-06-28 Naveen H.Sgcc * tree-vrp.c (simplify_stmt_using_ranges): Add case for optimizing a case of multiplication. (simplify_mult_ops_using_ranges): New. gcc/testsuite * gcc.dg/tree-ssa/vrp116.c: New Test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c new file mode 100644 index 000..d9d7b23 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1" } */ + +int +f (int m1, int m2, int c) +{ + int d = m1 > m2; + int e = d * c; + return e ? m1 : m2; +} + +/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 9ca3924..291b87f 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -9146,6 +9146,46 @@ vrp_visit_phi_node (gphi *phi) return SSA_PROP_NOT_INTERESTING; } +static bool +simplify_mult_ops_using_ranges (gimple_stmt_iterator * gsi, gimple *stmt) +{ + enum tree_code rhs_code = gimple_assign_rhs_code (stmt); + tree op0, op1, lhs; + + op0 = gimple_assign_rhs1 (stmt); + op1 = gimple_assign_rhs2 (stmt); + lhs = gimple_assign_lhs (stmt); + + if (!op_with_boolean_value_range_p (op0) + && !op_with_boolean_value_range_p (op1)) +return false; + + if (rhs_code == MULT_EXPR) +{ + if (op_with_boolean_value_range_p (op0)) + { + tree t = build_int_cst (TREE_TYPE (lhs), 0); + tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs), + build2 (NE_EXPR, boolean_type_node, op0, t), + op1, t); + gimple *new_assign = gimple_build_assign (lhs, tmp); + gsi_replace (gsi, new_assign, true); + return true; + } + if (op_with_boolean_value_range_p (op1)) + { + tree t = build_int_cst (TREE_TYPE (lhs), 0); + tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs), + build2 (NE_EXPR, boolean_type_node, op1, t), + op0, t); + gimple *new_assign = gimple_build_assign (lhs, tmp); + gsi_replace (gsi, new_assign, true); + return true; + } +} + return false; +} + /* Simplify boolean operations if the source is known to be already a boolean. */ static bool @@ -10345,6 +10385,11 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi) return simplify_div_or_mod_using_ranges (gsi, stmt); break; + case MULT_EXPR: + if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1))) + return simplify_mult_ops_using_ranges (gsi, stmt); + break; + /* Transform ABS (X) into X or -X as appropriate. */ case ABS_EXPR: if (TREE_CODE (rhs1) == SSA_NAME