Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-07-19 Thread Jeff Law
On 07/04/2017 05:13 AM, Hurugalawadi, Naveen wrote:
> Hi,
> 
> Thanks for the review and comments on the patch.
> 
>>> The proposed patch handled both the same.  This means the pattern
>>> shouldn't use range-info but instead match a more complex
> 
> The patch handles as per the discussion by matching the pattern
> in match.pd.
> 
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?
> 
> Thanks,
> Naveen
> 
> 2017-07-04  Naveen H.S  
> 
> gcc
>   * match.pd (((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0) New
>   pattern.
> 
> gcc/testsuite
>   * gcc.dg/tree-ssa/vrp116.c: New Test.
> 
OK for the trunk.  Sorry for the delay.

Presumably EQ/NE are handled by a different pattern?

Jeff


Re: [PING} [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-07-18 Thread Hurugalawadi, Naveen
Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00178.html

Thanks,
Naveen




Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-07-04 Thread Hurugalawadi, Naveen
Hi,

Thanks for the review and comments on the patch.

>> The proposed patch handled both the same.  This means the pattern
>> shouldn't use range-info but instead match a more complex

The patch handles as per the discussion by matching the pattern
in match.pd.

Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?

Thanks,
Naveen

2017-07-04  Naveen H.S  

gcc
* match.pd (((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0) New
pattern.

gcc/testsuite
* gcc.dg/tree-ssa/vrp116.c: New Test.diff --git a/gcc/match.pd b/gcc/match.pd
index 4c64b21..d914db1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1088,6 +1088,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0
 
+/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
+(for cmp (gt lt ge le)
+(simplify
+ (mult (convert (cmp @0 @1)) @2)
+  (cond (cmp @0 @1) @2 { build_zero_cst (type); })))
+
 /* For integral types with undefined overflow and C != 0 fold
x * C EQ/NE y * C into x EQ/NE y.  */
 (for cmp (eq ne)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
new file mode 100644
index 000..d9d7b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 > m2;
+  int e = d * c;
+  return e ? m1 : m2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-30 Thread Richard Biener
On Thu, Jun 29, 2017 at 5:10 PM, Wilco Dijkstra  wrote:
> Richard Biener wrote:
>
>> int f (int m, int c)
>> {
>>  return (m & 1) * c;
>> }
>
> This case (integer[0,1] rather than boolean input) should be transformed into 
> c & -(m & 1).

The proposed patch handled both the same.  This means the pattern
shouldn't use range-info
but instead match a more complex

(mult (convert (cmp @0 @1)) @2)

?  Note that from a gimple perspective c & -(m & 1) is more complex
than (m & 1) * c
and thus non-canonical.

> Wilco


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Wilco Dijkstra
Richard Biener wrote:

> int f (int m, int c)
> {
>  return (m & 1) * c;
> }

This case (integer[0,1] rather than boolean input) should be transformed into c 
& -(m & 1).

Wilco


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Jeff Law
On 06/29/2017 05:20 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>>> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
> 
>> What's the reason of this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
> 
> Even really fast multipliers have several cycles latency, and this is 
> generally
> fixed irrespectively of the inputs. Maybe you were thinking about division?
And on some targets, just getting the arguments into the right register
bank is many cycles.  Think HPPA where integer multiply occurs in the
floating point unit.  Though I don't think that oddity should drive this
discussion.


> 
> Additionally integer multiply typically has much lower throughput than other 
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always 
> good.
I'd tend to agree in general.

jeff


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra  wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
>> What's the reason of this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is 
> generally
> fixed irrespectively of the inputs. Maybe you were thinking about division?
>
> Additionally integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always 
> good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
> cmpw0, 100
> csel   w0, w1, wzr, gt
> ret
> f2:
> cmpw0, 100
> cset   w0, gt
> mulw0, w0, w1
> ret
>
> Float version:
> f3:
> movi   v1.2s, #0
> cmpw0, 100
> fcsel  s0, s0, s1, gt
> ret
> f4:
> cmpw0, 100
> bgt.L8
> movi   v1.2s, #0
> fmul   s0, s0, s1  // eh???
> .L8:
> ret

But then

int f (int m, int c)
{
  return (m & 1) * c;
}
int g (int m, int c)
{
  if (m & 1 != 0)
return c;
  return 0;
}

f:
.LFB0:
.cfi_startproc
andl$1, %edi
movl%edi, %eax
imull   %esi, %eax
ret
g:
.LFB1:
.cfi_startproc
movl%edi, %eax
andl$1, %eax
cmovne  %esi, %eax
ret

anyway.  As a general comment to the patch please do it as
a pattern in match.pd

(match boolean_value_range_p
 @0
 (if (INTEGRAL_TYPE_P (type)
  && TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
 INTEGER_CST
 (if (integer_zerop (t) || integer_onep (t
(match boolean_value_range_p
 SSA_NAME
 (if (INTEGRAL_TYPE_P (type)
  && ~get_nonzero_bits (t) == 1)))

(simplify
 (mult:c boolean_value_range_p@0 @1)
 (cond @0 @1 @0))

or something like that.

Richard.

> Wilco


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Wilco Dijkstra
Richard Biener wrote:
> Hurugalawadi, Naveen wrote:
> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.

> What's the reason of this transform?  I expect that the HW multiplier
> is quite fast given one operand is either zero or one and a multiplication
> is a gimple operation that's better handled in optimizations than
> COND_EXPRs which eventually expand to conditional code which
> would be much slower.

Even really fast multipliers have several cycles latency, and this is generally
fixed irrespectively of the inputs. Maybe you were thinking about division?

Additionally integer multiply typically has much lower throughput than other 
ALU operations like conditional move - a modern CPU may have 4 ALUs
but only 1 multiplier, so removing redundant integer multiplies is always good.

Note (m1 > m2) is also a conditional expression which will result in branches
for floating point expressions and on some targets even for integers. Moving
the multiply into the conditional expression generates the best code:

Integer version:
f1:
cmpw0, 100
csel   w0, w1, wzr, gt
ret
f2:
cmpw0, 100
cset   w0, gt
mulw0, w0, w1
ret

Float version:
f3:
movi   v1.2s, #0
cmpw0, 100
fcsel  s0, s0, s1, gt
ret
f4:
cmpw0, 100
bgt.L8
movi   v1.2s, #0
fmul   s0, s0, s1  // eh???
.L8:
ret

Wilco

Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-29 Thread Richard Biener
On Thu, Jun 29, 2017 at 7:06 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
> The patch optimizes it inside tree-vrp.c when simplifying with the range
> inside simplify_stmt_using_ranges. If a multiply is found and either side
> has a range [0,1], then transform it.
>
> Ex:- d * c where d has a range of [0,1] transform it to be
> COND_EXPR(d != 0, c, 0).
> The other optimization passes should prop m1 > m2.
>
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?

What's the reason of this transform?  I expect that the HW multiplier
is quite fast given one operand is either zero or one and a multiplication
is a gimple operation that's better handled in optimizations than
COND_EXPRs which eventually expand to conditional code which
would be much slower.

Thanks,
Richard.

> Thanks,
> Naveen
>
> 2017-06-28  Naveen H.S  
>
> gcc
> * tree-vrp.c (simplify_stmt_using_ranges): Add case for
> optimizing a case of multiplication.
> (simplify_mult_ops_using_ranges): New.
>
> gcc/testsuite
> * gcc.dg/tree-ssa/vrp116.c: New Test.
>
>


[PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-06-28 Thread Hurugalawadi, Naveen
Hi, 

The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.

The patch optimizes it inside tree-vrp.c when simplifying with the range
inside simplify_stmt_using_ranges. If a multiply is found and either side
has a range [0,1], then transform it.

Ex:- d * c where d has a range of [0,1] transform it to be
COND_EXPR(d != 0, c, 0).
The other optimization passes should prop m1 > m2.

Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?

Thanks,
Naveen

2017-06-28  Naveen H.S  

gcc
* tree-vrp.c (simplify_stmt_using_ranges): Add case for
optimizing a case of multiplication.
(simplify_mult_ops_using_ranges): New.   

gcc/testsuite
* gcc.dg/tree-ssa/vrp116.c: New Test.


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
new file mode 100644
index 000..d9d7b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 > m2;
+  int e = d * c;
+  return e ? m1 : m2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 9ca3924..291b87f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9146,6 +9146,46 @@ vrp_visit_phi_node (gphi *phi)
   return SSA_PROP_NOT_INTERESTING;
 }
 
+static bool
+simplify_mult_ops_using_ranges (gimple_stmt_iterator * gsi, gimple *stmt)
+{
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+  tree op0, op1, lhs;
+
+  op0 = gimple_assign_rhs1 (stmt);
+  op1 = gimple_assign_rhs2 (stmt);
+  lhs = gimple_assign_lhs (stmt);
+
+  if (!op_with_boolean_value_range_p (op0)
+  && !op_with_boolean_value_range_p (op1))
+return false;
+
+  if (rhs_code == MULT_EXPR)
+{
+  if (op_with_boolean_value_range_p (op0))
+	{
+	  tree t = build_int_cst (TREE_TYPE (lhs), 0);
+	  tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs),
+			 build2 (NE_EXPR, boolean_type_node, op0, t),
+			 op1, t);
+	  gimple *new_assign = gimple_build_assign (lhs, tmp);
+	  gsi_replace (gsi, new_assign, true);
+	  return true;
+	}
+  if (op_with_boolean_value_range_p (op1))
+	{
+	  tree t = build_int_cst (TREE_TYPE (lhs), 0);
+	  tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs),
+			 build2 (NE_EXPR, boolean_type_node, op1, t),
+			 op0, t);
+	  gimple *new_assign = gimple_build_assign (lhs, tmp);
+	  gsi_replace (gsi, new_assign, true);
+	  return true;
+	}
+}
+  return false;
+}
+
 /* Simplify boolean operations if the source is known
to be already a boolean.  */
 static bool
@@ -10345,6 +10385,11 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 	return simplify_div_or_mod_using_ranges (gsi, stmt);
 	  break;
 
+	case MULT_EXPR:
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+	return simplify_mult_ops_using_ranges (gsi, stmt);
+	  break;
+
   /* Transform ABS (X) into X or -X as appropriate.  */
 	case ABS_EXPR:
 	  if (TREE_CODE (rhs1) == SSA_NAME