Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-23 Thread Richard Guenther
On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs  wrote:
> There are many cases where the widening_mult pass does not recognise
> widening multiply-and-accumulate cases simply because there is a type
> conversion step between the multiply and add statements.
>
> This patch should rectify that simply by looking beyond those conversions.

That's surely wrong for (int)(short)int_var.  You have to constrain
the conversions
you look through properly.

Richard.

> OK?
>
> Andrew
>
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-23 Thread Janis Johnson
On 06/23/2011 07:40 AM, Andrew Stubbs wrote:

+++ b/gcc/testsuite/gcc.target/arm/umlal-1.c
+/* { dg-final { scan-assembler "umlal" } } */

Don't use the name of the instruction as the test name or the scan
will always pass, because the file name shows up in assembly output.

See http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01823.html for a
proposed effective target that can be used in this test.

Janis


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Andrew Stubbs

On 23/06/11 17:26, Richard Guenther wrote:

On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs  wrote:

There are many cases where the widening_mult pass does not recognise
widening multiply-and-accumulate cases simply because there is a type
conversion step between the multiply and add statements.

This patch should rectify that simply by looking beyond those conversions.


That's surely wrong for (int)(short)int_var.  You have to constrain
the conversions
you look through properly.


To be clear, it only skips past NOP_EXPR. Is it not the case that what 
you're describing would need a CONVERT_EXPR?


Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Richard Guenther
On Fri, Jun 24, 2011 at 10:05 AM, Andrew Stubbs
 wrote:
> On 23/06/11 17:26, Richard Guenther wrote:
>>
>> On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs
>>  wrote:
>>>
>>> There are many cases where the widening_mult pass does not recognise
>>> widening multiply-and-accumulate cases simply because there is a type
>>> conversion step between the multiply and add statements.
>>>
>>> This patch should rectify that simply by looking beyond those
>>> conversions.
>>
>> That's surely wrong for (int)(short)int_var.  You have to constrain
>> the conversions
>> you look through properly.
>
> To be clear, it only skips past NOP_EXPR. Is it not the case that what
> you're describing would need a CONVERT_EXPR?

NOP_EXPR is the same as CONVERT_EXPR.

Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Stubbs, Andrew
On 24/06/11 09:28, Richard Guenther wrote:
>> >  To be clear, it only skips past NOP_EXPR. Is it not the case that what
>> >  you're describing would need a CONVERT_EXPR?
> NOP_EXPR is the same as CONVERT_EXPR.

Are you sure?

I thought this was safe because the internals manual says:

   NOP_EXPR
   These nodes are used to represent conversions that do not require any
   code-generation 

   CONVERT_EXPR
   These nodes are similar to NOP_EXPRs, but are used in those
   situations where code may need to be generated 

So, I tried this example:

int
foo (int a, short b, short c)
{
   int bc = b * c;
   return a + (short)bc;
}

Both before and after my patch, GCC gives:

 mul r2, r1, r2
 sxtah   r0, r0, r2

(where, SXTAH means sign-extend the third operand from HImode to SImode 
and add to the second operand.)

The dump after the widening_mult pass is:

foo (int a, short int b, short int c)
{
   int bc;
   int D.2018;
   short int D.2017;
   int D.2016;
   int D.2015;
   int D.2014;

:
   D.2014_2 = (int) b_1(D);
   D.2015_4 = (int) c_3(D);
   bc_5 = b_1(D) w* c_3(D);
   D.2017_6 = (short int) bc_5;
   D.2018_7 = (int) D.2017_6;
   D.2016_9 = D.2018_7 + a_8(D);
   return D.2016_9;

}

Where you can clearly see that the addition has not been recognised as a 
multiply-and-accumulate.

When I step through convert_plusminus_to_widen, I can see that the 
reason it has not matched is because "D.2017_6 = (short int) bc_5" is 
encoded with a CONVERT_EXPR, just as the manual said it would be.

So, according to the manual, and my (admittedly limited) experiments, 
skipping over NOP_EXPR does appear to be safe.

But you say that it isn't safe. So now I'm confused. :(

I can certainly add checks to make sure that the skipped operations 
actually don't make any important changes to the value, but do I need to?

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Richard Guenther
On Fri, Jun 24, 2011 at 3:46 PM, Stubbs, Andrew
 wrote:
> On 24/06/11 09:28, Richard Guenther wrote:
>>> >  To be clear, it only skips past NOP_EXPR. Is it not the case that what
>>> >  you're describing would need a CONVERT_EXPR?
>> NOP_EXPR is the same as CONVERT_EXPR.
>
> Are you sure?

Yes, definitely.  They are synonyms of each other (an unfinished merging
process), the usual check for them is via CONVERT_EXPR_P.

> I thought this was safe because the internals manual says:
>
>   NOP_EXPR
>   These nodes are used to represent conversions that do not require any
>   code-generation 
>
>   CONVERT_EXPR
>   These nodes are similar to NOP_EXPRs, but are used in those
>   situations where code may need to be generated 

Which is wrong (sorry).

> So, I tried this example:
>
> int
> foo (int a, short b, short c)
> {
>   int bc = b * c;
>   return a + (short)bc;
> }
>
> Both before and after my patch, GCC gives:
>
>         mul     r2, r1, r2
>         sxtah   r0, r0, r2
>
> (where, SXTAH means sign-extend the third operand from HImode to SImode
> and add to the second operand.)
>
> The dump after the widening_mult pass is:
>
> foo (int a, short int b, short int c)
> {
>   int bc;
>   int D.2018;
>   short int D.2017;
>   int D.2016;
>   int D.2015;
>   int D.2014;
>
> :
>   D.2014_2 = (int) b_1(D);
>   D.2015_4 = (int) c_3(D);
>   bc_5 = b_1(D) w* c_3(D);
>   D.2017_6 = (short int) bc_5;
>   D.2018_7 = (int) D.2017_6;
>   D.2016_9 = D.2018_7 + a_8(D);
>   return D.2016_9;
>
> }
>
> Where you can clearly see that the addition has not been recognised as a
> multiply-and-accumulate.
>
> When I step through convert_plusminus_to_widen, I can see that the
> reason it has not matched is because "D.2017_6 = (short int) bc_5" is
> encoded with a CONVERT_EXPR, just as the manual said it would be.

A NOP_EXPR in this place would be valid as well.  The merging hasn't
been completed and at least the C frontend still generates CONVERT_EXPRs
in some cases.

> So, according to the manual, and my (admittedly limited) experiments,
> skipping over NOP_EXPR does appear to be safe.
>
> But you say that it isn't safe. So now I'm confused. :(
>
> I can certainly add checks to make sure that the skipped operations
> actually don't make any important changes to the value, but do I need to?

Yes.

Thanks,
Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Stubbs, Andrew
On 24/06/11 16:47, Richard Guenther wrote:
>> >  I can certainly add checks to make sure that the skipped operations
>> >  actually don't make any important changes to the value, but do I need to?
> Yes.

Ok, I'll go away and do that then.

BTW, I see useless_type_conversion_p, but that's not quite what I want. 
Is there an equivalent existing function to determine whether a 
conversion changes the logical/arithmetic meaning of a type?

I mean, conversion to a wider mode is not "useless", but it is harmless, 
whereas conversion to a narrower mode may truncate the value.

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-25 Thread Richard Guenther
On Fri, Jun 24, 2011 at 6:58 PM, Stubbs, Andrew
 wrote:
> On 24/06/11 16:47, Richard Guenther wrote:
>>> >  I can certainly add checks to make sure that the skipped operations
>>> >  actually don't make any important changes to the value, but do I need to?
>> Yes.
>
> Ok, I'll go away and do that then.
>
> BTW, I see useless_type_conversion_p, but that's not quite what I want.
> Is there an equivalent existing function to determine whether a
> conversion changes the logical/arithmetic meaning of a type?
>
> I mean, conversion to a wider mode is not "useless", but it is harmless,
> whereas conversion to a narrower mode may truncate the value.

Well, you have to decide that for the concrete situation based on
the signedness and precision of the types involved.  All such
conversions change the logical/arithmetic meaning of a type if
seen in the right context.

Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Andrew Stubbs

On 24/06/11 16:47, Richard Guenther wrote:

I can certainly add checks to make sure that the skipped operations
>  actually don't make any important changes to the value, but do I need to?

Yes.


OK, how about this patch?

I've added checks to make sure the value is not truncated at any point.

I've also changed the test cases to address Janis' comments.

Andrew
2011-06-28  Andrew Stubbs  

	gcc/
	* gimple.h (tree_ssa_harmless_type_conversion): New prototype.
	(tree_ssa_strip_harmless_type_conversions): New prototype.
	(harmless_type_conversion_p): New prototype.
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond no-op conversion statements.
	* tree-ssa.c (harmless_type_conversion_p): New function.
	(tree_ssa_harmless_type_conversion): New function.
	(tree_ssa_strip_harmless_type_conversions): New function.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...);
 
 /* In tree-ssa.c  */
 extern bool tree_ssa_useless_type_conversion (tree);
+extern bool tree_ssa_harmless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
+extern tree tree_ssa_strip_harmless_type_conversions (tree);
 extern bool useless_type_conversion_p (tree, tree);
+extern bool harmless_type_conversion_p (tree, tree);
 extern bool types_compatible_p (tree, tree);
 
 /* Return the code for GIMPLE statement G.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
-{
-  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-  if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-}
-  else
+  if (TREE_CODE (rhs1) != SSA_NAME
+  || TREE_CODE (rhs2) != SSA_NAME)
 return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
-{
-  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-  if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-}
-  else
-return false;
+  rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1);
+  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  if (is_gimple_assign (rhs1_stmt))
+rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+
+  rhs2 = tree_ssa_strip_harmless_type_conversions(rhs2);
+  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (is_gimple_assign (rhs2_stmt))
+rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
 {
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type)
   return false;
 }
 
+/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will
+   not alter the arithmetic meaning of a type, otherwise return false.
+
+   For example, widening an integer type leaves the value unchanged,
+   but narrowing an integer type can cause truncation.
+
+   Note that switching between signed and unsigned modes doesn't change
+   the underlying representation, and so is harmless.
+
+   This function is not yet a complete definition of what is harmless
+   but should reject everything that is not.  */
+
+bool
+harmless_type_conversion_p (tree outer_type, tree inner_type)
+{
+  /* If it's useless, it's also harmless.  */
+  if (useless_type_conversion_p (outer_type, inner_type))
+return true;
+
+  if (INTEGRAL_TYPE_P (inner_type)
+  && INTEGRAL_TYPE_P (outer_type)
+  && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type))
+return true;
+
+  return false;
+}
+
 /* Return true if a conversion from either type of TYPE1 and TYPE2
to the other is not required.  Otherwise return false.  */
 
@@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr)
   return false;
 }
 
+/* Return true if EXPR is a harmless type conversion, otherwise return
+   false.  */
+
+bool
+tree_ssa_harmless_type_conversion (tree expr)
+{
+  gimple stmt;
+
+  if (TREE_CODE (expr) != SSA_NAME)
+return false;
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+return false;
+
+  return harmless_type_conversion_p (TREE_TYPE (gi

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs  wrote:
> On 24/06/11 16:47, Richard Guenther wrote:
>>>
>>> I can certainly add checks to make sure that the skipped operations
>>> >  actually don't make any important changes to the value, but do I need
>>> > to?
>>
>> Yes.
>
> OK, how about this patch?

I'd name the predicate value_preserving_conversion_p which I think
is what you mean.  harmless isn't really descriptive.

Note that you include non-value-preserving conversions, namely
int -> unsigned int.  Don't dispatch to useless_type_conversion_p,
it's easy to enumerate which conversions are value-preserving.

Don't try to match the tree_ssa_useless_* set of functions, instead
put the value_preserving_conversion_p predicate in tree.[ch] and
a suitable function using it in tree-ssa-math-opts.c.

Thanks,
Richard.

> I've added checks to make sure the value is not truncated at any point.
>
> I've also changed the test cases to address Janis' comments.
>
> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Michael Matz
Hi,

On Tue, 28 Jun 2011, Richard Guenther wrote:

> I'd name the predicate value_preserving_conversion_p which I think is 
> what you mean.  harmless isn't really descriptive.
> 
> Note that you include non-value-preserving conversions, namely int -> 
> unsigned int.

It seems that Andrew really does want to accept them.  If so 
value_preserving_conversion_p would be the wrong name.  It seems to me he 
wants to accept those conversions that make it possible to retrieve the 
old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the 
to-be-named property.  bits_preserving?  Hmm.


Ciao,
Michael.


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Andrew Stubbs

On 28/06/11 16:53, Michael Matz wrote:

On Tue, 28 Jun 2011, Richard Guenther wrote:

I'd name the predicate value_preserving_conversion_p which I think is
what you mean.  harmless isn't really descriptive.

Note that you include non-value-preserving conversions, namely int ->
unsigned int.


It seems that Andrew really does want to accept them.  If so
value_preserving_conversion_p would be the wrong name.  It seems to me he
wants to accept those conversions that make it possible to retrieve the
old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the
to-be-named property.  bits_preserving?  Hmm.


What I want (and I'm not totally clear on what this actually means) is 
to be able to optimize all the cases where the end result will be the 
same as the compiler produces now (using multiple multiply, shift, and 
add operations).


Ok, so that's an obvious statement, but the point is that, right now, 
the compiler does nothing special when you cast from int -> unsigned 
int, or vice-versa, and I want to capture that somehow. There are some 
exceptions, I'm sure, but what are they?


What is clear is that I don't want to just assume that casting from one 
signedness to the other is a show-stopper.


For example:

  unsigned long long
  foo (unsigned long long a, unsigned char b, unsigned char c)
  {
return a + b * c;
  }

This appears to be entirely unsigned maths with plenty of spare 
precision, and therefore a dead cert for any SI->DI 
multiply-and-accumulate instruction, but not so - it is represented 
internally as:


  signed int tmp = (signed int)a * (signed int)b;
  unsigned long long result = a + (unsigned long long)tmp;

Notice the unexpected signed int in the middle! I need to be able to get 
past that to optimize this properly.


I've tried various test cases in which I cast signedness and mode around 
a bit, and so far it appear to perform safely, but probably I'm not be 
cunning enough.


Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Michael Matz
Hi,

On Tue, 28 Jun 2011, Andrew Stubbs wrote:

> What I want (and I'm not totally clear on what this actually means) is 
> to be able to optimize all the cases where the end result will be the 
> same as the compiler produces now (using multiple multiply, shift, and 
> add operations).

Okay, then you really want to look through value-preserving conversions.

> Ok, so that's an obvious statement, but the point is that, right now, 
> the compiler does nothing special when you cast from int -> unsigned 
> int, or vice-versa, and I want to capture that somehow. There are some 
> exceptions, I'm sure, but what are they?

Same-sized signed <-> unsigned conversions aren't value preserving:
  unsigned char c = 255; (signed char)c == -1; 255 != -1
unsigned -> larger sized signed is value preserving
  unsigned char c = 255; (signed short)c == 255;
signed -> unsigned never is value preserving

> multiply-and-accumulate instruction, but not so - it is represented 
> internally as:
> 
>   signed int tmp = (signed int)a * (signed int)b;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle!

Yeah, the C standard requires this.

> I need to be able to get past that to optimize this properly.

Then you're lucky because unsigned char -> signed int is an embedding, 
hence value preserving.  I thought we had a predicate for such conversions 
already, but seems I was wrong.  So, create it as Richi said, but 
enumerate explicitely the cases you want to handle, and include only those 
that really are value preserving.


Ciao,
Michael.


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 28/06/11 17:37, Michael Matz wrote:
>> What I want (and I'm not totally clear on what this actually means) is
>> >  to be able to optimize all the cases where the end result will be the
>> >  same as the compiler produces now (using multiple multiply, shift, and
>> >  add operations).
> Okay, then you really want to look through value-preserving conversions.
>
>> >  Ok, so that's an obvious statement, but the point is that, right now,
>> >  the compiler does nothing special when you cast from int ->  unsigned
>> >  int, or vice-versa, and I want to capture that somehow. There are some
>> >  exceptions, I'm sure, but what are they?
> Same-sized signed<->  unsigned conversions aren't value preserving:
>unsigned char c = 255; (signed char)c == -1; 255 != -1
> unsigned ->  larger sized signed is value preserving
>unsigned char c = 255; (signed short)c == 255;
> signed ->  unsigned never is value preserving

OK, so I've tried implementing this, and I find I hit against a problem:

Given this test case:

   unsigned long long
   foo (unsigned long long a, signed char *b, signed char *c)
   {
 return a + *b * *c;
   }

Those rules say that it should not be suitable for optimization because 
there's an implicit cast from signed int to unsigned long long.

Without any widening multiplications allowed, GCC gives this code (for ARM):

   ldrsb   r2, [r2, #0]
   ldrsb   r3, [r3, #0]
   mul r2, r2, r3
   addsr0, r0, r2
   adc r1, r1, r2, asr #31

This is exactly what a signed widening multiply-and-accumulate with 
smlalbb would have done!

OK, so the types in the testcase are a bit contrived, but my point is 
that I want to be able to use the widening-mult instructions everywhere 
that they would produce the same output and gcc would otherwise, and gcc 
just doesn't seem that interested in signed<->unsigned conversions.

So, I'm happy to put in checks to ensure that truncations are not 
ignore, but I'm really not sure what's the right thing to do with the 
extends and signedness switches.

Any suggestions?

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Richard Guenther
On Fri, Jul 1, 2011 at 1:58 PM, Stubbs, Andrew  wrote:
> On 28/06/11 17:37, Michael Matz wrote:
>>> What I want (and I'm not totally clear on what this actually means) is
>>> >  to be able to optimize all the cases where the end result will be the
>>> >  same as the compiler produces now (using multiple multiply, shift, and
>>> >  add operations).
>> Okay, then you really want to look through value-preserving conversions.
>>
>>> >  Ok, so that's an obvious statement, but the point is that, right now,
>>> >  the compiler does nothing special when you cast from int ->  unsigned
>>> >  int, or vice-versa, and I want to capture that somehow. There are some
>>> >  exceptions, I'm sure, but what are they?
>> Same-sized signed<->  unsigned conversions aren't value preserving:
>>    unsigned char c = 255; (signed char)c == -1; 255 != -1
>> unsigned ->  larger sized signed is value preserving
>>    unsigned char c = 255; (signed short)c == 255;
>> signed ->  unsigned never is value preserving
>
> OK, so I've tried implementing this, and I find I hit against a problem:
>
> Given this test case:
>
>   unsigned long long
>   foo (unsigned long long a, signed char *b, signed char *c)
>   {
>     return a + *b * *c;
>   }
>
> Those rules say that it should not be suitable for optimization because
> there's an implicit cast from signed int to unsigned long long.
>
> Without any widening multiplications allowed, GCC gives this code (for ARM):
>
>   ldrsb   r2, [r2, #0]
>   ldrsb   r3, [r3, #0]
>   mul     r2, r2, r3
>   adds    r0, r0, r2
>   adc     r1, r1, r2, asr #31
>
> This is exactly what a signed widening multiply-and-accumulate with
> smlalbb would have done!
>
> OK, so the types in the testcase are a bit contrived, but my point is
> that I want to be able to use the widening-mult instructions everywhere
> that they would produce the same output and gcc would otherwise, and gcc
> just doesn't seem that interested in signed<->unsigned conversions.
>
> So, I'm happy to put in checks to ensure that truncations are not
> ignore, but I'm really not sure what's the right thing to do with the
> extends and signedness switches.
>
> Any suggestions?

Well - some operations work the same on both signedness if you
just care about the twos-complement result.  This includes
multiplication (but not for example division).  For this special
case I suggest to not bother trying to invent a generic predicate
but do something local in tree-ssa-math-opts.c.

Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 01:58 PM, Stubbs, Andrew wrote:

Given this test case:

unsigned long long
foo (unsigned long long a, signed char *b, signed char *c)
{
  return a + *b * *c;
}

Those rules say that it should not be suitable for optimization because
there's an implicit cast from signed int to unsigned long long.


Got it now!  Casts from signed to unsigned are not value-preserving, but 
they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the 
same result bit-by-bit as the s64 result.  The fact that s64 has an 
implicit ... in front, while an u64 has an implicit ... does not 
matter.


Is this the meaning of the predicate you want?  I think so, based on the 
discussion, but it's hard to say without seeing the cases enumerated 
(i.e. a patch).


However, perhaps there is a catch.  We can do the following thought 
experiment.  What would happen if you had multiple widening multiplies? 
 Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 
128-bit unsigned?  I believe in this case you couldn't optimize 8-bit 
signed to 128-bit unsigned.  Would your code do it?


Paolo


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 13:33, Paolo Bonzini wrote:
> Got it now! Casts from signed to unsigned are not value-preserving, but
> they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
> same result bit-by-bit as the s64 result. The fact that s64 has an
> implicit ... in front, while an u64 has an implicit ... does not
> matter.

But, the ... and ... are not implicit. They are very real, and 
if applied incorrectly will change the result, I think.

> Is this the meaning of the predicate you want? I think so, based on the
> discussion, but it's hard to say without seeing the cases enumerated
> (i.e. a patch).

The purpose of this predicate is to determine whether any type 
conversions that occur between the output of a widening multiply, and 
the input of an addition have any bearing on the end result.

We know what the effective output type of the multiply is (the size is 
2x the input type, and the signed if either one of the inputs in 
signed), and we know what the input type of the addition is, but any 
amount of junk can lie in between. The problem is determining if it *is* 
junk.

In an ideal world there would only be two cases to consider:

   1. No conversion needed.

   2. A single sign-extend or zero-extend (according to the type of the 
inputs) to match the input size of the addition.

Anything else would be unsuitable for optimization. Of course, it's 
never that simple, but it should still be possible to boil down a list 
of conversions to one of these cases, if it's valid.

The signedness of the input to the addition is not significant - the 
code would be the same either way. But, I is important not to try to 
zero-extend something that started out signed, and not to sign-extend 
something that started out unsigned.

> However, perhaps there is a catch. We can do the following thought
> experiment. What would happen if you had multiple widening multiplies?
> Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
> unsigned? I believe in this case you couldn't optimize 8-bit signed to
> 128-bit unsigned. Would your code do it?

My code does not attempt to combine multiple multiplies. In any case, if 
you have two multiplications, surely you have at least three input 
values, so they can't be combined?

It does attempt to combine a multiply and an addition, where a suitable 
madd* insn is available. (This is not new; I'm just trying to do it in 
more cases.)

I have considered the case where you have "(a * b) + (c * d)", but have 
not yet coded anything for it. At present, the code will simply choose 
whichever multiply happens to find itself the first input operand of the 
plus, and ignores the other, even if the first turns out not to be a 
suitable candidate.

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:

>  However, perhaps there is a catch. We can do the following thought
>  experiment. What would happen if you had multiple widening multiplies?
>  Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
>  unsigned? I believe in this case you couldn't optimize 8-bit signed to
>  128-bit unsigned. Would your code do it?

My code does not attempt to combine multiple multiplies. In any case, if
you have two multiplications, surely you have at least three input
values, so they can't be combined?


What about (u128)c + (u64)((s8)a * (s8)b)?  You cannot convert this to 
(u128)c + (u128)((s8)a * (s8)b).


Paolo


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 15:40, Paolo Bonzini wrote:
> On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:
>>> > However, perhaps there is a catch. We can do the following thought
>>> > experiment. What would happen if you had multiple widening multiplies?
>>> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to
>>> 128-bit
>>> > unsigned? I believe in this case you couldn't optimize 8-bit signed to
>>> > 128-bit unsigned. Would your code do it?
>> My code does not attempt to combine multiple multiplies. In any case, if
>> you have two multiplications, surely you have at least three input
>> values, so they can't be combined?
>
> What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
> (u128)c + (u128)((s8)a * (s8)b).

Oh I see, sorry. Yes, that's exactly what I'm trying to do here.

No, wait, I don't see. Where are these multiple widening multiplies 
you're talking about? I only see one multiply?

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 14:30, Stubbs, Andrew wrote:
>> Got it now! Casts from signed to unsigned are not value-preserving, but
>> >  they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
>> >  same result bit-by-bit as the s64 result. The fact that s64 has an
>> >  implicit ... in front, while an u64 has an implicit ... does not
>> >  matter.
> But, the ... and ... are not implicit. They are very real, and
> if applied incorrectly will change the result, I think.

Wait, I'm clearly confused 

When I try a s32->u64 conversion, the expand pass generates a 
sign_extend insn.

Clearly it's the source type that determines the extension type, not the 
destination type ... and I'm a dunce!

Thanks :)

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Paolo Bonzini

On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:

>
>  What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>  (u128)c + (u128)((s8)a * (s8)b).

Oh I see, sorry. Yes, that's exactly what I'm trying to do here.

No, wait, I don't see. Where are these multiple widening multiplies
you're talking about? I only see one multiply?


I meant one multiplication with multiple widening steps.  Not clear at 
all, sorry.


Paolo


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Bernd Schmidt
On 06/28/11 18:14, Andrew Stubbs wrote:

>   unsigned long long
>   foo (unsigned long long a, unsigned char b, unsigned char c)
>   {
> return a + b * c;
>   }
> 
> This appears to be entirely unsigned maths with plenty of spare
> precision, and therefore a dead cert for any SI->DI
> multiply-and-accumulate instruction, but not so - it is represented
> internally as:
> 
>   signed int tmp = (signed int)a * (signed int)b;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle! I need to be able to get
> past that to optimize this properly.

Since both inputs are positive in a signed int (they must be, being cast
from a smaller unsigned value), you can infer that it does not matter
whether you treat the result of the multiplication as a signed or an
unsigned value. It is positive in any case.

So, I think the thing to test is: if the accumulate step requires
widening the result of the multiplication, either the cast must be value
preserving (widening unsigned to signed), or you must be able to prove
that the multiplication produces a positive result.

If the accumulate step just casts the multiplication result from signed
to unsigned, keeping the precision the same, you can ignore the cast
since the addition is unaffected by it.


Bernd


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 16:54, Paolo Bonzini wrote:
> On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:
>>> >
>>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>>> > (u128)c + (u128)((s8)a * (s8)b).
>> Oh I see, sorry. Yes, that's exactly what I'm trying to do here.
>>
>> No, wait, I don't see. Where are these multiple widening multiplies
>> you're talking about? I only see one multiply?
>
> I meant one multiplication with multiple widening steps. Not clear at
> all, sorry.

Yes, I see now, the whole purpose of my patch set is widening by more 
than one mode.

The case of the multiply-and-accumulate is the only way there can be 
more than one step though. Widening multiplies themselves are always 
handled as one unit.

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-04 Thread Andrew Stubbs

On 01/07/11 13:25, Richard Guenther wrote:

Well - some operations work the same on both signedness if you
just care about the twos-complement result.  This includes
multiplication (but not for example division).  For this special
case I suggest to not bother trying to invent a generic predicate
but do something local in tree-ssa-math-opts.c.


OK, here's my updated patch.

I've taken the view that we *know* what size and signedness the result 
of the multiplication is, and we know what size the input to the 
addition must be, so all the check has to do is make sure it does that 
same conversion, even if by a roundabout means.


What I hadn't grasped before is that when extending a value it's the 
source type that is significant, not the destination, so the checks are 
not as complex as I had thought.


So, this patch adds a test to ensure that:

 1. the type is not truncated so far that we lose any information; and

 2. the type is only ever extended in the proper signedness.

Also, just to be absolutely sure, I've also added a little bit of logic 
to permit extends that are then undone by a truncate. I'm really not 
sure what guarantees there are about what sort of cast sequences can 
exist? Is this necessary? I haven't managed to coax it to generated any 
examples of extends followed by truncates myself, but in any case, it's 
hardly any code and it'll make sure it's future proofed.


OK?

Andrew
2011-06-28  Andrew Stubbs  

	gcc/
	* tree-ssa-math-opts.c (valid_types_for_madd_p): New function.
	(convert_plusminus_to_widen): Use valid_types_for_madd_p to
	identify optimization candidates.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

---
 .../gcc/testsuite/gcc.target/arm/no-wmla-1.c   |   11 ++
 .../gcc/testsuite/gcc.target/arm/wmul-5.c  |   10 ++
 src/gcc-mainline/gcc/tree-ssa-math-opts.c  |  112 ++--
 3 files changed, 123 insertions(+), 10 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c

diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
new file mode 100644
index 000..17f7427
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
new file mode 100644
index 000..65c43e3
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
diff --git a/src/gcc-mainline/gcc/tree-ssa-math-opts.c b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
index d55ba57..5ef7bb4 100644
--- a/src/gcc-mainline/gcc/tree-ssa-math-opts.c
+++ b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
@@ -2085,6 +2085,78 @@ convert_mult_to_widen (gimple stmt)
   return true;
 }
 
+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
+   and then the convertions between the output of the multiply, and
+   the input to an addition EXPR, to ensure that they are compatible with
+   a widening multiply-and-accumulate.
+
+   This function assumes that expr is a valid string of conversion expressions
+   terminated by a multiplication.
+
+   This function tries NOT to make any (fragile) assumptions about what
+   sequence of conversions can exist in the input.  */
+
+static bool
+valid_types_for_madd_p (tree type1, tree type2, tree expr)
+{
+  gimple stmt, prev_stmt;
+  enum tree_code code, prev_code;
+  tree prev_expr, type, prev_type;
+  int bitsize, prev_bitsize, initial_bitsize, min_bitsize;
+  bool initial_unsigned;
+
+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+  code = gimple_assign_rhs_code (stmt);
+  type = TREE_TYPE (expr);
+  bitsize = TYPE_PRECISION (type);
+  min_bitsize = bitsize;
+
+  if (code == MULT_EXPR || code == WIDEN_MULT_EXPR)
+return true;
+
+  if (!INTEGRAL_TYPE_P (type)
+  || TYPE_PRECISION (type) < initial_bitsize)
+return false;
+
+  /* Step through the conversions backwards.  */
+  while (true)
+{
+  prev_expr = gimple_assign_rhs1 (stmt);
+  prev_stmt = SSA_NAME_DEF_STMT (prev_expr);
+  prev_code = gimple_assign_rhs_code (prev_stmt);
+  prev_type = TREE_TYPE (prev_expr);
+  prev_bitsize = TYPE_PRECISION (prev_type);
+
+  if (prev_code == MULT_EXPR || prev_code == WIDEN

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-07 Thread Richard Guenther
On Mon, Jul 4, 2011 at 4:23 PM, Andrew Stubbs  wrote:
> On 01/07/11 13:25, Richard Guenther wrote:
>>
>> Well - some operations work the same on both signedness if you
>> just care about the twos-complement result.  This includes
>> multiplication (but not for example division).  For this special
>> case I suggest to not bother trying to invent a generic predicate
>> but do something local in tree-ssa-math-opts.c.
>
> OK, here's my updated patch.
>
> I've taken the view that we *know* what size and signedness the result of
> the multiplication is, and we know what size the input to the addition must
> be, so all the check has to do is make sure it does that same conversion,
> even if by a roundabout means.
>
> What I hadn't grasped before is that when extending a value it's the source
> type that is significant, not the destination, so the checks are not as
> complex as I had thought.
>
> So, this patch adds a test to ensure that:
>
>  1. the type is not truncated so far that we lose any information; and
>
>  2. the type is only ever extended in the proper signedness.
>
> Also, just to be absolutely sure, I've also added a little bit of logic to
> permit extends that are then undone by a truncate. I'm really not sure what
> guarantees there are about what sort of cast sequences can exist? Is this
> necessary? I haven't managed to coax it to generated any examples of extends
> followed by truncates myself, but in any case, it's hardly any code and
> it'll make sure it's future proofed.
>
> OK?

I think you should assume that series of widenings, (int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.

+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,

what are those types?  Is TYPE1 the result type and TYPE2 the
operand type?  If so why

+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);

this?!

+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);

that also looks odd.  So probably TYPE1 isn't the result type.  If they
are the types of the operands, then what operand is EXPR for?

I didn't look at the actual implementation of the function because of the
lack of understanding of the inputs.

-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+   TREE_CODE (tmp) == SSA_NAME
+   && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+   tmp = gimple_assign_rhs1 (rhs1_stmt))
 {
-  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-  if (is_gimple_assign (rhs1_stmt))
-   rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+  rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+  if (!is_gimple_assign (rhs1_stmt))
+   break;
+  rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
 }

the result looks a bit like spaghetti code ... and lacks a comment
on what it is trying to do.  It looks like it sees through an arbitrary
number of conversions - possibly ones that will make the
macc invalid, as for (short)int-var * short-var + int-var.  So you'll
be pessimizing code by doing that unconditionally.  As I said
above you should at most consider one intermediate conversion.

I believe the code should be arranged such that only valid
conversions are looked through in the first place.  Valid, in
that the resulting types should still match the macc constraints.

Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-07 Thread Andrew Stubbs

On 07/07/11 10:58, Richard Guenther wrote:

I think you should assume that series of widenings, (int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.


Hmm, I'm not so sure. I'll look into it a bit further.


+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,

what are those types?  Is TYPE1 the result type and TYPE2 the
operand type?  If so why


TYPE1 and TYPE2 are the inputs to the multiply. I thought I explained 
that in the comment before the function.



+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);

this?!


The result of the multiply will be this many bits wide. This may be 
narrower than the type that holds it.


E.g., 16-bit * 8-bit gives a result at most 24-bits wide, which will 
usually be held in a 32- or 64-bit variable.



+  initial_unsigned = TYPE_UNSIGNED (type1)&&  TYPE_UNSIGNED (type2);

that also looks odd.  So probably TYPE1 isn't the result type.  If they
are the types of the operands, then what operand is EXPR for?


EXPR, as the comment says, is the addition that follows the multiply.


-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+   TREE_CODE (tmp) == SSA_NAME
+&&  (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+   tmp = gimple_assign_rhs1 (rhs1_stmt))
  {
-  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-  if (is_gimple_assign (rhs1_stmt))
-   rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+  rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+  if (!is_gimple_assign (rhs1_stmt))
+   break;
+  rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
  }

the result looks a bit like spaghetti code ... and lacks a comment
on what it is trying to do.  It looks like it sees through an arbitrary
number of conversions - possibly ones that will make the
macc invalid, as for (short)int-var * short-var + int-var.  So you'll
be pessimizing code by doing that unconditionally.  As I said
above you should at most consider one intermediate conversion.


Ok, I need to add a comment here. The code does indeed look back through 
an arbitrary number of conversions. It is searching for the last real 
operation before the addition, hoping to find a multiply.



I believe the code should be arranged such that only valid
conversions are looked through in the first place.  Valid, in
that the resulting types should still match the macc constraints.


Well, it might be possible to discard some conversions initially, but 
until the multiply is found, and it's input types are known, we can't 
know for certain what conversions are valid.


I think I need to explain what's going on here more clearly.

  1. It finds an addition statement. It's not known yet whether it is 
part of a multiply-and-accumulate, or not.


  2. It follows the conversion chain back from each operand to see if 
it finds a multiply, or widening multiply statement.


  3. If it finds a non-widening multiply, it checks it to see if it 
could be widening multiply-and-accumulate (it will already have been 
rejected as a widening multiply on it's own, but the addition might be 
in a wider mode, or the target might provide multiply-and-accumulate 
insns that don't have corresponding widening multiply insns).


  4. (This is the new bit!) It looks to see if there are any 
conversions between the multiply and addition that can safely be ignored.


  5. If we get here, then emit any necessary conversion statements, and 
convert the addition to a WIDEN_MULT_PLUS_EXPR.


Before these changes, any conversion between the multiply and addition 
statements would prevent optimization, even though there are many cases 
where the conversions are valid, and even inserted automatically.


I'm going to go away and find out whether there are really any cases 
where there can legitimately be more than one conversion, and at least 
update my patch with better commenting.


Thanks for you review.

Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-07 Thread Andrew Stubbs

On 07/07/11 11:26, Andrew Stubbs wrote:

On 07/07/11 10:58, Richard Guenther wrote:

I think you should assume that series of widenings,
(int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.


Hmm, I'm not so sure. I'll look into it a bit further.


OK, here's a test case that gives multiple conversions:

  long long
  foo (long long a, signed char b, signed char c)
  {
int bc = b * c;
return a + (short)bc;
  }

The dump right before the widen_mult pass gives:

  foo (long long int a, signed char b, signed char c)
  {
int bc;
long long int D.2018;
short int D.2017;
long long int D.2016;
int D.2015;
int D.2014;

  :
D.2014_2 = (int) b_1(D);
D.2015_4 = (int) c_3(D);
bc_5 = D.2014_2 * D.2015_4;
D.2017_6 = (short int) bc_5;
D.2018_7 = (long long int) D.2017_6;
D.2016_9 = D.2018_7 + a_8(D);
return D.2016_9;

  }

Here we have a multiply and accumulate done the long way. The 8-bit 
inputs are widened to 32-bit, multiplied to give a 32-bit result (of 
which only the lower 16-bits contain meaningful data), then truncated to 
16-bits, and sign-extended up to 64-bits ready for the 64-bit addition.


This is slight contrived, perhaps, but not unlike the sort of thing that 
might occur when you have inline functions and macros, and most 
importantly - it is mathematically valid!



So, here's the output from my patched widen_mult pass:

  foo (long long int a, signed char b, signed char c)
  {
int bc;
long long int D.2018;
short int D.2017;
long long int D.2016;
int D.2015;
int D.2014;

  :
D.2014_2 = (int) b_1(D);
D.2015_4 = (int) c_3(D);
bc_5 = b_1(D) w* c_3(D);
D.2017_6 = (short int) bc_5;
D.2018_7 = (long long int) D.2017_6;
D.2016_9 = WIDEN_MULT_PLUS_EXPR ;
return D.2016_9;

  }

As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is 
now redundant. (Ideally, this would be removed now, but in fact it 
doesn't get eliminated until the RTL into_cfglayout pass. This is not 
new behaviour.)



My point is that it's possible to have at least two conversions to 
examine. Is it possible to have more? I don't know, but once I'm dealing 
with two I might as well deal with an arbitrary number.


Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-07 Thread Richard Guenther
On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs  wrote:
> On 07/07/11 11:26, Andrew Stubbs wrote:
>>
>> On 07/07/11 10:58, Richard Guenther wrote:
>>>
>>> I think you should assume that series of widenings,
>>> (int)(short)char_variable
>>> are already combined.  Thus I believe you only need to consider a single
>>> conversion in valid_types_for_madd_p.
>>
>> Hmm, I'm not so sure. I'll look into it a bit further.
>
> OK, here's a test case that gives multiple conversions:
>
>  long long
>  foo (long long a, signed char b, signed char c)
>  {
>    int bc = b * c;
>    return a + (short)bc;
>  }
>
> The dump right before the widen_mult pass gives:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  :
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = D.2014_2 * D.2015_4;
>    D.2017_6 = (short int) bc_5;

Ok, so you have a truncation that is a no-op value-wise.  I would
argue that this truncation should be removed independent on
whether we have a widening multiply instruction or not.

The technically most capable place to remove non-value-changing
truncations (and combine them with a successive conversion)
would be value-range propagation.  Which already knows:

Value ranges after VRP:

b_1(D): VARYING
D.2698_2: [-128, 127]
c_3(D): VARYING
D.2699_4: [-128, 127]
bc_5: [-16256, 16384]
D.2701_6: [-16256, 16384]
D.2702_7: [-16256, 16384]
a_8(D): VARYING
D.2700_9: VARYING

thus truncating bc_5 to short does not change the value.

The simplification could be made when looking at the
statement

>    D.2018_7 = (long long int) D.2017_6;

in vrp_fold_stmt, based on the fact that this conversion
converts from a value-preserving intermediate conversion.
Thus the transform would replace the D.2017_6 operand
with bc_5.

So yes, the case appears - but it shouldn't ;)

I'll cook up a quick patch for VRP.

Thanks,
Richard.

>    D.2016_9 = D.2018_7 + a_8(D);
>    return D.2016_9;
>
>  }
>
> Here we have a multiply and accumulate done the long way. The 8-bit inputs
> are widened to 32-bit, multiplied to give a 32-bit result (of which only the
> lower 16-bits contain meaningful data), then truncated to 16-bits, and
> sign-extended up to 64-bits ready for the 64-bit addition.
>
> This is slight contrived, perhaps, but not unlike the sort of thing that
> might occur when you have inline functions and macros, and most importantly
> - it is mathematically valid!
>
>
> So, here's the output from my patched widen_mult pass:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  :
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = b_1(D) w* c_3(D);
>    D.2017_6 = (short int) bc_5;
>    D.2018_7 = (long long int) D.2017_6;
>    D.2016_9 = WIDEN_MULT_PLUS_EXPR ;
>    return D.2016_9;
>
>  }
>
> As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now
> redundant. (Ideally, this would be removed now, but in fact it doesn't get
> eliminated until the RTL into_cfglayout pass. This is not new behaviour.)
>
>
> My point is that it's possible to have at least two conversions to examine.
> Is it possible to have more? I don't know, but once I'm dealing with two I
> might as well deal with an arbitrary number.
>
> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-07 Thread Richard Guenther
On Thu, Jul 7, 2011 at 2:28 PM, Richard Guenther
 wrote:
> On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs  wrote:
>> On 07/07/11 11:26, Andrew Stubbs wrote:
>>>
>>> On 07/07/11 10:58, Richard Guenther wrote:

 I think you should assume that series of widenings,
 (int)(short)char_variable
 are already combined.  Thus I believe you only need to consider a single
 conversion in valid_types_for_madd_p.
>>>
>>> Hmm, I'm not so sure. I'll look into it a bit further.
>>
>> OK, here's a test case that gives multiple conversions:
>>
>>  long long
>>  foo (long long a, signed char b, signed char c)
>>  {
>>    int bc = b * c;
>>    return a + (short)bc;
>>  }
>>
>> The dump right before the widen_mult pass gives:
>>
>>  foo (long long int a, signed char b, signed char c)
>>  {
>>    int bc;
>>    long long int D.2018;
>>    short int D.2017;
>>    long long int D.2016;
>>    int D.2015;
>>    int D.2014;
>>
>>  :
>>    D.2014_2 = (int) b_1(D);
>>    D.2015_4 = (int) c_3(D);
>>    bc_5 = D.2014_2 * D.2015_4;
>>    D.2017_6 = (short int) bc_5;
>
> Ok, so you have a truncation that is a no-op value-wise.  I would
> argue that this truncation should be removed independent on
> whether we have a widening multiply instruction or not.
>
> The technically most capable place to remove non-value-changing
> truncations (and combine them with a successive conversion)
> would be value-range propagation.  Which already knows:
>
> Value ranges after VRP:
>
> b_1(D): VARYING
> D.2698_2: [-128, 127]
> c_3(D): VARYING
> D.2699_4: [-128, 127]
> bc_5: [-16256, 16384]
> D.2701_6: [-16256, 16384]
> D.2702_7: [-16256, 16384]
> a_8(D): VARYING
> D.2700_9: VARYING
>
> thus truncating bc_5 to short does not change the value.
>
> The simplification could be made when looking at the
> statement
>
>>    D.2018_7 = (long long int) D.2017_6;
>
> in vrp_fold_stmt, based on the fact that this conversion
> converts from a value-preserving intermediate conversion.
> Thus the transform would replace the D.2017_6 operand
> with bc_5.
>
> So yes, the case appears - but it shouldn't ;)
>
> I'll cook up a quick patch for VRP.

Like the attached.  I'll finish and properly test it.

Richard.


p
Description: Binary data


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-08 Thread Andrew Stubbs

On 07/07/11 13:37, Richard Guenther wrote:

I'll cook up a quick patch for VRP.


Like the attached.  I'll finish and properly test it.


Your patch appears to do the wrong thing for this test case:

int
foo (int a, short b, short c)
{
  int bc = b * c;
  return a + (short)bc;
}

With your patch, the input to the widening-mult pass now looks like this:

foo (int a, short int b, short int c)
{
  int bc;
  int D.2016;
  int D.2015;
  int D.2014;

:
  D.2014_2 = (int) b_1(D);
  D.2015_4 = (int) c_3(D);
  bc_5 = D.2014_2 * D.2015_4;
  D.2016_9 = bc_5 + a_8(D);
  return D.2016_9;

}

It looks like when the user tries to deliberately break the maths your 
patch seems to unbreak it.


Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-08 Thread Richard Guenther
On Fri, Jul 8, 2011 at 2:44 PM, Andrew Stubbs  wrote:
> On 07/07/11 13:37, Richard Guenther wrote:
>>>
>>> I'll cook up a quick patch for VRP.
>>
>> Like the attached.  I'll finish and properly test it.
>
> Your patch appears to do the wrong thing for this test case:
>
> int
> foo (int a, short b, short c)
> {
>  int bc = b * c;
>  return a + (short)bc;
> }
>
> With your patch, the input to the widening-mult pass now looks like this:
>
> foo (int a, short int b, short int c)
> {
>  int bc;
>  int D.2016;
>  int D.2015;
>  int D.2014;
>
> :
>  D.2014_2 = (int) b_1(D);
>  D.2015_4 = (int) c_3(D);
>  bc_5 = D.2014_2 * D.2015_4;
>  D.2016_9 = bc_5 + a_8(D);
>  return D.2016_9;
>
> }
>
> It looks like when the user tries to deliberately break the maths your patch
> seems to unbreak it.

Yeah, I fixed that in the checked in version.

Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-11 Thread Andrew Stubbs

On 07/07/11 10:58, Richard Guenther wrote:

I think you should assume that series of widenings, (int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.


Ok, here's my new patch.

This version only allows one conversion between the multiply and 
addition, so assumes that VRP has eliminated any needless ones.


That one conversion may either be a truncate, if the mode was too large 
for the meaningful data, or an extend, which must be of the right flavour.


This means that this patch now has the same effect as the last patch, 
for all valid cases (following you VRP patch), but rejects the cases 
where the C language (unhelpfully) requires an intermediate temporary to 
be of the 'wrong' signedness.


Hopefully the output will now be the same between both -O0 and -O2, and 
programmers will continue to have to be careful about casting unsigned 
variables whenever they expect purely unsigned math. :(


Is this one ok?

Andrew
2011-07-11  Andrew Stubbs  

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2175,6 +2176,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
 return false;
 
+  /* Allow for one conversion statement between the multiply
+ and addition/subtraction statement.  If there are more than
+ one conversions then we assume they would invalidate this
+ transformation.  If that's not the case then they should have
+ been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+{
+  conv1_stmt = rhs1_stmt;
+  rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+  if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+  else
+	return false;
+}
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+{
+  conv2_stmt = rhs2_stmt;
+  rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+  if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+  else
+	return false;
+}
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
  is_widening_mult_p, but we still need the rhs returns.
 
@@ -2188,6 +2221,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs2;
+  conv_stmt = conv1_stmt;
 }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
@@ -2195,6 +2229,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs1;
+  conv_stmt = conv2_stmt;
 }
   else
 return false;
@@ -2202,6 +2237,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
 return false;
 
+  /* If there was a conversion between the multiply and addition
+ then we need to make sure it fits a multiply-and-accumulate.
+ The should be a single mode change which does not change the
+ value.  */
+  if (conv_stmt)
+{
+  tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+  tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+  int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	return false;
+	}
+  else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's 

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-12 Thread Richard Guenther
On Mon, Jul 11, 2011 at 6:55 PM, Andrew Stubbs  wrote:
> On 07/07/11 10:58, Richard Guenther wrote:
>>
>> I think you should assume that series of widenings,
>> (int)(short)char_variable
>> are already combined.  Thus I believe you only need to consider a single
>> conversion in valid_types_for_madd_p.
>
> Ok, here's my new patch.
>
> This version only allows one conversion between the multiply and addition,
> so assumes that VRP has eliminated any needless ones.
>
> That one conversion may either be a truncate, if the mode was too large for
> the meaningful data, or an extend, which must be of the right flavour.
>
> This means that this patch now has the same effect as the last patch, for
> all valid cases (following you VRP patch), but rejects the cases where the C
> language (unhelpfully) requires an intermediate temporary to be of the
> 'wrong' signedness.
>
> Hopefully the output will now be the same between both -O0 and -O2, and
> programmers will continue to have to be careful about casting unsigned
> variables whenever they expect purely unsigned math. :(
>
> Is this one ok?

Ok.

Thanks,
Richard.

> Andrew
>


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-14 Thread Andrew Stubbs
This update changes only the context modified by changes to patch 2. The 
patch has already been approved. I'm just posting it for completeness.


Andrew
2011-07-14  Andrew Stubbs  

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
 return false;
 
+  /* Allow for one conversion statement between the multiply
+ and addition/subtraction statement.  If there are more than
+ one conversions then we assume they would invalidate this
+ transformation.  If that's not the case then they should have
+ been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+{
+  conv1_stmt = rhs1_stmt;
+  rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+  if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+  else
+	return false;
+}
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+{
+  conv2_stmt = rhs2_stmt;
+  rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+  if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+  else
+	return false;
+}
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
  is_widening_mult_p, but we still need the rhs returns.
 
@@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs2;
+  conv_stmt = conv1_stmt;
 }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
@@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs1;
+  conv_stmt = conv2_stmt;
 }
   else
 return false;
@@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
 return false;
 
+  /* If there was a conversion between the multiply and addition
+ then we need to make sure it fits a multiply-and-accumulate.
+ The should be a single mode change which does not change the
+ value.  */
+  if (conv_stmt)
+{
+  tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+  tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+  int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	return false;
+	}
+  else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	  && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	return false;
+	}
+  /* else convert is a no-op for our purposes.  */
+}
+
   /* Verify that the machine can perform a widening multiply
  accumulate in this mode/signedness combination, otherwise
  this transformation is likely to pessimize code.  */


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-18 Thread Janis Johnson
On 07/14/2011 07:16 AM, Andrew Stubbs wrote:
> { dg-options "-O2 -march=armv7-a" }

The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be
overridden for multilibs that specify -march, and might conflict with
other multilib options.  If you really need that particular -march value
then use dg-skip-if to skip multilibs with conflicting or overriding
options, or else "dg-require-effective-target arm_dsp" to only run the
tests when the multilib already supports it; that has the advantage of
testing a wider range of arch values.

Janis



Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-19 Thread Andrew Stubbs

On 19/07/11 00:33, Janis Johnson wrote:

On 07/14/2011 07:16 AM, Andrew Stubbs wrote:

{ dg-options "-O2 -march=armv7-a" }


The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be
overridden for multilibs that specify -march, and might conflict with
other multilib options.  If you really need that particular -march value
then use dg-skip-if to skip multilibs with conflicting or overriding
options, or else "dg-require-effective-target arm_dsp" to only run the
tests when the multilib already supports it; that has the advantage of
testing a wider range of arch values.


Yes, I know about this one. You committed that feature since I first 
posted this, I think? I plan to make that change when I do the final commit.


Andrew


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-08-19 Thread Andrew Stubbs

On 12/07/11 11:52, Richard Guenther wrote:

Is this one ok?

Ok.


I've just committed this slightly modified patch.

The changes are mainly in the context and the testcase.

Andrew
2011-08-19  Andrew Stubbs  

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "\tmul\t" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2136,6 +2136,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2178,6 +2179,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
 return false;
 
+  /* Allow for one conversion statement between the multiply
+ and addition/subtraction statement.  If there are more than
+ one conversions then we assume they would invalidate this
+ transformation.  If that's not the case then they should have
+ been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+{
+  conv1_stmt = rhs1_stmt;
+  rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+  if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+  else
+	return false;
+}
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+{
+  conv2_stmt = rhs2_stmt;
+  rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+  if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+  else
+	return false;
+}
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
  is_widening_mult_p, but we still need the rhs returns.
 
@@ -2191,6 +2224,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs2;
+  conv_stmt = conv1_stmt;
 }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
@@ -2198,6 +2232,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   &type2, &mult_rhs2))
 	return false;
   add_rhs = rhs1;
+  conv_stmt = conv2_stmt;
 }
   else
 return false;
@@ -2208,6 +2243,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
 return false;
 
+  /* If there was a conversion between the multiply and addition
+ then we need to make sure it fits a multiply-and-accumulate.
+ The should be a single mode change which does not change the
+ value.  */
+  if (conv_stmt)
+{
+  tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+  tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+  int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	return false;
+	}
+  else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	  && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	return false;
+	}
+  /* else convert is a no-op for our purposes.  */
+}
+
   /* Verify that the machine can perform a widening multiply
  accumulate in this mode/signedness combination, otherwise
  this transformation is likely to pessimize code.  */