Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs wrote: > There are many cases where the widening_mult pass does not recognise > widening multiply-and-accumulate cases simply because there is a type > conversion step between the multiply and add statements. > > This patch should rectify that simply by looking beyond those conversions. That's surely wrong for (int)(short)int_var. You have to constrain the conversions you look through properly. Richard. > OK? > > Andrew > >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 06/23/2011 07:40 AM, Andrew Stubbs wrote: +++ b/gcc/testsuite/gcc.target/arm/umlal-1.c +/* { dg-final { scan-assembler "umlal" } } */ Don't use the name of the instruction as the test name or the scan will always pass, because the file name shows up in assembly output. See http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01823.html for a proposed effective target that can be used in this test. Janis
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 23/06/11 17:26, Richard Guenther wrote: On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs wrote: There are many cases where the widening_mult pass does not recognise widening multiply-and-accumulate cases simply because there is a type conversion step between the multiply and add statements. This patch should rectify that simply by looking beyond those conversions. That's surely wrong for (int)(short)int_var. You have to constrain the conversions you look through properly. To be clear, it only skips past NOP_EXPR. Is it not the case that what you're describing would need a CONVERT_EXPR? Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Fri, Jun 24, 2011 at 10:05 AM, Andrew Stubbs wrote: > On 23/06/11 17:26, Richard Guenther wrote: >> >> On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs >> wrote: >>> >>> There are many cases where the widening_mult pass does not recognise >>> widening multiply-and-accumulate cases simply because there is a type >>> conversion step between the multiply and add statements. >>> >>> This patch should rectify that simply by looking beyond those >>> conversions. >> >> That's surely wrong for (int)(short)int_var. You have to constrain >> the conversions >> you look through properly. > > To be clear, it only skips past NOP_EXPR. Is it not the case that what > you're describing would need a CONVERT_EXPR? NOP_EXPR is the same as CONVERT_EXPR. Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 24/06/11 09:28, Richard Guenther wrote: >> > To be clear, it only skips past NOP_EXPR. Is it not the case that what >> > you're describing would need a CONVERT_EXPR? > NOP_EXPR is the same as CONVERT_EXPR. Are you sure? I thought this was safe because the internals manual says: NOP_EXPR These nodes are used to represent conversions that do not require any code-generation CONVERT_EXPR These nodes are similar to NOP_EXPRs, but are used in those situations where code may need to be generated So, I tried this example: int foo (int a, short b, short c) { int bc = b * c; return a + (short)bc; } Both before and after my patch, GCC gives: mul r2, r1, r2 sxtah r0, r0, r2 (where, SXTAH means sign-extend the third operand from HImode to SImode and add to the second operand.) The dump after the widening_mult pass is: foo (int a, short int b, short int c) { int bc; int D.2018; short int D.2017; int D.2016; int D.2015; int D.2014; : D.2014_2 = (int) b_1(D); D.2015_4 = (int) c_3(D); bc_5 = b_1(D) w* c_3(D); D.2017_6 = (short int) bc_5; D.2018_7 = (int) D.2017_6; D.2016_9 = D.2018_7 + a_8(D); return D.2016_9; } Where you can clearly see that the addition has not been recognised as a multiply-and-accumulate. When I step through convert_plusminus_to_widen, I can see that the reason it has not matched is because "D.2017_6 = (short int) bc_5" is encoded with a CONVERT_EXPR, just as the manual said it would be. So, according to the manual, and my (admittedly limited) experiments, skipping over NOP_EXPR does appear to be safe. But you say that it isn't safe. So now I'm confused. :( I can certainly add checks to make sure that the skipped operations actually don't make any important changes to the value, but do I need to? Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Fri, Jun 24, 2011 at 3:46 PM, Stubbs, Andrew wrote: > On 24/06/11 09:28, Richard Guenther wrote: >>> > To be clear, it only skips past NOP_EXPR. Is it not the case that what >>> > you're describing would need a CONVERT_EXPR? >> NOP_EXPR is the same as CONVERT_EXPR. > > Are you sure? Yes, definitely. They are synonyms of each other (an unfinished merging process), the usual check for them is via CONVERT_EXPR_P. > I thought this was safe because the internals manual says: > > NOP_EXPR > These nodes are used to represent conversions that do not require any > code-generation > > CONVERT_EXPR > These nodes are similar to NOP_EXPRs, but are used in those > situations where code may need to be generated Which is wrong (sorry). > So, I tried this example: > > int > foo (int a, short b, short c) > { > int bc = b * c; > return a + (short)bc; > } > > Both before and after my patch, GCC gives: > > mul r2, r1, r2 > sxtah r0, r0, r2 > > (where, SXTAH means sign-extend the third operand from HImode to SImode > and add to the second operand.) > > The dump after the widening_mult pass is: > > foo (int a, short int b, short int c) > { > int bc; > int D.2018; > short int D.2017; > int D.2016; > int D.2015; > int D.2014; > > : > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = b_1(D) w* c_3(D); > D.2017_6 = (short int) bc_5; > D.2018_7 = (int) D.2017_6; > D.2016_9 = D.2018_7 + a_8(D); > return D.2016_9; > > } > > Where you can clearly see that the addition has not been recognised as a > multiply-and-accumulate. > > When I step through convert_plusminus_to_widen, I can see that the > reason it has not matched is because "D.2017_6 = (short int) bc_5" is > encoded with a CONVERT_EXPR, just as the manual said it would be. A NOP_EXPR in this place would be valid as well. The merging hasn't been completed and at least the C frontend still generates CONVERT_EXPRs in some cases. > So, according to the manual, and my (admittedly limited) experiments, > skipping over NOP_EXPR does appear to be safe. > > But you say that it isn't safe. So now I'm confused. :( > > I can certainly add checks to make sure that the skipped operations > actually don't make any important changes to the value, but do I need to? Yes. Thanks, Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 24/06/11 16:47, Richard Guenther wrote: >> > I can certainly add checks to make sure that the skipped operations >> > actually don't make any important changes to the value, but do I need to? > Yes. Ok, I'll go away and do that then. BTW, I see useless_type_conversion_p, but that's not quite what I want. Is there an equivalent existing function to determine whether a conversion changes the logical/arithmetic meaning of a type? I mean, conversion to a wider mode is not "useless", but it is harmless, whereas conversion to a narrower mode may truncate the value. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Fri, Jun 24, 2011 at 6:58 PM, Stubbs, Andrew wrote: > On 24/06/11 16:47, Richard Guenther wrote: >>> > I can certainly add checks to make sure that the skipped operations >>> > actually don't make any important changes to the value, but do I need to? >> Yes. > > Ok, I'll go away and do that then. > > BTW, I see useless_type_conversion_p, but that's not quite what I want. > Is there an equivalent existing function to determine whether a > conversion changes the logical/arithmetic meaning of a type? > > I mean, conversion to a wider mode is not "useless", but it is harmless, > whereas conversion to a narrower mode may truncate the value. Well, you have to decide that for the concrete situation based on the signedness and precision of the types involved. All such conversions change the logical/arithmetic meaning of a type if seen in the right context. Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 24/06/11 16:47, Richard Guenther wrote: I can certainly add checks to make sure that the skipped operations > actually don't make any important changes to the value, but do I need to? Yes. OK, how about this patch? I've added checks to make sure the value is not truncated at any point. I've also changed the test cases to address Janis' comments. Andrew 2011-06-28 Andrew Stubbs gcc/ * gimple.h (tree_ssa_harmless_type_conversion): New prototype. (tree_ssa_strip_harmless_type_conversions): New prototype. (harmless_type_conversion_p): New prototype. * tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for multiply statement beyond no-op conversion statements. * tree-ssa.c (harmless_type_conversion_p): New function. (tree_ssa_harmless_type_conversion): New function. (tree_ssa_strip_harmless_type_conversions): New function. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...); /* In tree-ssa.c */ extern bool tree_ssa_useless_type_conversion (tree); +extern bool tree_ssa_harmless_type_conversion (tree); extern tree tree_ssa_strip_useless_type_conversions (tree); +extern tree tree_ssa_strip_harmless_type_conversions (tree); extern bool useless_type_conversion_p (tree, tree); +extern bool harmless_type_conversion_p (tree, tree); extern bool types_compatible_p (tree, tree); /* Return the code for GIMPLE statement G. */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, rhs1 = gimple_assign_rhs1 (stmt); rhs2 = gimple_assign_rhs2 (stmt); - if (TREE_CODE (rhs1) == SSA_NAME) -{ - rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); - if (is_gimple_assign (rhs1_stmt)) - rhs1_code = gimple_assign_rhs_code (rhs1_stmt); -} - else + if (TREE_CODE (rhs1) != SSA_NAME + || TREE_CODE (rhs2) != SSA_NAME) return false; - if (TREE_CODE (rhs2) == SSA_NAME) -{ - rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); - if (is_gimple_assign (rhs2_stmt)) - rhs2_code = gimple_assign_rhs_code (rhs2_stmt); -} - else -return false; + rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1); + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) +rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + + rhs2 = tree_ssa_strip_harmless_type_conversions(rhs2); + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) +rhs2_code = gimple_assign_rhs_code (rhs2_stmt); if (code == PLUS_EXPR && rhs1_code == MULT_EXPR) { --- a/gcc/tree-ssa.c +++ b/gcc/tree-ssa.c @@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type) return false; } +/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will + not alter the arithmetic meaning of a type, otherwise return false. + + For example, widening an integer type leaves the value unchanged, + but narrowing an integer type can cause truncation. + + Note that switching between signed and unsigned modes doesn't change + the underlying representation, and so is harmless. + + This function is not yet a complete definition of what is harmless + but should reject everything that is not. */ + +bool +harmless_type_conversion_p (tree outer_type, tree inner_type) +{ + /* If it's useless, it's also harmless. */ + if (useless_type_conversion_p (outer_type, inner_type)) +return true; + + if (INTEGRAL_TYPE_P (inner_type) + && INTEGRAL_TYPE_P (outer_type) + && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type)) +return true; + + return false; +} + /* Return true if a conversion from either type of TYPE1 and TYPE2 to the other is not required. Otherwise return false. */ @@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr) return false; } +/* Return true if EXPR is a harmless type conversion, otherwise return + false. */ + +bool +tree_ssa_harmless_type_conversion (tree expr) +{ + gimple stmt; + + if (TREE_CODE (expr) != SSA_NAME) +return false; + + stmt = SSA_NAME_DEF_STMT (expr); + + if (!is_gimple_assign (stmt)) +return false; + + if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))) +return false; + + return harmless_type_conversion_p (TREE_TYPE (gi
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs wrote: > On 24/06/11 16:47, Richard Guenther wrote: >>> >>> I can certainly add checks to make sure that the skipped operations >>> > actually don't make any important changes to the value, but do I need >>> > to? >> >> Yes. > > OK, how about this patch? I'd name the predicate value_preserving_conversion_p which I think is what you mean. harmless isn't really descriptive. Note that you include non-value-preserving conversions, namely int -> unsigned int. Don't dispatch to useless_type_conversion_p, it's easy to enumerate which conversions are value-preserving. Don't try to match the tree_ssa_useless_* set of functions, instead put the value_preserving_conversion_p predicate in tree.[ch] and a suitable function using it in tree-ssa-math-opts.c. Thanks, Richard. > I've added checks to make sure the value is not truncated at any point. > > I've also changed the test cases to address Janis' comments. > > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
Hi, On Tue, 28 Jun 2011, Richard Guenther wrote: > I'd name the predicate value_preserving_conversion_p which I think is > what you mean. harmless isn't really descriptive. > > Note that you include non-value-preserving conversions, namely int -> > unsigned int. It seems that Andrew really does want to accept them. If so value_preserving_conversion_p would be the wrong name. It seems to me he wants to accept those conversions that make it possible to retrieve the old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the to-be-named property. bits_preserving? Hmm. Ciao, Michael.
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 28/06/11 16:53, Michael Matz wrote: On Tue, 28 Jun 2011, Richard Guenther wrote: I'd name the predicate value_preserving_conversion_p which I think is what you mean. harmless isn't really descriptive. Note that you include non-value-preserving conversions, namely int -> unsigned int. It seems that Andrew really does want to accept them. If so value_preserving_conversion_p would be the wrong name. It seems to me he wants to accept those conversions that make it possible to retrieve the old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the to-be-named property. bits_preserving? Hmm. What I want (and I'm not totally clear on what this actually means) is to be able to optimize all the cases where the end result will be the same as the compiler produces now (using multiple multiply, shift, and add operations). Ok, so that's an obvious statement, but the point is that, right now, the compiler does nothing special when you cast from int -> unsigned int, or vice-versa, and I want to capture that somehow. There are some exceptions, I'm sure, but what are they? What is clear is that I don't want to just assume that casting from one signedness to the other is a show-stopper. For example: unsigned long long foo (unsigned long long a, unsigned char b, unsigned char c) { return a + b * c; } This appears to be entirely unsigned maths with plenty of spare precision, and therefore a dead cert for any SI->DI multiply-and-accumulate instruction, but not so - it is represented internally as: signed int tmp = (signed int)a * (signed int)b; unsigned long long result = a + (unsigned long long)tmp; Notice the unexpected signed int in the middle! I need to be able to get past that to optimize this properly. I've tried various test cases in which I cast signedness and mode around a bit, and so far it appear to perform safely, but probably I'm not be cunning enough. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
Hi, On Tue, 28 Jun 2011, Andrew Stubbs wrote: > What I want (and I'm not totally clear on what this actually means) is > to be able to optimize all the cases where the end result will be the > same as the compiler produces now (using multiple multiply, shift, and > add operations). Okay, then you really want to look through value-preserving conversions. > Ok, so that's an obvious statement, but the point is that, right now, > the compiler does nothing special when you cast from int -> unsigned > int, or vice-versa, and I want to capture that somehow. There are some > exceptions, I'm sure, but what are they? Same-sized signed <-> unsigned conversions aren't value preserving: unsigned char c = 255; (signed char)c == -1; 255 != -1 unsigned -> larger sized signed is value preserving unsigned char c = 255; (signed short)c == 255; signed -> unsigned never is value preserving > multiply-and-accumulate instruction, but not so - it is represented > internally as: > > signed int tmp = (signed int)a * (signed int)b; > unsigned long long result = a + (unsigned long long)tmp; > > Notice the unexpected signed int in the middle! Yeah, the C standard requires this. > I need to be able to get past that to optimize this properly. Then you're lucky because unsigned char -> signed int is an embedding, hence value preserving. I thought we had a predicate for such conversions already, but seems I was wrong. So, create it as Richi said, but enumerate explicitely the cases you want to handle, and include only those that really are value preserving. Ciao, Michael.
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 28/06/11 17:37, Michael Matz wrote: >> What I want (and I'm not totally clear on what this actually means) is >> > to be able to optimize all the cases where the end result will be the >> > same as the compiler produces now (using multiple multiply, shift, and >> > add operations). > Okay, then you really want to look through value-preserving conversions. > >> > Ok, so that's an obvious statement, but the point is that, right now, >> > the compiler does nothing special when you cast from int -> unsigned >> > int, or vice-versa, and I want to capture that somehow. There are some >> > exceptions, I'm sure, but what are they? > Same-sized signed<-> unsigned conversions aren't value preserving: >unsigned char c = 255; (signed char)c == -1; 255 != -1 > unsigned -> larger sized signed is value preserving >unsigned char c = 255; (signed short)c == 255; > signed -> unsigned never is value preserving OK, so I've tried implementing this, and I find I hit against a problem: Given this test case: unsigned long long foo (unsigned long long a, signed char *b, signed char *c) { return a + *b * *c; } Those rules say that it should not be suitable for optimization because there's an implicit cast from signed int to unsigned long long. Without any widening multiplications allowed, GCC gives this code (for ARM): ldrsb r2, [r2, #0] ldrsb r3, [r3, #0] mul r2, r2, r3 addsr0, r0, r2 adc r1, r1, r2, asr #31 This is exactly what a signed widening multiply-and-accumulate with smlalbb would have done! OK, so the types in the testcase are a bit contrived, but my point is that I want to be able to use the widening-mult instructions everywhere that they would produce the same output and gcc would otherwise, and gcc just doesn't seem that interested in signed<->unsigned conversions. So, I'm happy to put in checks to ensure that truncations are not ignore, but I'm really not sure what's the right thing to do with the extends and signedness switches. Any suggestions? Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Fri, Jul 1, 2011 at 1:58 PM, Stubbs, Andrew wrote: > On 28/06/11 17:37, Michael Matz wrote: >>> What I want (and I'm not totally clear on what this actually means) is >>> > to be able to optimize all the cases where the end result will be the >>> > same as the compiler produces now (using multiple multiply, shift, and >>> > add operations). >> Okay, then you really want to look through value-preserving conversions. >> >>> > Ok, so that's an obvious statement, but the point is that, right now, >>> > the compiler does nothing special when you cast from int -> unsigned >>> > int, or vice-versa, and I want to capture that somehow. There are some >>> > exceptions, I'm sure, but what are they? >> Same-sized signed<-> unsigned conversions aren't value preserving: >> unsigned char c = 255; (signed char)c == -1; 255 != -1 >> unsigned -> larger sized signed is value preserving >> unsigned char c = 255; (signed short)c == 255; >> signed -> unsigned never is value preserving > > OK, so I've tried implementing this, and I find I hit against a problem: > > Given this test case: > > unsigned long long > foo (unsigned long long a, signed char *b, signed char *c) > { > return a + *b * *c; > } > > Those rules say that it should not be suitable for optimization because > there's an implicit cast from signed int to unsigned long long. > > Without any widening multiplications allowed, GCC gives this code (for ARM): > > ldrsb r2, [r2, #0] > ldrsb r3, [r3, #0] > mul r2, r2, r3 > adds r0, r0, r2 > adc r1, r1, r2, asr #31 > > This is exactly what a signed widening multiply-and-accumulate with > smlalbb would have done! > > OK, so the types in the testcase are a bit contrived, but my point is > that I want to be able to use the widening-mult instructions everywhere > that they would produce the same output and gcc would otherwise, and gcc > just doesn't seem that interested in signed<->unsigned conversions. > > So, I'm happy to put in checks to ensure that truncations are not > ignore, but I'm really not sure what's the right thing to do with the > extends and signedness switches. > > Any suggestions? Well - some operations work the same on both signedness if you just care about the twos-complement result. This includes multiplication (but not for example division). For this special case I suggest to not bother trying to invent a generic predicate but do something local in tree-ssa-math-opts.c. Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/01/2011 01:58 PM, Stubbs, Andrew wrote: Given this test case: unsigned long long foo (unsigned long long a, signed char *b, signed char *c) { return a + *b * *c; } Those rules say that it should not be suitable for optimization because there's an implicit cast from signed int to unsigned long long. Got it now! Casts from signed to unsigned are not value-preserving, but they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the same result bit-by-bit as the s64 result. The fact that s64 has an implicit ... in front, while an u64 has an implicit ... does not matter. Is this the meaning of the predicate you want? I think so, based on the discussion, but it's hard to say without seeing the cases enumerated (i.e. a patch). However, perhaps there is a catch. We can do the following thought experiment. What would happen if you had multiple widening multiplies? Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit unsigned? I believe in this case you couldn't optimize 8-bit signed to 128-bit unsigned. Would your code do it? Paolo
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 01/07/11 13:33, Paolo Bonzini wrote: > Got it now! Casts from signed to unsigned are not value-preserving, but > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the > same result bit-by-bit as the s64 result. The fact that s64 has an > implicit ... in front, while an u64 has an implicit ... does not > matter. But, the ... and ... are not implicit. They are very real, and if applied incorrectly will change the result, I think. > Is this the meaning of the predicate you want? I think so, based on the > discussion, but it's hard to say without seeing the cases enumerated > (i.e. a patch). The purpose of this predicate is to determine whether any type conversions that occur between the output of a widening multiply, and the input of an addition have any bearing on the end result. We know what the effective output type of the multiply is (the size is 2x the input type, and the signed if either one of the inputs in signed), and we know what the input type of the addition is, but any amount of junk can lie in between. The problem is determining if it *is* junk. In an ideal world there would only be two cases to consider: 1. No conversion needed. 2. A single sign-extend or zero-extend (according to the type of the inputs) to match the input size of the addition. Anything else would be unsuitable for optimization. Of course, it's never that simple, but it should still be possible to boil down a list of conversions to one of these cases, if it's valid. The signedness of the input to the addition is not significant - the code would be the same either way. But, I is important not to try to zero-extend something that started out signed, and not to sign-extend something that started out unsigned. > However, perhaps there is a catch. We can do the following thought > experiment. What would happen if you had multiple widening multiplies? > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit > unsigned? I believe in this case you couldn't optimize 8-bit signed to > 128-bit unsigned. Would your code do it? My code does not attempt to combine multiple multiplies. In any case, if you have two multiplications, surely you have at least three input values, so they can't be combined? It does attempt to combine a multiply and an addition, where a suitable madd* insn is available. (This is not new; I'm just trying to do it in more cases.) I have considered the case where you have "(a * b) + (c * d)", but have not yet coded anything for it. At present, the code will simply choose whichever multiply happens to find itself the first input operand of the plus, and ignores the other, even if the first turns out not to be a suitable candidate. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/01/2011 03:30 PM, Stubbs, Andrew wrote: > However, perhaps there is a catch. We can do the following thought > experiment. What would happen if you had multiple widening multiplies? > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit > unsigned? I believe in this case you couldn't optimize 8-bit signed to > 128-bit unsigned. Would your code do it? My code does not attempt to combine multiple multiplies. In any case, if you have two multiplications, surely you have at least three input values, so they can't be combined? What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to (u128)c + (u128)((s8)a * (s8)b). Paolo
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 01/07/11 15:40, Paolo Bonzini wrote: > On 07/01/2011 03:30 PM, Stubbs, Andrew wrote: >>> > However, perhaps there is a catch. We can do the following thought >>> > experiment. What would happen if you had multiple widening multiplies? >>> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to >>> 128-bit >>> > unsigned? I believe in this case you couldn't optimize 8-bit signed to >>> > 128-bit unsigned. Would your code do it? >> My code does not attempt to combine multiple multiplies. In any case, if >> you have two multiplications, surely you have at least three input >> values, so they can't be combined? > > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to > (u128)c + (u128)((s8)a * (s8)b). Oh I see, sorry. Yes, that's exactly what I'm trying to do here. No, wait, I don't see. Where are these multiple widening multiplies you're talking about? I only see one multiply? Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 01/07/11 14:30, Stubbs, Andrew wrote: >> Got it now! Casts from signed to unsigned are not value-preserving, but >> > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the >> > same result bit-by-bit as the s64 result. The fact that s64 has an >> > implicit ... in front, while an u64 has an implicit ... does not >> > matter. > But, the ... and ... are not implicit. They are very real, and > if applied incorrectly will change the result, I think. Wait, I'm clearly confused When I try a s32->u64 conversion, the expand pass generates a sign_extend insn. Clearly it's the source type that determines the extension type, not the destination type ... and I'm a dunce! Thanks :) Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/01/2011 04:55 PM, Stubbs, Andrew wrote: > > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to > (u128)c + (u128)((s8)a * (s8)b). Oh I see, sorry. Yes, that's exactly what I'm trying to do here. No, wait, I don't see. Where are these multiple widening multiplies you're talking about? I only see one multiply? I meant one multiplication with multiple widening steps. Not clear at all, sorry. Paolo
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 06/28/11 18:14, Andrew Stubbs wrote: > unsigned long long > foo (unsigned long long a, unsigned char b, unsigned char c) > { > return a + b * c; > } > > This appears to be entirely unsigned maths with plenty of spare > precision, and therefore a dead cert for any SI->DI > multiply-and-accumulate instruction, but not so - it is represented > internally as: > > signed int tmp = (signed int)a * (signed int)b; > unsigned long long result = a + (unsigned long long)tmp; > > Notice the unexpected signed int in the middle! I need to be able to get > past that to optimize this properly. Since both inputs are positive in a signed int (they must be, being cast from a smaller unsigned value), you can infer that it does not matter whether you treat the result of the multiplication as a signed or an unsigned value. It is positive in any case. So, I think the thing to test is: if the accumulate step requires widening the result of the multiplication, either the cast must be value preserving (widening unsigned to signed), or you must be able to prove that the multiplication produces a positive result. If the accumulate step just casts the multiplication result from signed to unsigned, keeping the precision the same, you can ignore the cast since the addition is unaffected by it. Bernd
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 01/07/11 16:54, Paolo Bonzini wrote: > On 07/01/2011 04:55 PM, Stubbs, Andrew wrote: >>> > >>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to >>> > (u128)c + (u128)((s8)a * (s8)b). >> Oh I see, sorry. Yes, that's exactly what I'm trying to do here. >> >> No, wait, I don't see. Where are these multiple widening multiplies >> you're talking about? I only see one multiply? > > I meant one multiplication with multiple widening steps. Not clear at > all, sorry. Yes, I see now, the whole purpose of my patch set is widening by more than one mode. The case of the multiply-and-accumulate is the only way there can be more than one step though. Widening multiplies themselves are always handled as one unit. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 01/07/11 13:25, Richard Guenther wrote: Well - some operations work the same on both signedness if you just care about the twos-complement result. This includes multiplication (but not for example division). For this special case I suggest to not bother trying to invent a generic predicate but do something local in tree-ssa-math-opts.c. OK, here's my updated patch. I've taken the view that we *know* what size and signedness the result of the multiplication is, and we know what size the input to the addition must be, so all the check has to do is make sure it does that same conversion, even if by a roundabout means. What I hadn't grasped before is that when extending a value it's the source type that is significant, not the destination, so the checks are not as complex as I had thought. So, this patch adds a test to ensure that: 1. the type is not truncated so far that we lose any information; and 2. the type is only ever extended in the proper signedness. Also, just to be absolutely sure, I've also added a little bit of logic to permit extends that are then undone by a truncate. I'm really not sure what guarantees there are about what sort of cast sequences can exist? Is this necessary? I haven't managed to coax it to generated any examples of extends followed by truncates myself, but in any case, it's hardly any code and it'll make sure it's future proofed. OK? Andrew 2011-06-28 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (valid_types_for_madd_p): New function. (convert_plusminus_to_widen): Use valid_types_for_madd_p to identify optimization candidates. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- .../gcc/testsuite/gcc.target/arm/no-wmla-1.c | 11 ++ .../gcc/testsuite/gcc.target/arm/wmul-5.c | 10 ++ src/gcc-mainline/gcc/tree-ssa-math-opts.c | 112 ++-- 3 files changed, 123 insertions(+), 10 deletions(-) create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c new file mode 100644 index 000..17f7427 --- /dev/null +++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c new file mode 100644 index 000..65c43e3 --- /dev/null +++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ diff --git a/src/gcc-mainline/gcc/tree-ssa-math-opts.c b/src/gcc-mainline/gcc/tree-ssa-math-opts.c index d55ba57..5ef7bb4 100644 --- a/src/gcc-mainline/gcc/tree-ssa-math-opts.c +++ b/src/gcc-mainline/gcc/tree-ssa-math-opts.c @@ -2085,6 +2085,78 @@ convert_mult_to_widen (gimple stmt) return true; } +/* Check the input types, TYPE1 and TYPE2 to a widening multiply, + and then the convertions between the output of the multiply, and + the input to an addition EXPR, to ensure that they are compatible with + a widening multiply-and-accumulate. + + This function assumes that expr is a valid string of conversion expressions + terminated by a multiplication. + + This function tries NOT to make any (fragile) assumptions about what + sequence of conversions can exist in the input. */ + +static bool +valid_types_for_madd_p (tree type1, tree type2, tree expr) +{ + gimple stmt, prev_stmt; + enum tree_code code, prev_code; + tree prev_expr, type, prev_type; + int bitsize, prev_bitsize, initial_bitsize, min_bitsize; + bool initial_unsigned; + + initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + stmt = SSA_NAME_DEF_STMT (expr); + code = gimple_assign_rhs_code (stmt); + type = TREE_TYPE (expr); + bitsize = TYPE_PRECISION (type); + min_bitsize = bitsize; + + if (code == MULT_EXPR || code == WIDEN_MULT_EXPR) +return true; + + if (!INTEGRAL_TYPE_P (type) + || TYPE_PRECISION (type) < initial_bitsize) +return false; + + /* Step through the conversions backwards. */ + while (true) +{ + prev_expr = gimple_assign_rhs1 (stmt); + prev_stmt = SSA_NAME_DEF_STMT (prev_expr); + prev_code = gimple_assign_rhs_code (prev_stmt); + prev_type = TREE_TYPE (prev_expr); + prev_bitsize = TYPE_PRECISION (prev_type); + + if (prev_code == MULT_EXPR || prev_code == WIDEN
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Mon, Jul 4, 2011 at 4:23 PM, Andrew Stubbs wrote: > On 01/07/11 13:25, Richard Guenther wrote: >> >> Well - some operations work the same on both signedness if you >> just care about the twos-complement result. This includes >> multiplication (but not for example division). For this special >> case I suggest to not bother trying to invent a generic predicate >> but do something local in tree-ssa-math-opts.c. > > OK, here's my updated patch. > > I've taken the view that we *know* what size and signedness the result of > the multiplication is, and we know what size the input to the addition must > be, so all the check has to do is make sure it does that same conversion, > even if by a roundabout means. > > What I hadn't grasped before is that when extending a value it's the source > type that is significant, not the destination, so the checks are not as > complex as I had thought. > > So, this patch adds a test to ensure that: > > 1. the type is not truncated so far that we lose any information; and > > 2. the type is only ever extended in the proper signedness. > > Also, just to be absolutely sure, I've also added a little bit of logic to > permit extends that are then undone by a truncate. I'm really not sure what > guarantees there are about what sort of cast sequences can exist? Is this > necessary? I haven't managed to coax it to generated any examples of extends > followed by truncates myself, but in any case, it's hardly any code and > it'll make sure it's future proofed. > > OK? I think you should assume that series of widenings, (int)(short)char_variable are already combined. Thus I believe you only need to consider a single conversion in valid_types_for_madd_p. +/* Check the input types, TYPE1 and TYPE2 to a widening multiply, what are those types? Is TYPE1 the result type and TYPE2 the operand type? If so why + initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); this?! + initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); that also looks odd. So probably TYPE1 isn't the result type. If they are the types of the operands, then what operand is EXPR for? I didn't look at the actual implementation of the function because of the lack of understanding of the inputs. - if (TREE_CODE (rhs1) == SSA_NAME) + for (tmp = rhs1, rhs1_code = ERROR_MARK; + TREE_CODE (tmp) == SSA_NAME + && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK); + tmp = gimple_assign_rhs1 (rhs1_stmt)) { - rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); - if (is_gimple_assign (rhs1_stmt)) - rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + rhs1_stmt = SSA_NAME_DEF_STMT (tmp); + if (!is_gimple_assign (rhs1_stmt)) + break; + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); } the result looks a bit like spaghetti code ... and lacks a comment on what it is trying to do. It looks like it sees through an arbitrary number of conversions - possibly ones that will make the macc invalid, as for (short)int-var * short-var + int-var. So you'll be pessimizing code by doing that unconditionally. As I said above you should at most consider one intermediate conversion. I believe the code should be arranged such that only valid conversions are looked through in the first place. Valid, in that the resulting types should still match the macc constraints. Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/07/11 10:58, Richard Guenther wrote: I think you should assume that series of widenings, (int)(short)char_variable are already combined. Thus I believe you only need to consider a single conversion in valid_types_for_madd_p. Hmm, I'm not so sure. I'll look into it a bit further. +/* Check the input types, TYPE1 and TYPE2 to a widening multiply, what are those types? Is TYPE1 the result type and TYPE2 the operand type? If so why TYPE1 and TYPE2 are the inputs to the multiply. I thought I explained that in the comment before the function. + initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); this?! The result of the multiply will be this many bits wide. This may be narrower than the type that holds it. E.g., 16-bit * 8-bit gives a result at most 24-bits wide, which will usually be held in a 32- or 64-bit variable. + initial_unsigned = TYPE_UNSIGNED (type1)&& TYPE_UNSIGNED (type2); that also looks odd. So probably TYPE1 isn't the result type. If they are the types of the operands, then what operand is EXPR for? EXPR, as the comment says, is the addition that follows the multiply. - if (TREE_CODE (rhs1) == SSA_NAME) + for (tmp = rhs1, rhs1_code = ERROR_MARK; + TREE_CODE (tmp) == SSA_NAME +&& (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK); + tmp = gimple_assign_rhs1 (rhs1_stmt)) { - rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); - if (is_gimple_assign (rhs1_stmt)) - rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + rhs1_stmt = SSA_NAME_DEF_STMT (tmp); + if (!is_gimple_assign (rhs1_stmt)) + break; + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); } the result looks a bit like spaghetti code ... and lacks a comment on what it is trying to do. It looks like it sees through an arbitrary number of conversions - possibly ones that will make the macc invalid, as for (short)int-var * short-var + int-var. So you'll be pessimizing code by doing that unconditionally. As I said above you should at most consider one intermediate conversion. Ok, I need to add a comment here. The code does indeed look back through an arbitrary number of conversions. It is searching for the last real operation before the addition, hoping to find a multiply. I believe the code should be arranged such that only valid conversions are looked through in the first place. Valid, in that the resulting types should still match the macc constraints. Well, it might be possible to discard some conversions initially, but until the multiply is found, and it's input types are known, we can't know for certain what conversions are valid. I think I need to explain what's going on here more clearly. 1. It finds an addition statement. It's not known yet whether it is part of a multiply-and-accumulate, or not. 2. It follows the conversion chain back from each operand to see if it finds a multiply, or widening multiply statement. 3. If it finds a non-widening multiply, it checks it to see if it could be widening multiply-and-accumulate (it will already have been rejected as a widening multiply on it's own, but the addition might be in a wider mode, or the target might provide multiply-and-accumulate insns that don't have corresponding widening multiply insns). 4. (This is the new bit!) It looks to see if there are any conversions between the multiply and addition that can safely be ignored. 5. If we get here, then emit any necessary conversion statements, and convert the addition to a WIDEN_MULT_PLUS_EXPR. Before these changes, any conversion between the multiply and addition statements would prevent optimization, even though there are many cases where the conversions are valid, and even inserted automatically. I'm going to go away and find out whether there are really any cases where there can legitimately be more than one conversion, and at least update my patch with better commenting. Thanks for you review. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/07/11 11:26, Andrew Stubbs wrote: On 07/07/11 10:58, Richard Guenther wrote: I think you should assume that series of widenings, (int)(short)char_variable are already combined. Thus I believe you only need to consider a single conversion in valid_types_for_madd_p. Hmm, I'm not so sure. I'll look into it a bit further. OK, here's a test case that gives multiple conversions: long long foo (long long a, signed char b, signed char c) { int bc = b * c; return a + (short)bc; } The dump right before the widen_mult pass gives: foo (long long int a, signed char b, signed char c) { int bc; long long int D.2018; short int D.2017; long long int D.2016; int D.2015; int D.2014; : D.2014_2 = (int) b_1(D); D.2015_4 = (int) c_3(D); bc_5 = D.2014_2 * D.2015_4; D.2017_6 = (short int) bc_5; D.2018_7 = (long long int) D.2017_6; D.2016_9 = D.2018_7 + a_8(D); return D.2016_9; } Here we have a multiply and accumulate done the long way. The 8-bit inputs are widened to 32-bit, multiplied to give a 32-bit result (of which only the lower 16-bits contain meaningful data), then truncated to 16-bits, and sign-extended up to 64-bits ready for the 64-bit addition. This is slight contrived, perhaps, but not unlike the sort of thing that might occur when you have inline functions and macros, and most importantly - it is mathematically valid! So, here's the output from my patched widen_mult pass: foo (long long int a, signed char b, signed char c) { int bc; long long int D.2018; short int D.2017; long long int D.2016; int D.2015; int D.2014; : D.2014_2 = (int) b_1(D); D.2015_4 = (int) c_3(D); bc_5 = b_1(D) w* c_3(D); D.2017_6 = (short int) bc_5; D.2018_7 = (long long int) D.2017_6; D.2016_9 = WIDEN_MULT_PLUS_EXPR ; return D.2016_9; } As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now redundant. (Ideally, this would be removed now, but in fact it doesn't get eliminated until the RTL into_cfglayout pass. This is not new behaviour.) My point is that it's possible to have at least two conversions to examine. Is it possible to have more? I don't know, but once I'm dealing with two I might as well deal with an arbitrary number. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs wrote: > On 07/07/11 11:26, Andrew Stubbs wrote: >> >> On 07/07/11 10:58, Richard Guenther wrote: >>> >>> I think you should assume that series of widenings, >>> (int)(short)char_variable >>> are already combined. Thus I believe you only need to consider a single >>> conversion in valid_types_for_madd_p. >> >> Hmm, I'm not so sure. I'll look into it a bit further. > > OK, here's a test case that gives multiple conversions: > > long long > foo (long long a, signed char b, signed char c) > { > int bc = b * c; > return a + (short)bc; > } > > The dump right before the widen_mult pass gives: > > foo (long long int a, signed char b, signed char c) > { > int bc; > long long int D.2018; > short int D.2017; > long long int D.2016; > int D.2015; > int D.2014; > > : > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = D.2014_2 * D.2015_4; > D.2017_6 = (short int) bc_5; Ok, so you have a truncation that is a no-op value-wise. I would argue that this truncation should be removed independent on whether we have a widening multiply instruction or not. The technically most capable place to remove non-value-changing truncations (and combine them with a successive conversion) would be value-range propagation. Which already knows: Value ranges after VRP: b_1(D): VARYING D.2698_2: [-128, 127] c_3(D): VARYING D.2699_4: [-128, 127] bc_5: [-16256, 16384] D.2701_6: [-16256, 16384] D.2702_7: [-16256, 16384] a_8(D): VARYING D.2700_9: VARYING thus truncating bc_5 to short does not change the value. The simplification could be made when looking at the statement > D.2018_7 = (long long int) D.2017_6; in vrp_fold_stmt, based on the fact that this conversion converts from a value-preserving intermediate conversion. Thus the transform would replace the D.2017_6 operand with bc_5. So yes, the case appears - but it shouldn't ;) I'll cook up a quick patch for VRP. Thanks, Richard. > D.2016_9 = D.2018_7 + a_8(D); > return D.2016_9; > > } > > Here we have a multiply and accumulate done the long way. The 8-bit inputs > are widened to 32-bit, multiplied to give a 32-bit result (of which only the > lower 16-bits contain meaningful data), then truncated to 16-bits, and > sign-extended up to 64-bits ready for the 64-bit addition. > > This is slight contrived, perhaps, but not unlike the sort of thing that > might occur when you have inline functions and macros, and most importantly > - it is mathematically valid! > > > So, here's the output from my patched widen_mult pass: > > foo (long long int a, signed char b, signed char c) > { > int bc; > long long int D.2018; > short int D.2017; > long long int D.2016; > int D.2015; > int D.2014; > > : > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = b_1(D) w* c_3(D); > D.2017_6 = (short int) bc_5; > D.2018_7 = (long long int) D.2017_6; > D.2016_9 = WIDEN_MULT_PLUS_EXPR ; > return D.2016_9; > > } > > As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now > redundant. (Ideally, this would be removed now, but in fact it doesn't get > eliminated until the RTL into_cfglayout pass. This is not new behaviour.) > > > My point is that it's possible to have at least two conversions to examine. > Is it possible to have more? I don't know, but once I'm dealing with two I > might as well deal with an arbitrary number. > > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Thu, Jul 7, 2011 at 2:28 PM, Richard Guenther wrote: > On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs wrote: >> On 07/07/11 11:26, Andrew Stubbs wrote: >>> >>> On 07/07/11 10:58, Richard Guenther wrote: I think you should assume that series of widenings, (int)(short)char_variable are already combined. Thus I believe you only need to consider a single conversion in valid_types_for_madd_p. >>> >>> Hmm, I'm not so sure. I'll look into it a bit further. >> >> OK, here's a test case that gives multiple conversions: >> >> long long >> foo (long long a, signed char b, signed char c) >> { >> int bc = b * c; >> return a + (short)bc; >> } >> >> The dump right before the widen_mult pass gives: >> >> foo (long long int a, signed char b, signed char c) >> { >> int bc; >> long long int D.2018; >> short int D.2017; >> long long int D.2016; >> int D.2015; >> int D.2014; >> >> : >> D.2014_2 = (int) b_1(D); >> D.2015_4 = (int) c_3(D); >> bc_5 = D.2014_2 * D.2015_4; >> D.2017_6 = (short int) bc_5; > > Ok, so you have a truncation that is a no-op value-wise. I would > argue that this truncation should be removed independent on > whether we have a widening multiply instruction or not. > > The technically most capable place to remove non-value-changing > truncations (and combine them with a successive conversion) > would be value-range propagation. Which already knows: > > Value ranges after VRP: > > b_1(D): VARYING > D.2698_2: [-128, 127] > c_3(D): VARYING > D.2699_4: [-128, 127] > bc_5: [-16256, 16384] > D.2701_6: [-16256, 16384] > D.2702_7: [-16256, 16384] > a_8(D): VARYING > D.2700_9: VARYING > > thus truncating bc_5 to short does not change the value. > > The simplification could be made when looking at the > statement > >> D.2018_7 = (long long int) D.2017_6; > > in vrp_fold_stmt, based on the fact that this conversion > converts from a value-preserving intermediate conversion. > Thus the transform would replace the D.2017_6 operand > with bc_5. > > So yes, the case appears - but it shouldn't ;) > > I'll cook up a quick patch for VRP. Like the attached. I'll finish and properly test it. Richard. p Description: Binary data
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/07/11 13:37, Richard Guenther wrote: I'll cook up a quick patch for VRP. Like the attached. I'll finish and properly test it. Your patch appears to do the wrong thing for this test case: int foo (int a, short b, short c) { int bc = b * c; return a + (short)bc; } With your patch, the input to the widening-mult pass now looks like this: foo (int a, short int b, short int c) { int bc; int D.2016; int D.2015; int D.2014; : D.2014_2 = (int) b_1(D); D.2015_4 = (int) c_3(D); bc_5 = D.2014_2 * D.2015_4; D.2016_9 = bc_5 + a_8(D); return D.2016_9; } It looks like when the user tries to deliberately break the maths your patch seems to unbreak it. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Fri, Jul 8, 2011 at 2:44 PM, Andrew Stubbs wrote: > On 07/07/11 13:37, Richard Guenther wrote: >>> >>> I'll cook up a quick patch for VRP. >> >> Like the attached. I'll finish and properly test it. > > Your patch appears to do the wrong thing for this test case: > > int > foo (int a, short b, short c) > { > int bc = b * c; > return a + (short)bc; > } > > With your patch, the input to the widening-mult pass now looks like this: > > foo (int a, short int b, short int c) > { > int bc; > int D.2016; > int D.2015; > int D.2014; > > : > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = D.2014_2 * D.2015_4; > D.2016_9 = bc_5 + a_8(D); > return D.2016_9; > > } > > It looks like when the user tries to deliberately break the maths your patch > seems to unbreak it. Yeah, I fixed that in the checked in version. Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/07/11 10:58, Richard Guenther wrote: I think you should assume that series of widenings, (int)(short)char_variable are already combined. Thus I believe you only need to consider a single conversion in valid_types_for_madd_p. Ok, here's my new patch. This version only allows one conversion between the multiply and addition, so assumes that VRP has eliminated any needless ones. That one conversion may either be a truncate, if the mode was too large for the meaningful data, or an extend, which must be of the right flavour. This means that this patch now has the same effect as the last patch, for all valid cases (following you VRP patch), but rejects the cases where the C language (unhelpfully) requires an intermediate temporary to be of the 'wrong' signedness. Hopefully the output will now be the same between both -O0 and -O2, and programmers will continue to have to be careful about casting unsigned variables whenever they expect purely unsigned math. :( Is this one ok? Andrew 2011-07-11 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2175,6 +2176,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. */ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2188,6 +2221,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2195,6 +2229,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2202,6 +2237,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + The should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate. */ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Mon, Jul 11, 2011 at 6:55 PM, Andrew Stubbs wrote: > On 07/07/11 10:58, Richard Guenther wrote: >> >> I think you should assume that series of widenings, >> (int)(short)char_variable >> are already combined. Thus I believe you only need to consider a single >> conversion in valid_types_for_madd_p. > > Ok, here's my new patch. > > This version only allows one conversion between the multiply and addition, > so assumes that VRP has eliminated any needless ones. > > That one conversion may either be a truncate, if the mode was too large for > the meaningful data, or an extend, which must be of the right flavour. > > This means that this patch now has the same effect as the last patch, for > all valid cases (following you VRP patch), but rejects the cases where the C > language (unhelpfully) requires an intermediate temporary to be of the > 'wrong' signedness. > > Hopefully the output will now be the same between both -O0 and -O2, and > programmers will continue to have to be careful about casting unsigned > variables whenever they expect purely unsigned math. :( > > Is this one ok? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
This update changes only the context modified by changes to patch 2. The patch has already been approved. I'm just posting it for completeness. Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2, tmp; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. */ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + The should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate. */ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's the right sort. */ + if (TYPE_UNSIGNED (from_type) != is_unsigned + && !(is_unsigned && TYPE_PRECISION (from_type) > data_size)) + return false; + } + /* else convert is a no-op for our purposes. */ +} + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 07/14/2011 07:16 AM, Andrew Stubbs wrote: > { dg-options "-O2 -march=armv7-a" } The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be overridden for multilibs that specify -march, and might conflict with other multilib options. If you really need that particular -march value then use dg-skip-if to skip multilibs with conflicting or overriding options, or else "dg-require-effective-target arm_dsp" to only run the tests when the multilib already supports it; that has the advantage of testing a wider range of arch values. Janis
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 19/07/11 00:33, Janis Johnson wrote: On 07/14/2011 07:16 AM, Andrew Stubbs wrote: { dg-options "-O2 -march=armv7-a" } The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be overridden for multilibs that specify -march, and might conflict with other multilib options. If you really need that particular -march value then use dg-skip-if to skip multilibs with conflicting or overriding options, or else "dg-require-effective-target arm_dsp" to only run the tests when the multilib already supports it; that has the advantage of testing a wider range of arch values. Yes, I know about this one. You committed that feature since I first posted this, I think? I plan to make that change when I do the final commit. Andrew
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 12/07/11 11:52, Richard Guenther wrote: Is this one ok? Ok. I've just committed this slightly modified patch. The changes are mainly in the context and the testcase. Andrew 2011-08-19 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target arm_dsp } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "\tmul\t" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target arm_dsp } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2136,6 +2136,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2, tmp; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2178,6 +2179,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. */ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2191,6 +2224,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2198,6 +2232,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2208,6 +2243,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + The should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate. */ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's the right sort. */ + if (TYPE_UNSIGNED (from_type) != is_unsigned + && !(is_unsigned && TYPE_PRECISION (from_type) > data_size)) + return false; + } + /* else convert is a no-op for our purposes. */ +} + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */