Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Fri, Oct 14, 2011 at 9:23 AM, Paolo Bonzini wrote: > On 10/14/2011 05:36 PM, H.J. Lu wrote: >> >> There is a testcase at >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 >> >> It passes with my patch. > > Cool, so let's wait for the results of testing. > > Paolo > Here is the complete patch with a testcase. I will check it in if there are no performance regressions with SPEC CPU 2K/2006 on Linux/ia32/x86-64/x32. Thanks. -- H.J. --- gcc/ 2011-10-13 H.J. Lu PR rtl-optimization/50696 * combine.c (make_compound_operation): Turn (and (OP) M) into extraction if M is an extraction mask. gcc/testsuite/ 2011-10-13 H.J. Lu PR rtl-optimization/50696 * gcc.target/i386/pr50696.c: New. gcc/ 2011-10-13 H.J. Lu PR rtl-optimization/50696 * combine.c (make_compound_operation): Turn (and (OP) M) into extraction if M is an extraction mask. gcc/testsuite/ 2011-10-13 H.J. Lu PR rtl-optimization/50696 * gcc.target/i386/pr50696.c: New. diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..4b57b88 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code) XEXP (XEXP (x, 0), 1))); } - /* If the constant is one less than a power of two, this might be - representable by an extraction even if no shift is present. - If it doesn't end up being a ZERO_EXTEND, we will ignore it unless - we are in a COMPARE. */ - else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) - new_rtx = make_extraction (mode, - make_compound_operation (XEXP (x, 0), - next_code), - 0, NULL_RTX, i, 1, 0, in_code == COMPARE); - /* If we are in a comparison and this is an AND with a power of two, convert this into the appropriate bit extract. */ else if (in_code == COMPARE @@ -7758,6 +7748,26 @@ make_compound_operation (rtx x, enum rtx_code in_code) next_code), i, NULL_RTX, 1, 1, 0, 1); + /* If the constant is an extraction mask with the zero bits in + the first operand ignored, this might be representable by an + extraction even if no shift is present. If it doesn't end up + being a ZERO_EXTEND, we will ignore it unless we are in a + COMPARE. */ + else + { + unsigned HOST_WIDE_INT nonzero = + nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0))); + unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1)); + unsigned HOST_WIDE_INT len = ceil_log2 (mask); + if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1)) + == (nonzero & mask)) + { + new_rtx = make_compound_operation (XEXP (x, 0), next_code); + new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX, + len, 1, 0, in_code == COMPARE); + } + } + break; case LSHIFTRT: diff --git a/gcc/testsuite/gcc.target/i386/pr50696.c b/gcc/testsuite/gcc.target/i386/pr50696.c new file mode 100644 index 000..b1ec2c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr50696.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target { x32 } } } */ +/* { dg-options "-O2 -mtune=generic" } */ + +struct s { int val[16]; }; + +extern void f (struct s pb); + +void +foo () +{ + struct s x; + int i; + + for (i = 0; i < 16; i++) +x.val[i] = i + 1; + f (x); +} + +/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On 10/14/2011 05:36 PM, H.J. Lu wrote: There is a testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 It passes with my patch. Cool, so let's wait for the results of testing. Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 11:51 PM, Paolo Bonzini wrote: > On 10/13/2011 10:07 PM, H.J. Lu wrote: >> >> On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner >> wrote: The answer to H.J.'s "Why do we do it for MEM then?" is simply "because no one ever thought about not doing it" >>> >>> No, that's false. The same expand_compound_operation / >>> make_compound_operation >>> pair is present in the MEM case as in the SET case. It's just that >>> there's some bug here that's noticable in not making proper MEMs that >>> doesn't show up in the SET case because of the way the insns are >>> structured. >>> >> >> When we have (and (OP) M) where >> >> (and (OP) M) == (and (OP) ((1<< ceil_log2 (M)) - 1) )) >> >> (and (OP) M) is zero_extract bits 0 to ceil_log2 (M). >> >> Does it look OK? > > Yes, it does. How did you test it? > There is a testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 It passes with my patch. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On 10/13/2011 10:07 PM, H.J. Lu wrote: On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner wrote: The answer to H.J.'s "Why do we do it for MEM then?" is simply "because no one ever thought about not doing it" No, that's false. The same expand_compound_operation / make_compound_operation pair is present in the MEM case as in the SET case. It's just that there's some bug here that's noticable in not making proper MEMs that doesn't show up in the SET case because of the way the insns are structured. When we have (and (OP) M) where (and (OP) M) == (and (OP) ((1<< ceil_log2 (M)) - 1) )) (and (OP) M) is zero_extract bits 0 to ceil_log2 (M). Does it look OK? Yes, it does. How did you test it? Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 3:52 PM, Richard Kenner wrote: >> Like ths? > > Yes, that's what I meant. Thanks. > > Again, I'd suggest doing some performance testing on this just to verify > that it doesn't pessimize things. > I will run SPEC CPU 2K/2006 on ia32, x86-64 and x32. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> Like ths? Yes, that's what I meant. Thanks. Again, I'd suggest doing some performance testing on this just to verify that it doesn't pessimize things.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 3:33 PM, Richard Kenner wrote: >> I am testing this patch. The difference is it checks nonzero >> bits of the first operand. > > I would suggest moving (and expanding) the comments from the existing block > into your new block. > Like ths? -- H.J. --- diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..4b57b88 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code) XEXP (XEXP (x, 0), 1))); } - /* If the constant is one less than a power of two, this might be -representable by an extraction even if no shift is present. -If it doesn't end up being a ZERO_EXTEND, we will ignore it unless -we are in a COMPARE. */ - else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) - new_rtx = make_extraction (mode, - make_compound_operation (XEXP (x, 0), - next_code), - 0, NULL_RTX, i, 1, 0, in_code == COMPARE); - /* If we are in a comparison and this is an AND with a power of two, convert this into the appropriate bit extract. */ else if (in_code == COMPARE @@ -7758,6 +7748,26 @@ make_compound_operation (rtx x, enum rtx_code in_code) next_code), i, NULL_RTX, 1, 1, 0, 1); + /* If the constant is an extraction mask with the zero bits in +the first operand ignored, this might be representable by an +extraction even if no shift is present. If it doesn't end up +being a ZERO_EXTEND, we will ignore it unless we are in a +COMPARE. */ + else + { + unsigned HOST_WIDE_INT nonzero = + nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0))); + unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1)); + unsigned HOST_WIDE_INT len = ceil_log2 (mask); + if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1)) + == (nonzero & mask)) + { + new_rtx = make_compound_operation (XEXP (x, 0), next_code); + new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX, +len, 1, 0, in_code == COMPARE); + } + } + break; case LSHIFTRT:
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> I am testing this patch. The difference is it checks nonzero > bits of the first operand. I would suggest moving (and expanding) the comments from the existing block into your new block.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> But the current code converts (and X 3) into a bit extraction > since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true > when UINTVAL (XEXP (x, 1)) == 3. Should we do it or not? By adding the test for nonzero bits, you'd potentially be doing the conversion more often (which is the point of this patch, after all) and it's therefore necessary to be sure you're not doing it *too* often.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 2:45 PM, H.J. Lu wrote: > On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner > wrote: >>> It is because mask 0x is optimized to 0xfffc by keeping track >>> of non-zero bits in registers and the above code doesn't take that >>> into account. >> >> Then I'd suggest modifying that code so that it does rather than >> essentially duplicating it. But I'd recommend running some >> performance tests to verify that you're not pessimizing things when >> you do that: this stuff can be very tricky and you want to make sure >> that you're not converting something like (and X 3) into a bit >> extraction unnecessarily. >> > > But the current code converts (and X 3) into a bit extraction > since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true > when UINTVAL (XEXP (x, 1)) == 3. Should we do it or not? > I am testing this patch. The difference is it checks nonzero bits of the first operand. -- H.J. -- diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..598dee3 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code) XEXP (XEXP (x, 0), 1))); } - /* If the constant is one less than a power of two, this might be -representable by an extraction even if no shift is present. -If it doesn't end up being a ZERO_EXTEND, we will ignore it unless -we are in a COMPARE. */ - else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) - new_rtx = make_extraction (mode, - make_compound_operation (XEXP (x, 0), - next_code), - 0, NULL_RTX, i, 1, 0, in_code == COMPARE); - /* If we are in a comparison and this is an AND with a power of two, convert this into the appropriate bit extract. */ else if (in_code == COMPARE @@ -7758,6 +7748,23 @@ make_compound_operation (rtx x, enum rtx_code in_code) next_code), i, NULL_RTX, 1, 1, 0, 1); + /* If we are (and (OP) M) and M is an extraction mask, this is an +extraction. */ + else + { + unsigned HOST_WIDE_INT nonzero = + nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0))); + unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1)); + unsigned HOST_WIDE_INT len = ceil_log2 (mask); + if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1)) + == (nonzero & mask)) + { + new_rtx = make_compound_operation (XEXP (x, 0), next_code); + new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX, +len, 1, 0, in_code == COMPARE); + } + } + break; case LSHIFTRT:
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner wrote: >> It is because mask 0x is optimized to 0xfffc by keeping track >> of non-zero bits in registers and the above code doesn't take that >> into account. > > Then I'd suggest modifying that code so that it does rather than > essentially duplicating it. But I'd recommend running some > performance tests to verify that you're not pessimizing things when > you do that: this stuff can be very tricky and you want to make sure > that you're not converting something like (and X 3) into a bit > extraction unnecessarily. > But the current code converts (and X 3) into a bit extraction since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true when UINTVAL (XEXP (x, 1)) == 3. Should we do it or not? -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> It is because mask 0x is optimized to 0xfffc by keeping track > of non-zero bits in registers and the above code doesn't take that > into account. Then I'd suggest modifying that code so that it does rather than essentially duplicating it. But I'd recommend running some performance tests to verify that you're not pessimizing things when you do that: this stuff can be very tricky and you want to make sure that you're not converting something like (and X 3) into a bit extraction unnecessarily.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 2:23 PM, Richard Kenner wrote: >> Does it look OK? > > No. > > If I understand your code correctly, there's essentially the same code > as you have a bit above that: > > /* If the constant is one less than a power of two, this might be > representable by an extraction even if no shift is present. > If it doesn't end up being a ZERO_EXTEND, we will ignore it unless > we are in a COMPARE. */ > else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0) > new_rtx = make_extraction (mode, > make_compound_operation (XEXP (x, 0), > next_code), > 0, NULL_RTX, i, 1, 0, in_code == COMPARE); > > So you need to understand why your code "fires" and it doesn't. > > It is because mask 0x is optimized to 0xfffc by keeping track of non-zero bits in registers and the above code doesn't take that into account. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> Does it look OK? No. If I understand your code correctly, there's essentially the same code as you have a bit above that: /* If the constant is one less than a power of two, this might be representable by an extraction even if no shift is present. If it doesn't end up being a ZERO_EXTEND, we will ignore it unless we are in a COMPARE. */ else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0) new_rtx = make_extraction (mode, make_compound_operation (XEXP (x, 0), next_code), 0, NULL_RTX, i, 1, 0, in_code == COMPARE); So you need to understand why your code "fires" and it doesn't.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner wrote: >> The answer to H.J.'s "Why do we do it for MEM then?" is simply >> "because no one ever thought about not doing it" > > No, that's false. The same expand_compound_operation / > make_compound_operation > pair is present in the MEM case as in the SET case. It's just that > there's some bug here that's noticable in not making proper MEMs that > doesn't show up in the SET case because of the way the insns are structured. > When we have (and (OP) M) where (and (OP) M) == (and (OP) ((1 << ceil_log2 (M)) - 1) )) (and (OP) M) is zero_extract bits 0 to ceil_log2 (M). Does it look OK? Thanks. -- H.J. --- diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..5962b1d 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7758,6 +7758,23 @@ make_compound_operation (rtx x, enum rtx_code in_code) next_code), i, NULL_RTX, 1, 1, 0, 1); + + /* If we are (and (OP) M) and M is an extraction mask, this is an +extraction. */ + else + { + unsigned HOST_WIDE_INT nonzero = + nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0))); + unsigned HOST_WIDE_INT mask = INTVAL (XEXP (x, 1)); + unsigned HOST_WIDE_INT len = ceil_log2 (mask); + if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1)) + == (nonzero & mask)) + { + new_rtx = make_compound_operation (XEXP (x, 0), next_code); + new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX, +len, 1, 0, in_code == COMPARE); + } + } break; case LSHIFTRT:
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> The answer to H.J.'s "Why do we do it for MEM then?" is simply > "because no one ever thought about not doing it" No, that's false. The same expand_compound_operation / make_compound_operation pair is present in the MEM case as in the SET case. It's just that there's some bug here that's noticable in not making proper MEMs that doesn't show up in the SET case because of the way the insns are structured.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> We first expand zero_extend:DI address to and:DI and then try > to restore zero_extend:DI. Why do we do this transformation > to begin with? Suppose there were an outer AND that duplicated what this one did. Then when you combine those two, you merge it to one AND. Then make_compound_operation puts it back. The net result is to eliminate the outer AND. There are lots of similar sorts of things. As I said, the strategy there was to convert extractions and expansions into the corresponding logical and shift operations, see if they can merge with something outside (which is similarly converted), then convert the result (possibly merged) back. This, for example, is the code that will remove nested SIGN_EXTENDs.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 19:06, Richard Kenner wrote: >> An and:DI is cheaper than a zero_extend:DI of an and:SI. > > That depends strongly on the constants and whether the machine is 32-bit > or 64-bit. Yes, the rtx_costs take care of that. > But that's irrelevant in this case since the and:SI will be removed (it > reflects what already been done). Do you refer to this in make_extraction: /* See if this can be done without an extraction. We never can if the width of the field is not the same as that of some integer mode. For registers, we can only avoid the extraction if the position is at the low-order bit and this is either not in the destination or we have the appropriate STRICT_LOW_PART operation available. */ and this call to force_to_mode in particular: new_rtx = force_to_mode (inner, tmode, len >= HOST_BITS_PER_WIDE_INT ? ~(unsigned HOST_WIDE_INT) 0 : ((unsigned HOST_WIDE_INT) 1 << len) - 1, 0); and from there the call to simplify_and_const_int that does this: if (constop == nonzero) return varop; ? Then indeed it should work if you call make_extraction more greedily than what we do now (which is, just if the constant is one less than a power of two). The answer to H.J.'s "Why do we do it for MEM then?" is simply "because no one ever thought about not doing it" (because there are no other POINTERS_EXTEND_UNSIGNED == 1 machines). In fact it may even be advantageous to do it in general, even if in_code != MEM. Only experimentation can tell. Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini wrote: > On Thu, Oct 13, 2011 at 19:19, H.J. Lu wrote: >> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini wrote: >>> On 10/13/2011 06:35 PM, Richard Kenner wrote: > > It never calls make_extraction. There are several cases handled > for AND operation. But > > (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) > (const_int 4 [0x4])) 0) > (subreg:DI (reg:SI 106) 0)) > (const_int 4294967292 [0xfffc])) > > isn't one of them. Yes, clearly. Otherwise it would work! The correct fix for this problem is to make it to do that. That's where this needs to be fixed: in make_compound_operation. >>> >>> An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct >>> in not doing this transformation. I think adding a case to >>> make_compound_operation that simply undoes the transformation (without >>> calling make_extraction) is fine if you guard it with if (in_code == MEM). >>> >> >> We first expand zero_extend:DI address to and:DI and then try >> to restore zero_extend:DI. Why do we do this transformation >> to begin with? > > Because outside of a MEM it may be beneficial _not_ to restore > zero_extend:DI in this case (depending on rtx_costs). > Why do we do it for MEM then? -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 19:19, H.J. Lu wrote: > On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini wrote: >> On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffc])) isn't one of them. >>> >>> Yes, clearly. Otherwise it would work! The correct fix for this problem >>> is to make it to do that. That's where this needs to be fixed: in >>> make_compound_operation. >> >> An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct >> in not doing this transformation. I think adding a case to >> make_compound_operation that simply undoes the transformation (without >> calling make_extraction) is fine if you guard it with if (in_code == MEM). >> > > We first expand zero_extend:DI address to and:DI and then try > to restore zero_extend:DI. Why do we do this transformation > to begin with? Because outside of a MEM it may be beneficial _not_ to restore zero_extend:DI in this case (depending on rtx_costs). Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini wrote: > On 10/13/2011 06:35 PM, Richard Kenner wrote: >>> >>> It never calls make_extraction. There are several cases handled >>> for AND operation. But >>> >>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >>> (const_int 4 [0x4])) 0) >>> (subreg:DI (reg:SI 106) 0)) >>> (const_int 4294967292 [0xfffc])) >>> >>> isn't one of them. >> >> Yes, clearly. Otherwise it would work! The correct fix for this problem >> is to make it to do that. That's where this needs to be fixed: in >> make_compound_operation. > > An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct > in not doing this transformation. I think adding a case to > make_compound_operation that simply undoes the transformation (without > calling make_extraction) is fine if you guard it with if (in_code == MEM). > We first expand zero_extend:DI address to and:DI and then try to restore zero_extend:DI. Why do we do this transformation to begin with? -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> An and:DI is cheaper than a zero_extend:DI of an and:SI. That depends strongly on the constants and whether the machine is 32-bit or 64-bit. But that's irrelevant in this case since the and:SI will be removed (it reflects what already been done).
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffc])) isn't one of them. Yes, clearly. Otherwise it would work! The correct fix for this problem is to make it to do that. That's where this needs to be fixed: in make_compound_operation. An and:DI is cheaper than a zero_extend:DI of an and:SI. So GCC is correct in not doing this transformation. I think adding a case to make_compound_operation that simply undoes the transformation (without calling make_extraction) is fine if you guard it with if (in_code == MEM). Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> It never calls make_extraction. There are several cases handled > for AND operation. But > > (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) >(const_int 4 [0x4])) 0) >(subreg:DI (reg:SI 106) 0)) >(const_int 4294967292 [0xfffc])) > > isn't one of them. Yes, clearly. Otherwise it would work! The correct fix for this problem is to make it to do that. That's where this needs to be fixed: in make_compound_operation.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner wrote: >> at the end. make_compound_operation doesn't know how to >> restore ZERO_EXTEND. > > It does in general. See make_extraction, which it calls. The question is > why it doesn't in this case. That's the bug. > It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffc])) isn't one of them. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> at the end. make_compound_operation doesn't know how to > restore ZERO_EXTEND. It does in general. See make_extraction, which it calls. The question is why it doesn't in this case. That's the bug.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner wrote: >> Or being fooled by the 0xfffc masking, perhaps. > > No, I'm pretty sure that's NOT the case. The *whole point* of the > routine is to deal with that masking. > I got (gdb) step make_compound_operation (x=0x7139c4c8, in_code=MEM) at /export/gnu/import/git/gcc/gcc/combine.c:7572 7572 enum rtx_code code = GET_CODE (x); (gdb) call debug_rtx (x) (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffc])) and it produces (gdb) call debug_rtx (x) (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffc])) at the end. make_compound_operation doesn't know how to restore ZERO_EXTEND. BTW, there is a small testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 You can reproduce it on Linux/x86-64. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> Or being fooled by the 0xfffc masking, perhaps. No, I'm pretty sure that's NOT the case. The *whole point* of the routine is to deal with that masking.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On 10/13/2011 02:51 PM, Richard Kenner wrote: case MEM: /* Ensure that our address has any ASHIFTs converted to MULT in case address-recognizing predicates are called later. */ temp = make_compound_operation (XEXP (x, 0), MEM); SUBST (XEXP (x, 0), temp); break; THAT'S the code that should do the transformation that this patch contains. So I'd suggest doing some debugging and seeing why it isn't. This could just be a bug in make_compound_operation not handling the SUBREG. Or being fooled by the 0xfffc masking, perhaps. Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> Same here, I don't like it but I hardly see any alternative. The only > possibility could be to prevent calling expand_compound_operation > completely for addresses. Richard, what do you think? Don't worry, > combine hasn't changed much since your days. :) The problem wasn't potential changes to combine, but my memory ... But now I refreshed it and understand the issue. expand_compound_operation and make_compond_operation were meant to be used as a pair. You first do the former, then see if the result can be simplified, then call the latter in case it couldn't. In the SET case, we call the latter in simplify_set. But there's also this code in combine_simplify_rtx: case MEM: /* Ensure that our address has any ASHIFTs converted to MULT in case address-recognizing predicates are called later. */ temp = make_compound_operation (XEXP (x, 0), MEM); SUBST (XEXP (x, 0), temp); break; THAT'S the code that should do the transformation that this patch contains. So I'd suggest doing some debugging and seeing why it isn't. This could just be a bug in make_compound_operation not handling the SUBREG.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On 10/13/2011 01:04 AM, Richard Kenner wrote: I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it. Same here, I don't like it but I hardly see any alternative. The only possibility could be to prevent calling expand_compound_operation completely for addresses. Richard, what do you think? Don't worry, combine hasn't changed much since your days. :) Paolo
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> 1. The placement of subreg in > > (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) > (const_int 4 [0x4])) 0) > (subreg:DI (reg:SI 106) 0)) > > isn't supported by x86 backend. That's easy to fix. > 2. The biggest problem is optimizing mask 0x to > 0xfffc by keeping track of non-zero bits in registers. > X86 backend doesn't have such information to know > ADDR & 0xfffc == ADDR & 0x. But this indeed isn't. I withdraw my comment. I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner wrote: >> X86 backend doesn't accept the new expression as valid address while >> (zero_extend:DI) works just fine. This patches keeps ZERO_EXTEND >> when zero-extending address to Pmode. It reduces number of lea from >> 24173 to 21428 in x32 libgfortran.so. Does it make any senses? > > I'd be inclined to have the x86 backend accept combine's canonicalized > form rather than doing a patch such as this. > The address format generated by combine is very unusual in 2 aspecst: 1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by x86 backend. 2. The biggest problem is optimizing mask 0x to 0xfffc by keeping track of non-zero bits in registers. X86 backend doesn't have such information to know ADDR & 0xfffc == ADDR & 0x. -- H.J.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
> X86 backend doesn't accept the new expression as valid address while > (zero_extend:DI) works just fine. This patches keeps ZERO_EXTEND > when zero-extending address to Pmode. It reduces number of lea from > 24173 to 21428 in x32 libgfortran.so. Does it make any senses? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this.