PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread H.J. Lu
Hi,

When combine tries to combine:

(insn 37 35 39 3 (set (reg:SI 90)
(plus:SI (mult:SI (reg/v:SI 84 [ i ])
(const_int 4 [0x4]))
(reg:SI 106))) x.i:11 247 {*leasi_2}
 (nil))

(insn 39 37 41 3 (set (mem:SI (zero_extend:DI (reg:SI 90)) [3
MEM[symbol: x,
index: D.2741_12, step: 4, offset: 4294967292B]+0 S4 A32])
(reg/v:SI 84 [ i ])) x.i:11 64 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 90)
(nil)))

it optimizes

(zero_extend:DI (plus:SI (mult:SI (reg/v:SI 84 [ i ])
(const_int 4 [0x4]))
(reg:SI 106)))

into

(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffc])) 

in make_compound_operation.  X86 backend doesn't accept the new
expression as valid address while (zero_extend:DI) works just fine.
This patches keeps ZERO_EXTEND when zero-extending address to Pmode.
It reduces number of lea from 24173 to 21428 in x32 libgfortran.so.
Does it make any senses?

Thanks.

H.J.
---
2011-10-12  H.J. Lu  

PR rtl-optimization/50696
* combine.c (subst): If an address is zero-extended to Pmode,
replace FROM with while keeping ZERO_EXTEND.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..45180e5 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5078,6 +5078,23 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
in_cond, int unique_copy)
}
}
 }
+#ifdef POINTERS_EXTEND_UNSIGNED
+  else if (POINTERS_EXTEND_UNSIGNED > 0
+  && code == MEM
+  && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+  && GET_MODE (XEXP (x, 0)) == Pmode)
+{
+  /* If an address is zero-extended to Pmode, replace FROM with
+TO while keeping ZERO_EXTEND.  */
+  new_rtx = subst (XEXP (XEXP (x, 0), 0), from, to, 0, 0,
+  unique_copy);
+  /* Drop ZERO_EXTEND on constant.  */
+  if (CONST_INT_P (new_rtx))
+   SUBST (XEXP (x, 0), new_rtx);
+  else
+   SUBST (XEXP (XEXP (x, 0), 0), new_rtx);
+}
+#endif
   else
 {
   len = GET_RTX_LENGTH (code);


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread Richard Kenner
> X86 backend doesn't accept the new expression as valid address while
> (zero_extend:DI) works just fine.  This patches keeps ZERO_EXTEND
> when zero-extending address to Pmode.  It reduces number of lea from
> 24173 to 21428 in x32 libgfortran.so.  Does it make any senses?

I'd be inclined to have the x86 backend accept combine's canonicalized
form rather than doing a patch such as this.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread H.J. Lu
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner
 wrote:
>> X86 backend doesn't accept the new expression as valid address while
>> (zero_extend:DI) works just fine.  This patches keeps ZERO_EXTEND
>> when zero-extending address to Pmode.  It reduces number of lea from
>> 24173 to 21428 in x32 libgfortran.so.  Does it make any senses?
>
> I'd be inclined to have the x86 backend accept combine's canonicalized
> form rather than doing a patch such as this.
>

The address format generated by combine is very unusual in
2 aspecst:

1. The placement of subreg in

(plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))

isn't supported by x86 backend.

2. The biggest problem is optimizing mask 0x to
0xfffc by keeping track of non-zero bits in registers.
X86 backend doesn't have such information to know
ADDR & 0xfffc == ADDR & 0x.


-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread Richard Kenner
> 1. The placement of subreg in
> 
> (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
> (const_int 4 [0x4])) 0)
> (subreg:DI (reg:SI 106) 0))
> 
> isn't supported by x86 backend.

That's easy to fix.

> 2. The biggest problem is optimizing mask 0x to
> 0xfffc by keeping track of non-zero bits in registers.
> X86 backend doesn't have such information to know
> ADDR & 0xfffc == ADDR & 0x.

But this indeed isn't.

I withdraw my comment.  

I still don't like the patch, but I'm no longer as familiar with the code
as I used to be so can't suggest a replacement.  Let's see what others 
think about it.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On 10/13/2011 01:04 AM, Richard Kenner wrote:


I still don't like the patch, but I'm no longer as familiar with the code
as I used to be so can't suggest a replacement.  Let's see what others
think about it.


Same here, I don't like it but I hardly see any alternative.  The only 
possibility could be to prevent calling expand_compound_operation 
completely for addresses.  Richard, what do you think?  Don't worry, 
combine hasn't changed much since your days. :)


Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> Same here, I don't like it but I hardly see any alternative.  The only 
> possibility could be to prevent calling expand_compound_operation 
> completely for addresses.  Richard, what do you think?  Don't worry, 
> combine hasn't changed much since your days. :)

The problem wasn't potential changes to combine, but my memory ...

But now I refreshed it and understand the issue.

expand_compound_operation and make_compond_operation were meant to be used
as a pair.  You first do the former, then see if the result can be
simplified, then call the latter in case it couldn't.

In the SET case, we call the latter in simplify_set.  But there's also this
code in combine_simplify_rtx:

case MEM:
  /* Ensure that our address has any ASHIFTs converted to MULT in case
 address-recognizing predicates are called later.  */
  temp = make_compound_operation (XEXP (x, 0), MEM);
  SUBST (XEXP (x, 0), temp);
  break;

THAT'S the code that should do the transformation that this patch contains.
So I'd suggest doing some debugging and seeing why it isn't.  This could
just be a bug in make_compound_operation not handling the SUBREG.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On 10/13/2011 02:51 PM, Richard Kenner wrote:

 case MEM:
   /* Ensure that our address has any ASHIFTs converted to MULT in case
  address-recognizing predicates are called later.  */
   temp = make_compound_operation (XEXP (x, 0), MEM);
   SUBST (XEXP (x, 0), temp);
   break;

THAT'S the code that should do the transformation that this patch contains.
So I'd suggest doing some debugging and seeing why it isn't.  This could
just be a bug in make_compound_operation not handling the SUBREG.


Or being fooled by the 0xfffc masking, perhaps.

Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> Or being fooled by the 0xfffc masking, perhaps.

No, I'm pretty sure that's NOT the case.  The *whole point* of the
routine is to deal with that masking.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner
 wrote:
>> Or being fooled by the 0xfffc masking, perhaps.
>
> No, I'm pretty sure that's NOT the case.  The *whole point* of the
> routine is to deal with that masking.
>

I got

(gdb) step
make_compound_operation (x=0x7139c4c8, in_code=MEM)
at /export/gnu/import/git/gcc/gcc/combine.c:7572
7572  enum rtx_code code = GET_CODE (x);
(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffc]))

and it produces

(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffc]))

at the end.  make_compound_operation doesn't know how to
restore ZERO_EXTEND.

BTW, there is a small testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

You can reproduce it on Linux/x86-64.

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> at the end.  make_compound_operation doesn't know how to
> restore ZERO_EXTEND.

It does in general.  See make_extraction, which it calls.  The question is
why it doesn't in this case.  That's the bug.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner
 wrote:
>> at the end.  make_compound_operation doesn't know how to
>> restore ZERO_EXTEND.
>
> It does in general.  See make_extraction, which it calls.  The question is
> why it doesn't in this case.  That's the bug.
>

It never calls make_extraction.  There are several cases handled
for AND operation. But

(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
   (const_int 4 [0x4])) 0)
   (subreg:DI (reg:SI 106) 0))
   (const_int 4294967292 [0xfffc]))

isn't one of them.

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> It never calls make_extraction.  There are several cases handled
> for AND operation. But
> 
> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>(const_int 4 [0x4])) 0)
>(subreg:DI (reg:SI 106) 0))
>(const_int 4294967292 [0xfffc]))
> 
> isn't one of them.

Yes, clearly.  Otherwise it would work!  The correct fix for this problem
is to make it to do that.  That's where this needs to be fixed: in
make_compound_operation.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On 10/13/2011 06:35 PM, Richard Kenner wrote:

It never calls make_extraction.  There are several cases handled
for AND operation. But

(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffc]))

isn't one of them.


Yes, clearly.  Otherwise it would work!  The correct fix for this problem
is to make it to do that.  That's where this needs to be fixed: in
make_compound_operation.


An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is 
correct in not doing this transformation.  I think adding a case to 
make_compound_operation that simply undoes the transformation (without 
calling make_extraction) is fine if you guard it with if (in_code == MEM).


Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> An and:DI is cheaper than a zero_extend:DI of an and:SI.  

That depends strongly on the constants and whether the machine is 32-bit
or 64-bit. 

But that's irrelevant in this case since the and:SI will be removed (it
reflects what already been done).


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>
>>> It never calls make_extraction.  There are several cases handled
>>> for AND operation. But
>>>
>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>                (const_int 4 [0x4])) 0)
>>>        (subreg:DI (reg:SI 106) 0))
>>>    (const_int 4294967292 [0xfffc]))
>>>
>>> isn't one of them.
>>
>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>> is to make it to do that.  That's where this needs to be fixed: in
>> make_compound_operation.
>
> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
> in not doing this transformation.  I think adding a case to
> make_compound_operation that simply undoes the transformation (without
> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>

We first expand zero_extend:DI address to and:DI and then try
to restore zero_extend:DI.   Why do we do this transformation
to begin with?


-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On Thu, Oct 13, 2011 at 19:19, H.J. Lu  wrote:
> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
>> On 10/13/2011 06:35 PM, Richard Kenner wrote:

 It never calls make_extraction.  There are several cases handled
 for AND operation. But

 (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
                (const_int 4 [0x4])) 0)
        (subreg:DI (reg:SI 106) 0))
    (const_int 4294967292 [0xfffc]))

 isn't one of them.
>>>
>>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>>> is to make it to do that.  That's where this needs to be fixed: in
>>> make_compound_operation.
>>
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>> in not doing this transformation.  I think adding a case to
>> make_compound_operation that simply undoes the transformation (without
>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>
>
> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Because outside of a MEM it may be beneficial _not_ to restore
zero_extend:DI in this case (depending on rtx_costs).

Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini  wrote:
> On Thu, Oct 13, 2011 at 19:19, H.J. Lu  wrote:
>> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
>>> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>
> It never calls make_extraction.  There are several cases handled
> for AND operation. But
>
> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>                (const_int 4 [0x4])) 0)
>        (subreg:DI (reg:SI 106) 0))
>    (const_int 4294967292 [0xfffc]))
>
> isn't one of them.

 Yes, clearly.  Otherwise it would work!  The correct fix for this problem
 is to make it to do that.  That's where this needs to be fixed: in
 make_compound_operation.
>>>
>>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>>> in not doing this transformation.  I think adding a case to
>>> make_compound_operation that simply undoes the transformation (without
>>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>>
>>
>> We first expand zero_extend:DI address to and:DI and then try
>> to restore zero_extend:DI.   Why do we do this transformation
>> to begin with?
>
> Because outside of a MEM it may be beneficial _not_ to restore
> zero_extend:DI in this case (depending on rtx_costs).
>

Why do we do it for MEM then?

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On Thu, Oct 13, 2011 at 19:06, Richard Kenner
 wrote:
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.
>
> That depends strongly on the constants and whether the machine is 32-bit
> or 64-bit.

Yes, the rtx_costs take care of that.

> But that's irrelevant in this case since the and:SI will be removed (it
> reflects what already been done).

Do you refer to this in make_extraction:

  /* See if this can be done without an extraction.  We never can if the
 width of the field is not the same as that of some integer mode. For
 registers, we can only avoid the extraction if the position is at the
 low-order bit and this is either not in the destination or we have the
 appropriate STRICT_LOW_PART operation available.  */

and this call to force_to_mode in particular:

new_rtx = force_to_mode (inner, tmode,
 len >= HOST_BITS_PER_WIDE_INT
 ? ~(unsigned HOST_WIDE_INT) 0
 : ((unsigned HOST_WIDE_INT) 1 << len) - 1,
 0);

and from there the call to simplify_and_const_int that does this:

  if (constop == nonzero)
return varop;

?

Then indeed it should work if you call make_extraction more greedily
than what we do now (which is, just if the constant is one less than a
power of two).

The answer to H.J.'s "Why do we do it for MEM then?" is simply
"because no one ever thought about not doing it" (because there are no
other POINTERS_EXTEND_UNSIGNED == 1 machines).  In fact it may even be
advantageous to do it in general, even if in_code != MEM.  Only
experimentation can tell.

Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Suppose there were an outer AND that duplicated what this one did.
Then when you combine those two, you merge it to one AND.  Then
make_compound_operation puts it back.  The net result is to eliminate
the outer AND.  There are lots of similar sorts of things.

As I said, the strategy there was to convert extractions and expansions
into the corresponding logical and shift operations, see if they can
merge with something outside (which is similarly converted), then convert
the result (possibly merged) back.

This, for example, is the code that will remove nested SIGN_EXTENDs.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> The answer to H.J.'s "Why do we do it for MEM then?" is simply
> "because no one ever thought about not doing it" 

No, that's false.  The same expand_compound_operation / make_compound_operation
pair is present in the MEM case as in the SET case.  It's just that
there's some bug here that's noticable in not making proper MEMs that
doesn't show up in the SET case because of the way the insns are structured.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
 wrote:
>> The answer to H.J.'s "Why do we do it for MEM then?" is simply
>> "because no one ever thought about not doing it"
>
> No, that's false.  The same expand_compound_operation / 
> make_compound_operation
> pair is present in the MEM case as in the SET case.  It's just that
> there's some bug here that's noticable in not making proper MEMs that
> doesn't show up in the SET case because of the way the insns are structured.
>

When we have (and (OP) M) where

(and (OP) M) == (and (OP) ((1 << ceil_log2 (M)) - 1) ))

(and (OP) M) is zero_extract bits 0 to ceil_log2 (M).

Does it look OK?

Thanks.

-- 
H.J.
---
diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..5962b1d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7758,6 +7758,23 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+
+  /* If we are (and (OP) M) and M is an extraction mask, this is an
+extraction.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = INTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
   break;

 case LSHIFTRT:


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> Does it look OK?

No.  

If I understand your code correctly, there's essentially the same code
as you have a bit above that:

  /* If the constant is one less than a power of two, this might be 
 representable by an extraction even if no shift is present. 
 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
 we are in a COMPARE.  */
  else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
new_rtx = make_extraction (mode,
   make_compound_operation (XEXP (x, 0),
next_code),
   0, NULL_RTX, i, 1, 0, in_code == COMPARE);

So you need to understand why your code "fires" and it doesn't.



Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 2:23 PM, Richard Kenner
 wrote:
>> Does it look OK?
>
> No.
>
> If I understand your code correctly, there's essentially the same code
> as you have a bit above that:
>
>      /* If the constant is one less than a power of two, this might be
>         representable by an extraction even if no shift is present.
>         If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>         we are in a COMPARE.  */
>      else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
>        new_rtx = make_extraction (mode,
>                               make_compound_operation (XEXP (x, 0),
>                                                        next_code),
>                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);
>
> So you need to understand why your code "fires" and it doesn't.
>
>

It is because mask 0x is optimized to 0xfffc by keeping track
of non-zero bits in registers and the above code doesn't take that
into account.

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> It is because mask 0x is optimized to 0xfffc by keeping track
> of non-zero bits in registers and the above code doesn't take that
> into account.

Then I'd suggest modifying that code so that it does rather than
essentially duplicating it.  But I'd recommend running some
performance tests to verify that you're not pessimizing things when
you do that: this stuff can be very tricky and you want to make sure
that you're not converting something like (and X 3) into a bit
extraction unnecessarily.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner
 wrote:
>> It is because mask 0x is optimized to 0xfffc by keeping track
>> of non-zero bits in registers and the above code doesn't take that
>> into account.
>
> Then I'd suggest modifying that code so that it does rather than
> essentially duplicating it.  But I'd recommend running some
> performance tests to verify that you're not pessimizing things when
> you do that: this stuff can be very tricky and you want to make sure
> that you're not converting something like (and X 3) into a bit
> extraction unnecessarily.
>

But the current code converts (and X 3) into a bit extraction
since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
when UINTVAL (XEXP (x, 1))  == 3.  Should we do it or not?

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 2:45 PM, H.J. Lu  wrote:
> On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner
>  wrote:
>>> It is because mask 0x is optimized to 0xfffc by keeping track
>>> of non-zero bits in registers and the above code doesn't take that
>>> into account.
>>
>> Then I'd suggest modifying that code so that it does rather than
>> essentially duplicating it.  But I'd recommend running some
>> performance tests to verify that you're not pessimizing things when
>> you do that: this stuff can be very tricky and you want to make sure
>> that you're not converting something like (and X 3) into a bit
>> extraction unnecessarily.
>>
>
> But the current code converts (and X 3) into a bit extraction
> since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
> when UINTVAL (XEXP (x, 1))  == 3.  Should we do it or not?
>

I am testing this patch.  The difference is it checks nonzero
bits of the first operand.

-- 
H.J.
--

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..598dee3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 XEXP (XEXP (x, 0), 1)));
}

-  /* If the constant is one less than a power of two, this might be
-representable by an extraction even if no shift is present.
-If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
-we are in a COMPARE.  */
-  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
-   new_rtx = make_extraction (mode,
-  make_compound_operation (XEXP (x, 0),
-   next_code),
-  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
-
   /* If we are in a comparison and this is an AND with a power of two,
 convert this into the appropriate bit extract.  */
   else if (in_code == COMPARE
@@ -7758,6 +7748,23 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+  /* If we are (and (OP) M) and M is an extraction mask, this is an
+extraction.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
+
   break;

 case LSHIFTRT:


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> But the current code converts (and X 3) into a bit extraction
> since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
> when UINTVAL (XEXP (x, 1)) == 3.  Should we do it or not?

By adding the test for nonzero bits, you'd potentially be doing the
conversion more often (which is the point of this patch, after all) and
it's therefore necessary to be sure you're not doing it *too* often.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> I am testing this patch.  The difference is it checks nonzero
> bits of the first operand.

I would suggest moving (and expanding) the comments from the existing block
into your new block.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 3:33 PM, Richard Kenner
 wrote:
>> I am testing this patch.  The difference is it checks nonzero
>> bits of the first operand.
>
> I would suggest moving (and expanding) the comments from the existing block
> into your new block.
>

Like ths?

-- 
H.J.
---
diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..4b57b88 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 XEXP (XEXP (x, 0), 1)));
}

-  /* If the constant is one less than a power of two, this might be
-representable by an extraction even if no shift is present.
-If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
-we are in a COMPARE.  */
-  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
-   new_rtx = make_extraction (mode,
-  make_compound_operation (XEXP (x, 0),
-   next_code),
-  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
-
   /* If we are in a comparison and this is an AND with a power of two,
 convert this into the appropriate bit extract.  */
   else if (in_code == COMPARE
@@ -7758,6 +7748,26 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+  /* If the constant is an extraction mask with the zero bits in
+the first operand ignored, this might be representable by an
+extraction even if no shift is present.  If it doesn't end up
+being a ZERO_EXTEND, we will ignore it unless we are in a
+COMPARE.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
+
   break;

 case LSHIFTRT:


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
> Like ths?

Yes, that's what I meant.  Thanks.

Again, I'd suggest doing some performance testing on this just to verify
that it doesn't pessimize things.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 3:52 PM, Richard Kenner
 wrote:
>> Like ths?
>
> Yes, that's what I meant.  Thanks.
>
> Again, I'd suggest doing some performance testing on this just to verify
> that it doesn't pessimize things.
>

I will run SPEC CPU 2K/2006 on ia32, x86-64 and x32.

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On 10/13/2011 10:07 PM, H.J. Lu wrote:

On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
  wrote:

The answer to H.J.'s "Why do we do it for MEM then?" is simply
"because no one ever thought about not doing it"


No, that's false.  The same expand_compound_operation / make_compound_operation
pair is present in the MEM case as in the SET case.  It's just that
there's some bug here that's noticable in not making proper MEMs that
doesn't show up in the SET case because of the way the insns are structured.



When we have (and (OP) M) where

(and (OP) M) == (and (OP) ((1<<  ceil_log2 (M)) - 1) ))

(and (OP) M) is zero_extract bits 0 to ceil_log2 (M).

Does it look OK?


Yes, it does.  How did you test it?

Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread H.J. Lu
On Thu, Oct 13, 2011 at 11:51 PM, Paolo Bonzini  wrote:
> On 10/13/2011 10:07 PM, H.J. Lu wrote:
>>
>> On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
>>   wrote:

 The answer to H.J.'s "Why do we do it for MEM then?" is simply
 "because no one ever thought about not doing it"
>>>
>>> No, that's false.  The same expand_compound_operation /
>>> make_compound_operation
>>> pair is present in the MEM case as in the SET case.  It's just that
>>> there's some bug here that's noticable in not making proper MEMs that
>>> doesn't show up in the SET case because of the way the insns are
>>> structured.
>>>
>>
>> When we have (and (OP) M) where
>>
>> (and (OP) M) == (and (OP) ((1<<  ceil_log2 (M)) - 1) ))
>>
>> (and (OP) M) is zero_extract bits 0 to ceil_log2 (M).
>>
>> Does it look OK?
>
> Yes, it does.  How did you test it?
>

There is a testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

It passes with my patch.

-- 
H.J.


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread Paolo Bonzini

On 10/14/2011 05:36 PM, H.J. Lu wrote:

There is a testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

It passes with my patch.


Cool, so let's wait for the results of testing.

Paolo


Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread H.J. Lu
On Fri, Oct 14, 2011 at 9:23 AM, Paolo Bonzini  wrote:
> On 10/14/2011 05:36 PM, H.J. Lu wrote:
>>
>> There is a testcase at
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696
>>
>> It passes with my patch.
>
> Cool, so let's wait for the results of testing.
>
> Paolo
>

Here is the complete patch with a testcase. I will
check it in if there are no performance regressions
with SPEC CPU 2K/2006 on Linux/ia32/x86-64/x32.

Thanks.

-- 
H.J.
---
gcc/

2011-10-13  H.J. Lu  

PR rtl-optimization/50696
* combine.c (make_compound_operation): Turn (and (OP) M) into
extraction if M is an extraction mask.

gcc/testsuite/

2011-10-13  H.J. Lu  

PR rtl-optimization/50696
* gcc.target/i386/pr50696.c: New.
gcc/

2011-10-13  H.J. Lu  

	PR rtl-optimization/50696
	* combine.c (make_compound_operation): Turn (and (OP) M) into
	extraction if M is an extraction mask.

gcc/testsuite/

2011-10-13  H.J. Lu  

	PR rtl-optimization/50696
	* gcc.target/i386/pr50696.c: New.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..4b57b88 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
  XEXP (XEXP (x, 0), 1)));
 	}
 
-  /* If the constant is one less than a power of two, this might be
-	 representable by an extraction even if no shift is present.
-	 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
-	 we are in a COMPARE.  */
-  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
-	new_rtx = make_extraction (mode,
-			   make_compound_operation (XEXP (x, 0),
-			next_code),
-			   0, NULL_RTX, i, 1, 0, in_code == COMPARE);
-
   /* If we are in a comparison and this is an AND with a power of two,
 	 convert this into the appropriate bit extract.  */
   else if (in_code == COMPARE
@@ -7758,6 +7748,26 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 			next_code),
 			   i, NULL_RTX, 1, 1, 0, 1);
 
+  /* If the constant is an extraction mask with the zero bits in
+	 the first operand ignored, this might be representable by an
+	 extraction even if no shift is present.  If it doesn't end up
+	 being a ZERO_EXTEND, we will ignore it unless we are in a
+	 COMPARE.  */
+  else
+	{
+	  unsigned HOST_WIDE_INT nonzero =
+	nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+	  unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1));
+	  unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+	  if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+	  == (nonzero & mask))
+	{
+	  new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+	  new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+	 len, 1, 0, in_code == COMPARE);
+	}
+	}
+
   break;
 
 case LSHIFTRT:
diff --git a/gcc/testsuite/gcc.target/i386/pr50696.c b/gcc/testsuite/gcc.target/i386/pr50696.c
new file mode 100644
index 000..b1ec2c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr50696.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { x32 } } } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+struct s { int val[16]; };
+
+extern void f (struct s pb);
+
+void
+foo ()
+{
+  struct s x;
+  int i;
+
+  for (i = 0; i < 16; i++)
+x.val[i] = i + 1;
+  f (x);
+}
+
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */