Re: [PATCH]Construct canonical scaled address expression in IVOPT

Richard Biener Thu, 26 Sep 2013 06:23:20 -0700

On Thu, Sep 26, 2013 at 2:10 PM, bin.cheng <[email protected]> wrote:
> Sorry for missing the patch.
>
> Thanks.
> bin
>
>> -----Original Message-----
>> From: [email protected] [mailto:gcc-patches-
>> [email protected]] On Behalf Of bin.cheng
>> Sent: Thursday, September 26, 2013 8:05 PM
>> To: 'Richard Biener'; Bin.Cheng
>> Cc: GCC Patches; Richard Earnshaw
>> Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT
>>
>>
>>
>> > -----Original Message-----
>> > From: Richard Biener [mailto:[email protected]]
>> > Sent: Tuesday, September 24, 2013 7:58 PM
>> > To: Bin.Cheng
>> > Cc: Bin Cheng; GCC Patches; Richard Earnshaw
>> > Subject: Re: [PATCH]Construct canonical scaled address expression in
>> > IVOPT
>> >
>> > On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <[email protected]>
>> > wrote:
>> > > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
>> > > <[email protected]> wrote:
>> > >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <[email protected]>
>> wrote:
>> > >>>
>> > >>>
>> > >>>> -----Original Message-----
>> > >>
>> > >> Or even [reg*scale] (not sure about that).  But yes, at least
>> > >> reg*scale + offset and reg*scale + reg.
>> > >>
>> > >>>   Apparently it's infeasible to check every possibility for each
>> > >>> architecture, is it ok we at least check "index", then "addr" if
>> > >>> "index" is failed?  By "any kind of addressing modes", I mean
>> > >>> modes supported by function get_address_cost,  i.e., in form of
>> > >>> "[base] + [off] + [var] + (reg|reg*scale)".
>> > >>
>> > >> I suppose so, but it of course depends on what IVOPTs uses the
>> > >> answer for in the end.  Appearantly it doesn't distinguish between
>> > >> the various cases even though TARGET_MEM_REF does support all the
>> > >> variants in question (reg * scale, reg * scale + reg, reg * scale +
>> > >> const, reg * scale + reg + const).
>> > >>
>> > >> So the better answer may be to teach the costs about the differences?
>> > > Ideally, IVOPT should be aware whether scaling is allowed in every
>> > > kind of addressing modes and account cost of multiplier accordingly.
>> > > For current code, there are two scenarios here
>> > > 1) If target supports reg*scale+reg, but not reg*scale, in this
>> > > case, IVOPT considers multiplier is not allowed in any addressing
>> > > mode and account multiplier with high cost.  This is the problem arm
>> having.
>> > > 2) If target supports reg*scale, but not some kind of addressing
>> > > mode (saying reg*scale+reg), in this case, IVOPT still constructs
>> > > various scaled addressing mode in get_address_cost and depends on
>> > > address_cost to compute correct cost for that addressing expression.
>> > > I think this happens to work even IVOPT doesn't know "reg*scale+reg"
>> > > is actually not supported.
>> > >
>> > >>
>> > >>>> The above also builds more RTX waste which you can fix by
>> > >>>> re-using the
>> > >>> PLUS
>> > >>>> by building it up-front similar to the multiplication.  You also
>> > >>>> miss the
>> > >>> Yes, this can be fixed.
>> > >>>
>> > >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2
>> > >>>> is
>> valid.
>> > >>> I
>> > >>>> would expect that many targets support reg1 * scale +
>> > >>>> constant-offset but not many reg1 * scale + reg2.
>> > >>> I thought scale==1 is unnecessary because the addressing mode
>> > >>> degrades into "reg" or "reg+reg".  Moreover, calls of
>> > >>> multiplier_allowed_in_address_p in both get_address_cost and
>> > get_computation_cost_at have scale other than 1.
>> > >>
>> > >> Ok.
>> > >>
>> > >>>>
>> > >>>> So no, the helper now checks sth completely different.  What's
>> > >>>> the problem with arm supporting reg1 * scale?  Why shouldn't it
>> > >>>> being able to handle
>> > >>> the
>> > >>>> implicit zero offset?
>> > >>>
>> > >>> As Richard clarified, ARM does not support scaled addressing mode
>> > >>> without base register.
>> > >>
>> > >> I see.
>> > >>
>> > > Also from the newer comments:
>> > >
>> > >> Btw, it should be reasonably possible to compute the whole
>> > >> multiplier_allowed_in_address_p table for all primary and secondary
>> > >> archs (simply build cross-cc1) and compare the results before /
>> > >> after a patch candidate.  Querying both reg * scale and reg + reg *
>> > >> scale if the first fails sounds like a good solution to me.
>> > > I take this as we should do minimal change by checking reg + reg *
>> > > scale if reg * scale is failed,  right?
>> >
>> > Yes, you can share a single RTL expression for all this and I think
>> querying reg
>> > + reg * scale first makes sense (then fallback to reg * scale for
>> compatibility).
>> >
>> I updated the patch as discussed.  Please review.
>> It bootstraps and passes regression test on x86/x86_64, but fails tree-
>> ssa/scev-4.c on arm cortex-a15.
>> Hi Richard, I investigated the failure and found out it reveals two other
>> problems in IVOPT we have discussed.
>> For scev-4.c like:
>>
>> typedef struct {
>>       int x;
>>       int y;
>> } S;
>> int *a_p;
>> S a[1000];
>> f(int k)
>> {
>>       int i;
>>
>>       for (i=k; i<1000; i+=k) {
>>               a_p = &a[i].y;
>>               *a_p = 100;
>>         }
>> }
>>
>> The iv candidates and uses are like:
>>
>> use 0
>>   generic
>>   in statement a_p.0_5 = &a[k_11].y;
>>
>>   at position
>>   type int *
>>   base (int *) &a + ((sizetype) k_3(D) * 8 + 4)
>>   step (sizetype) k_3(D) * 8
>>   base object (void *) &a
>>   related candidates
>> use 1
>>   address
>>   in statement MEM[(int *)&a][k_11].y = 100;
>>
>>   at position MEM[(int *)&a][k_11].y
>>   type int *
>>   base &MEM[(int *)&a][k_3(D)].y
>>   step (sizetype) k_3(D) * 8
>>   base object (void *) &a
>>   related candidates
>>
>> candidate 4 (important)
>>   depends on 1
>>   original biv
>>   type int
>>   base k_3(D)
>>   step k_3(D)
>>
>> candidate 7
>>   depends on 1
>>   var_before ivtmp.12
>>   var_after ivtmp.12
>>   incremented before exit test
>>   type unsigned int
>>   base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8)
>>   step (unsigned int) ((sizetype) k_3(D) * 8)
>>   base object (void *) &a
>> Candidate 7 is related to use 0
>>
>> Problem 1) When computing cost of use 1 using cand 4, the call of
>> strip_offset (&MEM[(int *)&a][k_3(D)].y, &off1) computes off1 == 0, which
>> is actually 4.
>> This can be fixed my previous patch which still in re-working.
>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01749.html
>>
>> Problem 2) It allocates iv with not simplified base like &MEM[(int
>> *)&a][k_3(D)].y,  while cost computation functions like
> ptr_difference_const
>> can't handle such complex address properly (also force_expr_to_var_cost
>> etc.).  In this case, it can't understand that "((int *) &a + (sizetype)
>> k_3(D) * 8)" and "&MEM[(int *)&a][k_3(D)]" are actually equal.
>> I think it would substantially simplify computation of cost in IVOPT if we
> can
>> allocate iv with simplified base expression.  I have another patch for
> this and
>> will send it for review.
>>
>> BTW, the failure can be fixed by the offset fixation patch.
>>
>> So is it OK to applied this updated patch?


Ok with the loop invariant

+  XEXP (addr, 0) = index;

moved out.  You may want to wait until the patch is in that avoids
the intermittent scev-4.c failure on arm.

Thanks,
Richard.

>> Thanks.
>> bin
>>
>>
>>

Re: [PATCH]Construct canonical scaled address expression in IVOPT

Reply via email to