On Thu, Sep 26, 2013 at 2:10 PM, bin.cheng <bin.ch...@arm.com> wrote: > Sorry for missing the patch. > > Thanks. > bin > >> -----Original Message----- >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >> ow...@gcc.gnu.org] On Behalf Of bin.cheng >> Sent: Thursday, September 26, 2013 8:05 PM >> To: 'Richard Biener'; Bin.Cheng >> Cc: GCC Patches; Richard Earnshaw >> Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT >> >> >> >> > -----Original Message----- >> > From: Richard Biener [mailto:richard.guent...@gmail.com] >> > Sent: Tuesday, September 24, 2013 7:58 PM >> > To: Bin.Cheng >> > Cc: Bin Cheng; GCC Patches; Richard Earnshaw >> > Subject: Re: [PATCH]Construct canonical scaled address expression in >> > IVOPT >> > >> > On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng <amker.ch...@gmail.com> >> > wrote: >> > > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener >> > > <richard.guent...@gmail.com> wrote: >> > >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng <bin.ch...@arm.com> >> wrote: >> > >>> >> > >>> >> > >>>> -----Original Message----- >> > >> >> > >> Or even [reg*scale] (not sure about that). But yes, at least >> > >> reg*scale + offset and reg*scale + reg. >> > >> >> > >>> Apparently it's infeasible to check every possibility for each >> > >>> architecture, is it ok we at least check "index", then "addr" if >> > >>> "index" is failed? By "any kind of addressing modes", I mean >> > >>> modes supported by function get_address_cost, i.e., in form of >> > >>> "[base] + [off] + [var] + (reg|reg*scale)". >> > >> >> > >> I suppose so, but it of course depends on what IVOPTs uses the >> > >> answer for in the end. Appearantly it doesn't distinguish between >> > >> the various cases even though TARGET_MEM_REF does support all the >> > >> variants in question (reg * scale, reg * scale + reg, reg * scale + >> > >> const, reg * scale + reg + const). >> > >> >> > >> So the better answer may be to teach the costs about the differences? >> > > Ideally, IVOPT should be aware whether scaling is allowed in every >> > > kind of addressing modes and account cost of multiplier accordingly. >> > > For current code, there are two scenarios here >> > > 1) If target supports reg*scale+reg, but not reg*scale, in this >> > > case, IVOPT considers multiplier is not allowed in any addressing >> > > mode and account multiplier with high cost. This is the problem arm >> having. >> > > 2) If target supports reg*scale, but not some kind of addressing >> > > mode (saying reg*scale+reg), in this case, IVOPT still constructs >> > > various scaled addressing mode in get_address_cost and depends on >> > > address_cost to compute correct cost for that addressing expression. >> > > I think this happens to work even IVOPT doesn't know "reg*scale+reg" >> > > is actually not supported. >> > > >> > >> >> > >>>> The above also builds more RTX waste which you can fix by >> > >>>> re-using the >> > >>> PLUS >> > >>>> by building it up-front similar to the multiplication. You also >> > >>>> miss the >> > >>> Yes, this can be fixed. >> > >>> >> > >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 >> > >>>> is >> valid. >> > >>> I >> > >>>> would expect that many targets support reg1 * scale + >> > >>>> constant-offset but not many reg1 * scale + reg2. >> > >>> I thought scale==1 is unnecessary because the addressing mode >> > >>> degrades into "reg" or "reg+reg". Moreover, calls of >> > >>> multiplier_allowed_in_address_p in both get_address_cost and >> > get_computation_cost_at have scale other than 1. >> > >> >> > >> Ok. >> > >> >> > >>>> >> > >>>> So no, the helper now checks sth completely different. What's >> > >>>> the problem with arm supporting reg1 * scale? Why shouldn't it >> > >>>> being able to handle >> > >>> the >> > >>>> implicit zero offset? >> > >>> >> > >>> As Richard clarified, ARM does not support scaled addressing mode >> > >>> without base register. >> > >> >> > >> I see. >> > >> >> > > Also from the newer comments: >> > > >> > >> Btw, it should be reasonably possible to compute the whole >> > >> multiplier_allowed_in_address_p table for all primary and secondary >> > >> archs (simply build cross-cc1) and compare the results before / >> > >> after a patch candidate. Querying both reg * scale and reg + reg * >> > >> scale if the first fails sounds like a good solution to me. >> > > I take this as we should do minimal change by checking reg + reg * >> > > scale if reg * scale is failed, right? >> > >> > Yes, you can share a single RTL expression for all this and I think >> querying reg >> > + reg * scale first makes sense (then fallback to reg * scale for >> compatibility). >> > >> I updated the patch as discussed. Please review. >> It bootstraps and passes regression test on x86/x86_64, but fails tree- >> ssa/scev-4.c on arm cortex-a15. >> Hi Richard, I investigated the failure and found out it reveals two other >> problems in IVOPT we have discussed. >> For scev-4.c like: >> >> typedef struct { >> int x; >> int y; >> } S; >> int *a_p; >> S a[1000]; >> f(int k) >> { >> int i; >> >> for (i=k; i<1000; i+=k) { >> a_p = &a[i].y; >> *a_p = 100; >> } >> } >> >> The iv candidates and uses are like: >> >> use 0 >> generic >> in statement a_p.0_5 = &a[k_11].y; >> >> at position >> type int * >> base (int *) &a + ((sizetype) k_3(D) * 8 + 4) >> step (sizetype) k_3(D) * 8 >> base object (void *) &a >> related candidates >> use 1 >> address >> in statement MEM[(int *)&a][k_11].y = 100; >> >> at position MEM[(int *)&a][k_11].y >> type int * >> base &MEM[(int *)&a][k_3(D)].y >> step (sizetype) k_3(D) * 8 >> base object (void *) &a >> related candidates >> >> candidate 4 (important) >> depends on 1 >> original biv >> type int >> base k_3(D) >> step k_3(D) >> >> candidate 7 >> depends on 1 >> var_before ivtmp.12 >> var_after ivtmp.12 >> incremented before exit test >> type unsigned int >> base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8) >> step (unsigned int) ((sizetype) k_3(D) * 8) >> base object (void *) &a >> Candidate 7 is related to use 0 >> >> Problem 1) When computing cost of use 1 using cand 4, the call of >> strip_offset (&MEM[(int *)&a][k_3(D)].y, &off1) computes off1 == 0, which >> is actually 4. >> This can be fixed my previous patch which still in re-working. >> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01749.html >> >> Problem 2) It allocates iv with not simplified base like &MEM[(int >> *)&a][k_3(D)].y, while cost computation functions like > ptr_difference_const >> can't handle such complex address properly (also force_expr_to_var_cost >> etc.). In this case, it can't understand that "((int *) &a + (sizetype) >> k_3(D) * 8)" and "&MEM[(int *)&a][k_3(D)]" are actually equal. >> I think it would substantially simplify computation of cost in IVOPT if we > can >> allocate iv with simplified base expression. I have another patch for > this and >> will send it for review. >> >> BTW, the failure can be fixed by the offset fixation patch. >> >> So is it OK to applied this updated patch?
Ok with the loop invariant + XEXP (addr, 0) = index; moved out. You may want to wait until the patch is in that avoids the intermittent scev-4.c failure on arm. Thanks, Richard. >> Thanks. >> bin >> >> >>