https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #18 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to kugan from comment #12)
> (In reply to rsand...@gcc.gnu.org from comment #10)
> > (In reply to kugan from comment #9)
> > > Created attachment 46040 [details]
> > > patch
> > 
> > Wasn't sure whether this patch was WIP or the final version
> > for review, but we need to do something more generic than
> > dividing by 4.  I think the test will still fail with "int"
> > changed to "short" for example.
> > 
> > I also don't think the new candidate should be tied to the
> > mask/load store functions.  Maybe one approach would be to
> > check when adding a zero-based candidate for a use in:
> > 
> >   /* Record common candidate with initial value zero.  */
> >   basetype = TREE_TYPE (iv->base);
> >   if (POINTER_TYPE_P (basetype))
> >     basetype = sizetype;
> >   record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> > 
> > whether the use actually benefits from this unscaled iv.
> > If the use is USE_REF_ADDRESS, we could compare the cost
> > of an address with an unscaled index with the cost of an address
> > with a scaled index.  I think the natural scale value to try
> > would be GET_MODE_INNER (TYPE_MODE (mem_type)).
> 
> Thanks for the comments. I agree this is the right place. But I am not sure
> if checking the cost at this point is what IV opt generally does. In
> general, IV-opt adds candidates which can be helpful and later decides the
> optimal set. 

But I was talking about comparing the cost of the address rather
than the cost of the iv.  Like you say, the idea is to add candidates
that might be useful, and what we want to know here is whether the
byte offset is likely to be a useful candidate for this use.

Another way of deciding whether to go for a scaled candidate would
be to test for a legitimate address directly (rather than via
address costs) if you prefer that.  I just thought using address
costs might be easier.
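
To make the decision concrete, here is a rough standalone sketch in C.
It is a toy model and not ivopts code: struct toy_target,
toy_address_cost and toy_pick_scale are hypothetical names invented for
illustration, and the 0/1 "cost" simply stands in for whatever the
target's real address cost or legitimacy check would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of which "reg+reg" forms a target accepts.
   Not a GCC data structure.  */
struct toy_target
{
  bool unscaled_index_ok;	/* base + byte offset */
  bool scaled_index_ok;		/* base + index * access size */
};

/* Toy cost: 0 if the address form is accepted directly, 1 if an extra
   instruction would be needed to form the address.  */
int
toy_address_cost (const struct toy_target *t, unsigned scale)
{
  bool ok = (scale == 1) ? t->unscaled_index_ok : t->scaled_index_ok;
  return ok ? 0 : 1;
}

/* Return the scale to use for the zero-based candidate: ELEM_SIZE if a
   scaled index is strictly cheaper than a byte offset, otherwise 1
   (keep the existing unscaled candidate).  */
unsigned
toy_pick_scale (const struct toy_target *t, unsigned elem_size)
{
  assert (elem_size > 0);
  if (elem_size == 1)
    return 1;
  if (toy_address_cost (t, elem_size) < toy_address_cost (t, 1))
    return elem_size;
  return 1;
}
```

In this model a target that only accepts the scaled form gets an index
candidate, while ties keep the byte-offset candidate, so existing
targets would see no change.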

We could also keep the unscaled candidate in addition to the
new scaled one if we have evidence that having both is useful.
The danger is that if we add too many, we'll trip the iv limit,
so I think we'd need positive evidence for keeping both.

> If we are to use get_computation_cost to see the costs, we have to create
> iv_cand and then discard. Since we are adding only one candidate and that
> too for SVE like targets, I am thinking that it is OK. If you still prefer
> to check the cost, I will change that.

IMO it's a generic concept that just happens to apply to SVE.
If an architecture is going to support just one "reg+reg" addressing
mode, the two obvious choices are for the offset register to be unscaled
(bytes) or scaled by the element or access size (indices).  SVE chose
the latter.  In that case, the most useful candidate is likely to be
the index rather than the byte offset.
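
As a source-level illustration of the two candidate shapes (a sketch
only; the function names are made up for this example), both loops
below read the same elements.  The first steps a byte offset, which
matches a base+unscaled address; the second steps an element index,
which matches a base+scaled address.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-offset iv: the address is base + off, with off stepping by the
   access size.  */
int32_t
sum_byte_offset (const int32_t *base, size_t n)
{
  const char *p = (const char *) base;
  int32_t sum = 0;
  for (size_t off = 0; off < n * sizeof (int32_t); off += sizeof (int32_t))
    sum += *(const int32_t *) (p + off);
  return sum;
}

/* Index iv: the address is base + i * sizeof (int32_t), with i stepping
   by one.  */
int32_t
sum_index (const int32_t *base, size_t n)
{
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += base[i];
  return sum;
}
```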

This applies to single-vector loads and stores as well as
LOAD/STORE_LANES.  The reason we usually get good iv choices
for single vectors is that the index usually exists as a candidate
already, in the form of the loop control iv.  (This is of course the
main benefit of base+scaled addressing over base+unscaled addressing.)
But it's probably possible to construct examples in which the
index candidate doesn't already exist even for single vectors.
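
One shape that might qualify (an untested guess, not something taken
from the PR) is a loop whose control iv is a pointer rather than a
counter.  In add_one_ptr below there is no integer index in the source,
so the zero-based candidate derived from the pointer iv would step in
bytes; add_one_idx is the usual form in which the loop counter doubles
as the index.  Both function names are made up for this example, and
whether ivopts really lacks the index candidate for the first loop
depends on what earlier passes do.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop controlled by a pointer comparison: no index iv in the source.  */
void
add_one_ptr (int32_t *dst, const int32_t *src, const int32_t *src_end)
{
  for (; src != src_end; ++src, ++dst)
    *dst = *src + 1;
}

/* Loop controlled by a counter: i is both the loop control iv and the
   natural index for the two addresses.  */
void
add_one_idx (int32_t *dst, const int32_t *src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = src[i] + 1;
}
```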

> Attached patch (only the ivopt changes) and testcase
