Re: Why vectorization didn't turn on by -O2

Hongtao Liu via Gcc-patches Thu, 05 Aug 2021 21:57:39 -0700

On Thu, Aug 5, 2021 at 5:20 AM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool <seg...@kernel.crashing.org> writes:
> > > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> > >> Richard Biener <rguent...@suse.de> writes:
> > >> > Alternatively only enable loop vectorization at -O2 (the above checks
> > >> > flag_tree_slp_vectorize as well).  At least the cost model kind
> > >> > does not have any influence on BB vectorization, that is, we get the
> > >> > same pros and cons as we do for -O3.
> > >>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
> > >
> > > Both of those depend a lot on the target, and target-specific conditions
> > > as well (which CPU model is selected for example).  Can we factor that
> > > in somehow?  Maybe we need some target hook that returns the expected
> > > percentage code growth for vectorising a given loop, for example, and
> > > -O2 vs. -O3 then selects what percentage is acceptable.
> > >
> > >> BB SLP
> > >> should be a win on both code size and performance (barring significant
> > >> target costing issues).
> > >
> > > Yeah -- but this could use a similar hook as well (just a straightline
> > > piece of code instead of a loop).
> >
> > I think anything like that should be driven by motivating use cases.
> > It's not something that we can easily decide in the abstract.
> >
> > The results so far with using very-cheap at -O2 have been promising,
> > so I don't think new hooks should block that becoming the default.
>
> Right, but it wouldn't hurt to think a sec if we are on the right path
> forward.  It is crystal clear that to make good decisions about what
> and how to vectorise you need to take *some* target characteristics into
> account, and that will have to happen sooner rather than later.
>
> This was all in reply to
>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
>
> It was not meant to hold up these patches :-)
>
> > >> PR100089 was an exception because we ended up keeping unvectorised
> > >> scalar code that would never have existed otherwise.  BB SLP proper
> > >> shouldn't have that problem.
> > >
> > > It also is a tiny piece of code.  There will always be tiny examples
> > > that are much worse (or much better) than average.
> >
> > Yeah, what makes PR100089 important isn't IMO the test itself, but the
> > underlying problem that the PR exposed.  Enabling this “BB SLP in loop
> > vectorisation” code can lead to the generation of scalar COND_EXPRs even
> > though we know that ifcvt doesn't have a proper cost model for deciding
> > whether scalar COND_EXPRs are a win.
> >
> > Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> > (although still dubious), but I think it's something we need to avoid
> > for -O2, even if that means losing the optimisation.
>
> Yeah -- -O2 should almost always do the right thing, while -O3 can do
> bad things more often, it just has to be better "on average".
>
>
> Segher
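
For illustration, the target-hook idea raised in the quoted discussion (a hook that returns the expected percentage code growth for vectorising a loop, with -O2 and -O3 accepting different percentages) could be sketched roughly as below. This is only a toy model: every name, the struct layout and both thresholds are invented here, and nothing in it corresponds to an existing GCC hook or parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: none of these names exist in GCC. */
enum opt_level { OPT_O2, OPT_O3 };

/* Assumed size estimate for one loop, in instructions. */
struct loop_estimate {
  unsigned scalar_insns;  /* size of the original scalar loop */
  unsigned vector_insns;  /* vector body plus any epilogue and versioning checks */
};

/* Stand-in for the proposed target hook: expected code growth in percent.
   A real target would factor in the selected CPU model and other
   target-specific conditions.  */
int expected_growth_pct(const struct loop_estimate *e)
{
  assert(e->scalar_insns > 0);
  long delta = (long)e->vector_insns - (long)e->scalar_insns;
  return (int)(delta * 100 / (long)e->scalar_insns);
}

/* The optimisation level selects how much growth is acceptable.
   Both thresholds are made-up numbers for the example.  */
int acceptable_growth_pct(enum opt_level level)
{
  return level == OPT_O3 ? 300 : 50;
}

/* Vectorise only if the expected growth fits the level's budget. */
bool should_vectorize(const struct loop_estimate *e, enum opt_level level)
{
  return expected_growth_pct(e) <= acceptable_growth_pct(level);
}
```

With these invented thresholds, a loop estimated to grow from 10 to 30 instructions (200% growth) would be rejected at -O2 and accepted at -O3, while one growing from 10 to 14 (40%) would be accepted at both levels.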


Moving this thread to the gcc-patches and gcc lists.

-- 
BR,
Hongtao
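
As a small illustration of the "very-cheap at -O2" setup mentioned in the thread, here are two loops that the very-cheap cost model would be expected to treat differently. As documented for GCC 11 and later, that model only allows vectorisation when the vector code entirely replaces the scalar code, for instance when the iteration count is known to be a multiple of the vector width. The function and array names are invented for the example, and whether a given loop is actually vectorised depends on the GCC version and the target.

```c
/* Trip count chosen as a multiple of common vector widths. */
enum { N = 1024 };

float a[N], b[N], c[N];

/* Fixed trip count over distinct global arrays: no runtime alias check
   and no scalar epilogue should be needed, so this is the kind of loop
   the very-cheap model is meant to accept.  */
void add_fixed(void)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];
}

/* Runtime trip count and possibly aliasing pointers: vectorising this
   would typically need a versioning check and a scalar remainder loop,
   i.e. extra code kept alongside the vector loop.  */
void add_runtime(float *dst, const float *x, const float *y, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = x[i] + y[i];
}
```

One way to compare the two is to compile with `gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec`, then switch the cost model to `dynamic` and see which loops are reported as vectorised.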
