On Thu, Aug 5, 2021 at 5:20 AM Segher Boessenkool <seg...@kernel.crashing.org> wrote:
>
> On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool <seg...@kernel.crashing.org> writes:
> > > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> > >> Richard Biener <rguent...@suse.de> writes:
> > >> > Alternatively only enable loop vectorization at -O2 (the above checks
> > >> > flag_tree_slp_vectorize as well). At least the cost model kind
> > >> > does not have any influence on BB vectorization, that is, we get the
> > >> > same pros and cons as we do for -O3.
> > >>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
> > >
> > > Both of those depend a lot on the target, and target-specific conditions
> > > as well (which CPU model is selected for example). Can we factor that
> > > in somehow? Maybe we need some target hook that returns the expected
> > > percentage code growth for vectorising a given loop, for example, and
> > > -O2 vs. -O3 then selects what percentage is acceptable.
> > >
> > >> BB SLP
> > >> should be a win on both code size and performance (barring significant
> > >> target costing issues).
> > >
> > > Yeah -- but this could use a similar hook as well (just a straightline
> > > piece of code instead of a loop).
> >
> > I think anything like that should be driven by motivating use cases.
> > It's not something that we can easily decide in the abstract.
> >
> > The results so far with using very-cheap at -O2 have been promising,
> > so I don't think new hooks should block that becoming the default.
>
> Right, but it wouldn't hurt to think a sec if we are on the right path
> forward. It is crystal clear that to make good decisions about what
> and how to vectorise you need to take *some* target characteristics into
> account, and that will have to happen sooner rather than later.
>
> This was all in reply to
>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
>
> It was not meant to hold up these patches :-)
>
> > >> PR100089 was an exception because we ended up keeping unvectorised
> > >> scalar code that would never have existed otherwise. BB SLP proper
> > >> shouldn't have that problem.
> > >
> > > It also is a tiny piece of code. There will always be tiny examples
> > > that are much worse (or much better) than average.
> >
> > Yeah, what makes PR100089 important isn't IMO the test itself, but the
> > underlying problem that the PR exposed. Enabling this “BB SLP in loop
> > vectorisation” code can lead to the generation of scalar COND_EXPRs even
> > though we know that ifcvt doesn't have a proper cost model for deciding
> > whether scalar COND_EXPRs are a win.
> >
> > Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> > (although still dubious), but I think it's something we need to avoid
> > for -O2, even if that means losing the optimisation.
>
> Yeah -- -O2 should almost always do the right thing, while -O3 can do
> bad things more often, it just has to be better "on average".
>
>
> Segher

Move thread to gcc-patches and gcc

--
BR,
Hongtao
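
For illustration only, here is a rough C sketch of the idea quoted above: a target hook that reports the expected percentage code growth for a vectorisation candidate, with -O2 and -O3 accepting different amounts. Nothing here is an existing GCC interface. The struct, the function names (example_expected_growth_pct, max_growth_pct, should_vectorise) and the 25%/200% budgets are all assumptions made up for this sketch.

```c
#include <stdbool.h>

/* Hypothetical optimisation levels; not GCC's internal representation.  */
enum opt_level { OPT_O2, OPT_O3 };

/* Hypothetical description of one vectorisation candidate
   (a loop, or a straight-line block for BB SLP).  */
struct vect_candidate
{
  unsigned scalar_insns;      /* size of the original scalar code */
  unsigned vector_insns;      /* size of the vectorised body */
  unsigned epilogue_insns;    /* scalar epilogue/peeling kept alongside */
  unsigned versioning_checks; /* runtime versioning tests that are added */
};

/* Stand-in for the proposed target hook: expected code growth as a
   percentage of the scalar size.  A real target would tune this per
   CPU model; this version only counts instructions.  */
unsigned
example_expected_growth_pct (const struct vect_candidate *c)
{
  unsigned total = c->vector_insns + c->epilogue_insns
		   + c->versioning_checks;
  if (c->scalar_insns == 0 || total <= c->scalar_insns)
    return 0;
  return (total - c->scalar_insns) * 100 / c->scalar_insns;
}

/* Growth budget per optimisation level.  The numbers are invented.  */
unsigned
max_growth_pct (enum opt_level level)
{
  return level == OPT_O2 ? 25 : 200;
}

/* Vectorise only when the expected growth fits the level's budget.  */
bool
should_vectorise (const struct vect_candidate *c, enum opt_level level)
{
  return example_expected_growth_pct (c) <= max_growth_pct (level);
}
```

Under these made-up numbers, a straight-line candidate that shrinks from 8 to 4 instructions is accepted at -O2, while a loop that grows from 10 to 24 instructions (vector body, epilogue and versioning tests) is accepted only at -O3.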