[Bug target/92651] [10 Regression] Unnecessary stv transform in some x86 backend

wwwhhhyyy333 at gmail dot com Tue, 26 Nov 2019 02:06:27 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651


--- Comment #7 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to rguent...@suse.de from comment #6)
> On Tue, 26 Nov 2019, wwwhhhyyy333 at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
> > 
> > --- Comment #4 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
> > (In reply to Richard Biener from comment #2)
> > > Btw, which variant is actually the fastest for you?  abs expansion doesn't
> > > do any cost comparison but just uses direct abs, max and then the xor with
> > > shift as third option (and after that fall back to compare & jump which
> > > later might be if-converted into cmov).
> > 
> > Actually the xor with shift could be the fastest: it improves
> > 525.x264_r by about 8% compared to the pmaxsd one, and with cmove the
> > improvement is 6.5%.
> 
> I see.  So I wonder if it makes sense to add some costing checks to
> abs expansion... - the simplest way is probably to make the x86 backends
> have abs patterns and drive expansion itself here.
> 
> > I don't think this conversion should happen on every cmove instruction,
> > regardless of how many SSE registers it would use. I think the simplest way
> > to avoid this is adjusting the cost.
> 
> Well, for STV the issue is that "costing" is done on individual
> chains.  Note that STV doesn't transform cmovs, it transforms min/max
> instructions which exist on integer modes just for the sake of STV ...
> 
> STV (like many other combine-like transforms) doesn't consider the
> global picture (multiple min/max chains in the same code region, etc.)
> but only works locally.  So any costing wrenches you throw in have
> an effect on _all_ chains.
> 
> Clearly abs expansion had a successful non-cmov path before the STV
> changes and the intention was not to make min/max the new abs expansion
> of choice.  So I guess we need to rectify that - and the easiest and
> least intrusive way (for other targets) is to add abs expansion
> patterns.

Thanks for your explanation. My concern is that if we add abs expansion
patterns on the x86 target, other expansions may be affected by the change, as
happened with smax. It also seems a little redundant to add such an expansion
just by duplicating the original code to generate the xor version.
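
For reference, the three abs variants compared in the quoted discussion (max,
xor with shift, and compare-and-select) can be written roughly as below. This
is only a sketch at the C level, not the RTL that GCC's expander emits. The
function names are made up for illustration, and it assumes a 32-bit int and
an arithmetic right shift on signed values, as GCC provides on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: max(x, -x), the min/max form that STV may pick up. */
static int32_t abs_max(int32_t x)
{
    int32_t neg = -x;            /* undefined for INT32_MIN, as with abs() */
    return x > neg ? x : neg;
}

/* Variant 2: xor with shift. */
static int32_t abs_xor_shift(int32_t x)
{
    int32_t mask = x >> 31;      /* assumption: arithmetic shift, so 0 or -1 */
    return (x ^ mask) - mask;    /* x if mask == 0, ~x + 1 if mask == -1 */
}

/* Variant 3: compare and select, which may later be if-converted to cmov. */
static int32_t abs_select(int32_t x)
{
    return x < 0 ? -x : x;
}

/* Hypothetical helper: check that all three variants agree for one input. */
static void check_abs_variants(int32_t x)
{
    assert(abs_max(x) == abs_xor_shift(x));
    assert(abs_xor_shift(x) == abs_select(x));
}
```

All three return the same value for any input other than INT32_MIN. They
differ only in which instruction sequence the compiler is likely to pick, and
that choice is the subject of this bug.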
