[Bug target/92651] [10 Regression] Unnecessary stv transform in some x86 backend

rguenther at suse dot de Tue, 26 Nov 2019 04:17:20 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651


--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 26 Nov 2019, wwwhhhyyy333 at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
> 
> --- Comment #7 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #6)
> > On Tue, 26 Nov 2019, wwwhhhyyy333 at gmail dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
> > > 
> > > --- Comment #4 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
> > > (In reply to Richard Biener from comment #2)
> > > > Btw, which variant is actually the fastest for you?   abs expansion 
> > > > doesn't
> > > > do any cost comparison but just uses direct abs, max and then the xor 
> > > > with
> > > > shift as third option (and after that fall back to compare & jump which 
> > > > later
> > > > might be if-converted into cmov).
> > > 
> > > Actually the xor with shift is could be the fastest, which improves 
> > > about 8% on 525.x264_r comparing to the pmaxsd one, and with cmove the 
> > > improvement is 6.5%.
> > 
> > I see.  So I wonder if it makes sense to add some costing checks to
> > abs expansion... - the simplest way is probably to make the x86 backends
> > have abs patterns and drive expansion itself here.
> > 
> > > I don't think this conversion should happen on every cmove instruction,
> > > regardless of how many sse register it would use. I think the simplest 
> > > way to
> > > avoid this is adjusting the cost.
> > 
> > Well, for STV the issue is that "costing" is done on individual
> > chains.  Note that STV doesn't transform cmovs, it transforms min/max
> > instructions which exist on integer modes just for the sake of STV ...
> > 
> > STV (like many other combine-like transforms) doesn't consider the
> > global picture (multiple min/max chains in the same code region, etc.)
> > but only works locally.  So any costing wrenches you throw in has
> > an effect on _all_ chains.
> > 
> > Clearly abs expansion had a successful non-cmov path before the STV
> > changes and the intention was not to make min/max the new abs expansion
> > of choice.  So I guess we need to rectify that - and the easiest and
> > least intrusive way (for other targets) is to add abs expansion
> > patterns.
> 
> Thanks for your explanation. The concern is if we add abs expansion patterns 
> on
> x86 target, other expansions may be affected by the change like what is done
> with smax. And it is a little bit redundant to add such expansion just by
> duplicate the original code to generate the xor version.

Sure.  Another option would be to enhance STV even further
(or add some peephole patterns - combine runs before STV2) to
transform the

        psubd   xmm3, xmm0
        psubd   xmm0, xmm1
        pmaxsd  xmm0, xmm3

into

        psubd  %xmm3, %xmm0
        pabsd  %xmm0, %xmm0

for enhancing STV that means adding abs() patterns (or adding
combine-like matching to the pass which I'd suggest not do).

Clearly that the above conversion isn't done is a generic
missed optimization.  Maybe you can benchmark that as well
though I guess it won't come near the xor variant?

[Bug target/92651] [10 Regression] Unnecessary stv transform in some x86 backend

Reply via email to