https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> --- On Tue, 26 Nov 2019, wwwhhhyyy333 at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651 > > --- Comment #7 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> --- > (In reply to rguent...@suse.de from comment #6) > > On Tue, 26 Nov 2019, wwwhhhyyy333 at gmail dot com wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651 > > > > > > --- Comment #4 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> --- > > > (In reply to Richard Biener from comment #2) > > > > Btw, which variant is actually the fastest for you? abs expansion > > > > doesn't > > > > do any cost comparison but just uses direct abs, max and then the xor > > > > with > > > > shift as third option (and after that fall back to compare & jump which > > > > later > > > > might be if-converted into cmov). > > > > > > Actually the xor with shift is could be the fastest, which improves > > > about 8% on 525.x264_r comparing to the pmaxsd one, and with cmove the > > > improvement is 6.5%. > > > > I see. So I wonder if it makes sense to add some costing checks to > > abs expansion... - the simplest way is probably to make the x86 backends > > have abs patterns and drive expansion itself here. > > > > > I don't think this conversion should happen on every cmove instruction, > > > regardless of how many sse register it would use. I think the simplest > > > way to > > > avoid this is adjusting the cost. > > > > Well, for STV the issue is that "costing" is done on individual > > chains. Note that STV doesn't transform cmovs, it transforms min/max > > instructions which exist on integer modes just for the sake of STV ... > > > > STV (like many other combine-like transforms) doesn't consider the > > global picture (multiple min/max chains in the same code region, etc.) > > but only works locally. So any costing wrenches you throw in has > > an effect on _all_ chains. > > > > Clearly abs expansion had a successful non-cmov path before the STV > > changes and the intention was not to make min/max the new abs expansion > > of choice. So I guess we need to rectify that - and the easiest and > > least intrusive way (for other targets) is to add abs expansion > > patterns. > > Thanks for your explanation. The concern is if we add abs expansion patterns > on > x86 target, other expansions may be affected by the change like what is done > with smax. And it is a little bit redundant to add such expansion just by > duplicate the original code to generate the xor version. Sure. Another option would be to enhance STV even further (or add some peephole patterns - combine runs before STV2) to transform the psubd xmm3, xmm0 psubd xmm0, xmm1 pmaxsd xmm0, xmm3 into psubd %xmm3, %xmm0 pabsd %xmm0, %xmm0 for enhancing STV that means adding abs() patterns (or adding combine-like matching to the pass which I'd suggest not do). Clearly that the above conversion isn't done is a generic missed optimization. Maybe you can benchmark that as well though I guess it won't come near the xor variant?