https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
--- Comment #4 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> Btw, which variant is actually the fastest for you? abs expansion doesn't
> do any cost comparison but just uses direct abs, max and then the xor with
> shift as third option (and after that fall back to compare & jump which later
> might be if-converted into cmov).

Actually, the xor with shift could be the fastest: it improves 525.x264_r by
about 8% compared to the pmaxsd one, and with cmove the improvement is 6.5%.

I don't think this conversion should happen on every cmove instruction,
regardless of how many SSE registers it would use. I think the simplest way to
avoid this is adjusting the cost.
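
For reference, the expansion options named in the quoted comment can be written
out at source level roughly as below. This is only an illustrative sketch: the
function names are made up for this example, the code is not taken from GCC's
expander, and the instructions a compiler actually emits for each form depend
on the target and options. It assumes 32-bit int, an arithmetic right shift of
negative values (which is what GCC does), and inputs other than INT32_MIN.

```c
#include <assert.h>
#include <stdint.h>

/* "max" variant: abs(x) == max(x, -x). With SSE4.1 this shape is the one
   that can end up as pmaxsd (assumption: depends on target and costs). */
static int32_t abs_max(int32_t x)
{
    int32_t neg = -x;
    return x > neg ? x : neg;
}

/* "xor with shift" variant: mask is 0 for x >= 0 and -1 for x < 0,
   so (x ^ mask) - mask is x or (~x + 1) respectively. */
static int32_t abs_xor_shift(int32_t x)
{
    int32_t mask = x >> 31;   /* assumes arithmetic shift, as in GCC */
    return (x ^ mask) - mask;
}

/* "compare & jump" variant: a plain conditional, which a later pass
   might if-convert into cmov. */
static int32_t abs_select(int32_t x)
{
    return x < 0 ? -x : x;
}

/* Hypothetical helper: checks that all three variants agree for x. */
static void check_variants(int32_t x)
{
    assert(abs_max(x) == abs_select(x));
    assert(abs_xor_shift(x) == abs_select(x));
}
```

All three return the same value for any input except INT32_MIN, so the choice
between them is purely a question of cost on the target.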