https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
   Last reconfirmed|                             |2019-02-01
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
     Ever confirmed|0                            |1

--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 45582
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45582&action=edit
Additional patch to break partial SSE reg dependencies

Here is another patch that may help with partial SSE reg dependencies for
{R,}SQRTS{S,D}, RCPS{S,D} and ROUNDS{S,D} instructions. It takes the same
strategy as both ICC and clang, that is:
a) load from mem with MOVS{S,D} and
b) in case of SSE, match input and output register.

The implementation uses the preferred_for_speed attribute, so in cold sections
or when compiled with -Os, the compiler is still able to create a direct load
from memory (SSE, AVX) and use unmatched registers for SSE targets.

So, the sqrt from memory is now compiled to:

        movsd   z(%rip), %xmm0
        sqrtsd  %xmm0, %xmm0

(SSE) or

        vmovsd  z(%rip), %xmm1
        vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

And sqrt from an unmatched input register will compile to:

        sqrtsd  %xmm1, %xmm1
        movapd  %xmm1, %xmm0

(SSE) or

        vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

HJ, can you please benchmark this patch?