https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-02-01
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com
     Ever confirmed|0                           |1

--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 45582
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45582&action=edit
Additional patch to break partial SSE reg dependencies

Here is another patch that may help with partial SSE reg dependencies for
{R,}SQRTS{S,D}, RCPS{S,D} and ROUNDS{S,D} instructions. It follows the same
strategy as both ICC and clang, that is:

a) load from mem with MOVS{S,D} and
b) in case of SSE, match input and output register.
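
As a rough source-level illustration of steps a) and b), here is a minimal
sketch using SSE2 intrinsics. It is not the patch (which changes the
compiler's instruction patterns); the function name sqrt_scalar is
hypothetical, and the instructions a given compiler actually emits for it
may differ.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: scalar square root of a double stored in memory. */
double sqrt_scalar(const double *p)
{
    /* a) Load with MOVSD semantics: the low lane receives *p and the
          upper lane is zeroed, so the full register is written. */
    __m128d v = _mm_load_sd(p);

    /* b) Use the same register as both the pass-through operand and the
          source, so the result depends only on the value just loaded. */
    v = _mm_sqrt_sd(v, v);

    /* Extract the low lane as a plain double. */
    return _mm_cvtsd_f64(v);
}
```

Because the load writes the whole register, the following square root does
not have to wait for whatever instruction last wrote that register.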

The implementation uses the preferred_for_speed attribute, so in cold sections
or when compiling with -Os, the compiler is still able to emit a direct load
from memory (SSE, AVX) and to use unmatched registers for SSE targets.

So, the sqrt from memory is now compiled to:

        movsd   z(%rip), %xmm0
        sqrtsd  %xmm0, %xmm0


(SSE) or

        vmovsd  z(%rip), %xmm1
        vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

And sqrt from an unmatched input register will compile to:

        sqrtsd  %xmm1, %xmm1
        movapd  %xmm1, %xmm0

(SSE) or

        vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

HJ, can you please benchmark this patch?
