[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

ubizjak at gmail dot com via Gcc-bugs Wed, 26 Jul 2023 00:30:38 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762


Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com
             Status|NEW                         |ASSIGNED

--- Comment #16 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55636
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55636&action=edit
Proposed patch

The proposed patch clears the upper half of each V4SFmode operand register
before every potentially trapping instruction. The testcase from comment #12
now compiles to:

        movq    %xmm1, %xmm1    # 9     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm0, %xmm0    # 10    [c=4 l=4]  *vec_concatv4sf_0
        addps   %xmm1, %xmm0    # 11    [c=12 l=3]  *addv4sf3/0

This approach addresses the issues with traps (comment #0), as well as those
with denormal/invalid values (comment #14). The obvious exception to the rule
is division, where a value != 0.0 should be loaded into the upper half of the
denominator.
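
The division exception can be illustrated at the source level with SSE2
intrinsics. This is only a sketch, not the patch itself: the helper names are
hypothetical, and 1.0f is an assumed choice for the "value != 0.0" mentioned
above.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86 targets only */

/* Hypothetical helper: zero the upper 64 bits of the register,
   which is what `movq %xmmN, %xmmN` does.  */
static inline __m128 v2sf_clear_upper(__m128 x)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(x)));
}

/* Hypothetical helper: keep the low two floats of DEN and place 1.0f
   (an assumed nonzero value) in the upper two lanes, so the unused
   lanes never divide by zero.  */
static inline __m128 v2sf_safe_denominator(__m128 den)
{
    return _mm_movelh_ps(den, _mm_set1_ps(1.0f));
}

/* V2SF division carried out as a V4SF division.  The unused upper
   lanes compute 0.0f / 1.0f, which raises no exception.  */
static inline __m128 v2sf_div(__m128 num, __m128 den)
{
    return _mm_div_ps(v2sf_clear_upper(num), v2sf_safe_denominator(den));
}
```

Simply zeroing the denominator as well would make the unused lanes compute
0.0 / 0.0, which raises the invalid-operation exception, hence the different
treatment.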

The patch effectively tightens the solution from PR95046 by clearing upper
halves of all operand registers before every potentially trapping instruction.
The testcase:

--cut here--
typedef float __attribute__((vector_size(8))) v2sf;

v2sf test (v2sf a, v2sf b, v2sf c)
{
  return a * b - c;
}
--cut here--

compiles to:

        movq    %xmm1, %xmm1    # 8     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm0, %xmm0    # 9     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm2, %xmm2    # 12    [c=4 l=4]  *vec_concatv4sf_0
        mulps   %xmm1, %xmm0    # 10    [c=16 l=3]  *mulv4sf3/0
        movq    %xmm0, %xmm0    # 13    [c=4 l=4]  *vec_concatv4sf_0
        subps   %xmm2, %xmm0    # 14    [c=12 l=3]  *subv4sf3/0
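
For readers who prefer intrinsics to assembly, the sequence above can be
mirrored roughly as follows. This is a sketch under assumptions: the function
and helper names are hypothetical, and an optimizing compiler is free to
schedule or combine the operations differently.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86 targets only */

/* Hypothetical helper: zero the upper 64 bits, like `movq %xmmN, %xmmN`.  */
static inline __m128 clear_upper_half(__m128 x)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(x)));
}

/* a * b - c on the low two lanes, with every operand cleared before
   each potentially trapping instruction.  */
static inline __m128 v2sf_mul_sub(__m128 a, __m128 b, __m128 c)
{
    b = clear_upper_half(b);   /* movq  %xmm1, %xmm1 */
    a = clear_upper_half(a);   /* movq  %xmm0, %xmm0 */
    c = clear_upper_half(c);   /* movq  %xmm2, %xmm2 */
    a = _mm_mul_ps(a, b);      /* mulps %xmm1, %xmm0 */
    a = clear_upper_half(a);   /* movq  %xmm0, %xmm0 */
    return _mm_sub_ps(a, c);   /* subps %xmm2, %xmm0 */
}
```

Whatever the upper lanes of the inputs contain, the upper lanes of the result
are 0.0, since they only ever see 0.0 * 0.0 and 0.0 - 0.0.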

The implementation simply calls the corresponding V4SFmode operation, so we
can remove all "emulated" SSE2 V2SFmode instructions and SSE2 V2SFmode
alternatives from the 3dNOW! insn patterns.
