https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
             Status|NEW                          |ASSIGNED

--- Comment #16 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55636
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55636&action=edit
Proposed patch

Proposed patch clears the upper half of a V4SFmode operand register before all
potentially trapping instructions. The testcase from comment #12 now compiles
to:

        movq    %xmm1, %xmm1    # 9     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm0, %xmm0    # 10    [c=4 l=4]  *vec_concatv4sf_0
        addps   %xmm1, %xmm0    # 11    [c=12 l=3]  *addv4sf3/0

This approach addresses issues with traps (Comment #0), as well as with
denormal/invalid values (Comment #14). An obvious exception to the rule is a
division, where the value != 0.0 should be loaded into the upper half of the
denominator.

The patch effectively tightens the solution from PR95046 by clearing upper
halves of all operand registers before every potentially trapping instruction.
The testcase:

--cut here--
typedef float __attribute__((vector_size(8))) v2sf;

v2sf test (v2sf a, v2sf b, v2sf c)
{
  return a * b - c;
}
--cut here--

compiles to:

        movq    %xmm1, %xmm1    # 8     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm0, %xmm0    # 9     [c=4 l=4]  *vec_concatv4sf_0
        movq    %xmm2, %xmm2    # 12    [c=4 l=4]  *vec_concatv4sf_0
        mulps   %xmm1, %xmm0    # 10    [c=16 l=3]  *mulv4sf3/0
        movq    %xmm0, %xmm0    # 13    [c=4 l=4]  *vec_concatv4sf_0
        subps   %xmm2, %xmm0    # 14    [c=12 l=3]  *subv4sf3/0

The implementation simply calls V4SFmode operation, so we can remove all
"emulated" SSE2 V2SFmode instructions and SSE2 V2SFmode alternatives from
3dNOW! insn patterns.
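
For illustration only (this is not part of the attached patch): a minimal C sketch of why the upper halves matter, assuming an x86 target with SSE2. The helper names clear_upper_half and v2sf_add_flags are hypothetical, and the FLT_MAX values in lanes 2-3 are just an assumed stand-in for stale register contents. _mm_move_epi64 is used as the intrinsic counterpart of the register-to-register movq shown above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the MXCSR macros */
#include <float.h>      /* FLT_MAX */

/* Keep lanes 0-1 and zero lanes 2-3, like "movq %xmmN, %xmmN". */
static __m128 clear_upper_half(__m128 v)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(v)));
}

/* Hypothetical helper: perform a 2-float add with a 4-float addps and
   return the MXCSR overflow flag raised by it (0 if none).  If `clear`
   is nonzero, the upper halves of both operands are zeroed first. */
static unsigned int v2sf_add_flags(int clear, float out[2])
{
    /* Lanes 0-1 are the real V2SF data; lanes 2-3 model stale garbage.
       volatile keeps the compiler from folding or moving the add. */
    volatile float a[4] = { 1.0f, 2.0f, FLT_MAX, FLT_MAX };
    volatile float b[4] = { 3.0f, 4.0f, FLT_MAX, FLT_MAX };
    volatile float r[4];
    float la[4], lb[4], lr[4];

    _MM_SET_EXCEPTION_STATE(0);          /* clear sticky exception flags */

    for (int i = 0; i < 4; i++) {
        la[i] = a[i];
        lb[i] = b[i];
    }
    __m128 va = _mm_loadu_ps(la);
    __m128 vb = _mm_loadu_ps(lb);
    if (clear) {
        va = clear_upper_half(va);
        vb = clear_upper_half(vb);
    }
    _mm_storeu_ps(lr, _mm_add_ps(va, vb));   /* addps on all four lanes */
    for (int i = 0; i < 4; i++)
        r[i] = lr[i];

    out[0] = r[0];
    out[1] = r[1];
    return _MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW;
}
```

Calling v2sf_add_flags(0, out) leaves the stale lanes in place, so the addps overflows in lanes 2-3 and sets the overflow flag even though the visible result in lanes 0-1 is exact. Calling v2sf_add_flags(1, out) produces the same two result lanes with no overflow flag set. With exceptions unmasked, the first case would trap instead of merely setting a flag.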