Hello!

> avx-vzeroupper-3 fails because reload moves AVX store through vzeroupper.
>
> Before reload
> (insn 2 27 3 2 (set (reg/v:V4DI 61 [ src ])

...

> After reload
> (insn 6 3 29 2 (set (reg:QI 0 ax)
>
> I think it is a data-flow analysis problem. Uros refers to the code at
> df-scan.c, line 3248. He wrote:
>
> This kind of defeats the purpose of UNSPEC_VOLATILE, and is probably
> the root cause of moves.  I don't know how to attack this efficiently,
> I suggest to ask on the list about the issue.
>
> What is a possible solution in this case?

It looks to me like we have to introduce a post-reload LCM insertion
pass. Please note that vzeroupper is defined with hard registers only
(and FWIW, so is vzero), so there is no concept of virtuals in these
patterns.

The instruction that is inserted post-reload can be defined as:

--cut here--
;; Clear the upper 128bits of AVX registers, equivalent to a NOP
;; if the upper 128bits are unused.
(define_expand "avx_vzeroupper"
  [(match_par_dup 1 [(match_operand 0 "const_int_operand")])]
  "TARGET_AVX"
{
  int nregs = TARGET_64BIT ? 16 : 8;
  int regno;

  operands[1] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nregs + 1));

  XVECEXP (operands[1], 0, 0)
    = gen_rtx_UNSPEC_VOLATILE (VOIDmode, gen_rtvec (1, operands[0]),
                               UNSPECV_VZEROUPPER);

  for (regno = 0; regno < nregs; regno++)
    XVECEXP (operands[1], 0, regno + 1)
      = gen_rtx_SET (VOIDmode,
                     gen_rtx_REG (V8SImode, SSE_REGNO (regno)),
                     gen_rtx_VEC_MERGE (V4SImode,
                                        gen_rtx_REG (V4SImode,
                                                     SSE_REGNO (regno)),
                                        CONST0_RTX (V4SImode),
                                        const1_rtx));
})

(define_insn "*avx_vzeroupper"
  [(match_parallel 1 "vzeroupper_operation"
    [(unspec_volatile [(match_operand 0 "const_int_operand")]
                      UNSPECV_VZEROUPPER)])]
  "TARGET_AVX"
  "vzeroupper\t# %0"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])
--cut here--

Also, the vzeroupper and vzero instructions that are generated via
__builtin_ia32_* should be generated differently. The call to the
builtin should insert an unspec_volatile marker that is split post-reload
into the real pattern, with all hard registers enumerated in the insn
RTL body.

Uros.
