On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu <hongjiu...@intel.com> wrote:
>
> Hi Jan, Uros,
>
> This patch fixes the wrong code bug:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229
>
> Tested on AVX2 and AVX512 with and without --with-arch=native.
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> --
> i386 backend has
>
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
>
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
>
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
>
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
>
> sse.md has
>
> (define_insn "mov<mode>_internal"
>   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
>          "=v,v ,v ,m")
>         (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
>          " C,BC,vm,v"))]
> ....
>       /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
>          in avx512f, so we need to use workarounds, to access sse registers
>          16-31, which are evex-only. In avx512vl we don't need workarounds.  
> */
>       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>           && (EXT_REX_SSE_REG_P (operands[0])
>               || EXT_REX_SSE_REG_P (operands[1])))
>         {
>           if (memory_operand (operands[0], <MODE>mode))
>             {
>               if (<MODE_SIZE> == 32)
>                 return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 
> 0x0}";
>               else if (<MODE_SIZE> == 16)
>                 return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 
> 0x0}";
>               else
>                 gcc_unreachable ();
>             }
> ...
>
> However, since ix86_hard_regno_mode_ok has
>
>      /* TODO check for QI/HI scalars.  */
>       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>       if (TARGET_AVX512VL
>           && (mode == OImode
>               || mode == TImode
>               || VALID_AVX256_REG_MODE (mode)
>               || VALID_AVX512VL_128_REG_MODE (mode)))
>         return true;
>
>       /* xmm16-xmm31 are only available for AVX-512.  */
>       if (EXT_REX_SSE_REGNO_P (regno))
>         return false;
>
>       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>           && (EXT_REX_SSE_REG_P (operands[0])
>               || EXT_REX_SSE_REG_P (operands[1])))
>
> is a dead code.
>
> Also for
>
> long long *p;
> volatile __m256i yy;
>
> void
> foo (void)
> {
>    _mm256_store_epi64 (p, yy);
> }
>
> with AVX512VL, we should generate
>
>         vmovdqa         %ymm0, (%rax)
>
> not
>
>         vmovdqa64       %ymm0, (%rax)
>
> All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov:
>
> 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
> moves will be generated.
> 2. If xmm16-xmm31/ymm16-ymm31 registers are used:
>    a. With AVX512VL, AVX512VL vector moves will be generated.
>    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>       move will be done with zmm register move.
>
> ext_sse_reg_operand is removed since it is no longer needed.
>
> Tested on AVX2 and AVX512 with and without --with-arch=native.
>
> gcc/
>
>         PR target/89229
>         PR target/89346
>         * config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
>         * config/i386/i386.c (ix86_get_ssemov): New function.
>         (ix86_output_ssemov): Likewise.
>         * config/i386/i386.md (*movxi_internal_avx512f): Call
>         ix86_output_ssemov for TYPE_SSEMOV.
>         (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove ext_sse_reg_operand and TARGET_AVX512VL check.
>         (*movti_internal): Likewise.
>         (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove ext_sse_reg_operand check.
>         (*movsi_internal): Likewise.
>         (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL
>         and ext_sse_reg_operand check.
>         (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and
>         ext_sse_reg_operand check.
>         * config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
>         ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
>         check.
>         * config/i386/sse.md (VMOVE:mov<mode>_internal): Call
>         ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
>         check.
>         * config/i386/predicates.md (ext_sse_reg_operand): Removed.
>
> gcc/testsuite/
>
>         PR target/89229
>         PR target/89346
>         * gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
>         * gcc.target/i386/pr89229-2a.c: New test.
>         * gcc.target/i386/pr89229-2b.c: Likewise.
>         * gcc.target/i386/pr89229-2c.c: Likewise.
>         * gcc.target/i386/pr89229-3a.c: Likewise.
>         * gcc.target/i386/pr89229-3b.c: Likewise.
>         * gcc.target/i386/pr89229-3c.c: Likewise.
>         * gcc.target/i386/pr89229-4a.c: Likewise.
>         * gcc.target/i386/pr89229-4b.c: Likewise.
>         * gcc.target/i386/pr89229-4c.c: Likewise.
>         * gcc.target/i386/pr89229-5a.c: Likewise.
>         * gcc.target/i386/pr89229-5b.c: Likewise.
>         * gcc.target/i386/pr89229-5c.c: Likewise.
>         * gcc.target/i386/pr89229-6a.c: Likewise.
>         * gcc.target/i386/pr89229-6b.c: Likewise.
>         * gcc.target/i386/pr89229-6c.c: Likewise.
>         * gcc.target/i386/pr89229-7a.c: Likewise.
>         * gcc.target/i386/pr89229-7b.c: Likewise.
>         * gcc.target/i386/pr89229-7c.c: Likewise.
> ---

PING:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html


-- 
H.J.

Reply via email to