On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu <hongjiu...@intel.com> wrote:
>
> Hi Jan, Uros,
>
> This patch fixes the wrong code bug:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229
>
> Tested on AVX2 and AVX512 with and without --with-arch=native.
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> --
> i386 backend has
>
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
>
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
>
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
>
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
>
> sse.md has
>
> (define_insn "mov<mode>_internal"
>   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
>          "=v,v ,v ,m")
>         (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
>          " C,BC,vm,v"))]
> ....
>       /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
>          in avx512f, so we need to use workarounds, to access sse registers
>          16-31, which are evex-only. In avx512vl we don't need workarounds.  */
>       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>           && (EXT_REX_SSE_REG_P (operands[0])
>               || EXT_REX_SSE_REG_P (operands[1])))
>         {
>           if (memory_operand (operands[0], <MODE>mode))
>             {
>               if (<MODE_SIZE> == 32)
>                 return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>               else if (<MODE_SIZE> == 16)
>                 return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>               else
>                 gcc_unreachable ();
>             }
> ...
>
> However, since ix86_hard_regno_mode_ok has
>
>       /* TODO check for QI/HI scalars.  */
>       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>       if (TARGET_AVX512VL
>           && (mode == OImode
>               || mode == TImode
>               || VALID_AVX256_REG_MODE (mode)
>               || VALID_AVX512VL_128_REG_MODE (mode)))
>         return true;
>
>       /* xmm16-xmm31 are only available for AVX-512.  */
>       if (EXT_REX_SSE_REGNO_P (regno))
>         return false;
>
>       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>           && (EXT_REX_SSE_REG_P (operands[0])
>               || EXT_REX_SSE_REG_P (operands[1])))
>
> is dead code.
>
> Also for
>
> long long *p;
> volatile __m256i yy;
>
> void
> foo (void)
> {
>    _mm256_store_epi64 (p, yy);
> }
>
> with AVX512VL, we should generate
>
>         vmovdqa         %ymm0, (%rax)
>
> not
>
>         vmovdqa64       %ymm0, (%rax)
>
> All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov:
>
> 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
>    moves will be generated.
> 2. If xmm16-xmm31/ymm16-ymm31 registers are used:
>    a. With AVX512VL, AVX512VL vector moves will be generated.
>    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>       move will be done with zmm register move.
>
> ext_sse_reg_operand is removed since it is no longer needed.
>
> Tested on AVX2 and AVX512 with and without --with-arch=native.
>
> gcc/
>
>         PR target/89229
>         PR target/89346
>         * config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
>         * config/i386/i386.c (ix86_get_ssemov): New function.
>         (ix86_output_ssemov): Likewise.
>         * config/i386/i386.md (*movxi_internal_avx512f): Call
>         ix86_output_ssemov for TYPE_SSEMOV.
>         (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove ext_sse_reg_operand and TARGET_AVX512VL check.
>         (*movti_internal): Likewise.
>         (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove ext_sse_reg_operand check.
>         (*movsi_internal): Likewise.
>         (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL
>         and ext_sse_reg_operand check.
>         (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>         Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and
>         ext_sse_reg_operand check.
>         * config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
>         ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
>         check.
>         * config/i386/sse.md (VMOVE:mov<mode>_internal): Call
>         ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
>         check.
>         * config/i386/predicates.md (ext_sse_reg_operand): Removed.
>
> gcc/testsuite/
>
>         PR target/89229
>         PR target/89346
>         * gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
>         * gcc.target/i386/pr89229-2a.c: New test.
>         * gcc.target/i386/pr89229-2b.c: Likewise.
>         * gcc.target/i386/pr89229-2c.c: Likewise.
>         * gcc.target/i386/pr89229-3a.c: Likewise.
>         * gcc.target/i386/pr89229-3b.c: Likewise.
>         * gcc.target/i386/pr89229-3c.c: Likewise.
>         * gcc.target/i386/pr89229-4a.c: Likewise.
>         * gcc.target/i386/pr89229-4b.c: Likewise.
>         * gcc.target/i386/pr89229-4c.c: Likewise.
>         * gcc.target/i386/pr89229-5a.c: Likewise.
>         * gcc.target/i386/pr89229-5b.c: Likewise.
>         * gcc.target/i386/pr89229-5c.c: Likewise.
>         * gcc.target/i386/pr89229-6a.c: Likewise.
>         * gcc.target/i386/pr89229-6b.c: Likewise.
>         * gcc.target/i386/pr89229-6c.c: Likewise.
>         * gcc.target/i386/pr89229-7a.c: Likewise.
>         * gcc.target/i386/pr89229-7b.c: Likewise.
>         * gcc.target/i386/pr89229-7c.c: Likewise.
> ---
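
For readers skimming the thread, the selection rules 1, 2a and 2b from the quoted commit message can be sketched roughly as below. This is a simplified standalone model, not the patch's ix86_output_ssemov: the names move_kind, is_ext_sse_regno and choose_move_kind are hypothetical, registers are given as plain SIMD indices 0-31 rather than GCC hard register numbers, and only register-to-register moves of modes narrower than 64 bytes are modelled.

```c
#include <stdbool.h>

/* Hypothetical classification of the move that gets emitted.
   None of these names come from GCC.  */
enum move_kind
{
  MOVE_SSE_AVX,   /* Rule 1: plain SSE/AVX encoding.  */
  MOVE_AVX512VL,  /* Rule 2a: EVEX encoding at xmm/ymm width.  */
  MOVE_ZMM        /* Rule 2b: widen the move to full zmm registers.  */
};

/* Assumption: registers are plain SIMD indices 0-31, and indices 16-31
   are the EVEX-only xmm16-xmm31/ymm16-ymm31 registers.  */
static bool
is_ext_sse_regno (int regno)
{
  return regno >= 16 && regno <= 31;
}

/* Pick the kind of register-to-register move for a 16- or 32-byte mode,
   following the three rules in the commit message.  */
static enum move_kind
choose_move_kind (int dest_regno, int src_regno, bool have_avx512vl)
{
  /* Rule 1: neither operand needs an EVEX-only register.  */
  if (!is_ext_sse_regno (dest_regno) && !is_ext_sse_regno (src_regno))
    return MOVE_SSE_AVX;

  /* Rule 2a: AVX512VL provides EVEX moves at 128/256 bits.  */
  if (have_avx512vl)
    return MOVE_AVX512VL;

  /* Rule 2b: without AVX512VL, fall back to a 512-bit register move.  */
  return MOVE_ZMM;
}
```

Under this model, a ymm0 store on an AVX512VL target falls under rule 1, which matches the vmovdqa (not vmovdqa64) expectation for the _mm256_store_epi64 example above.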

PING:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html

-- 
H.J.