On Sun, Oct 21, 2018 at 6:04 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sat, Oct 20, 2018 at 8:46 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > Many AVX512 vector operations can broadcast from a scalar memory source.
> > This patch enables memory broadcast for FMSUB operations. In order to
> > support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
> > also added, instead of passing the negated value to FMA builtin functions.
> >
> > gcc/
> >
> > PR target/72782
> > * config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use
> > __builtin_ia32_vfmsubpd512_mask.
> > (_mm512_mask_fmsub_round_pd): Likewise.
> > (_mm512_fmsub_pd): Likewise.
> > (_mm512_mask_fmsub_pd): Likewise.
> > (_mm512_maskz_fmsub_round_pd): Use
> > __builtin_ia32_vfmsubpd512_maskz.
> > (_mm512_maskz_fmsub_pd): Likewise.
> > (_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask.
> > (_mm512_mask_fmsub_round_ps): Likewise.
> > (_mm512_fmsub_ps): Likewise.
> > (_mm512_mask_fmsub_ps): Likewise.
> > (_mm512_maskz_fmsub_round_ps): Use
> > __builtin_ia32_vfmsubps512_maskz.
> > (_mm512_maskz_fmsub_ps): Likewise.
> > * config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use
> > __builtin_ia32_vfmsubpd256_mask.
> > (_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz.
> > (_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask
> > (_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz.
> > (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
> > (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
> > (_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz.
> > (_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask.
> > (_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz.
> > * config/i386/fmaintrin.h (_mm_fmsub_pd): Use
> > __builtin_ia32_vfmsubpd.
> > (_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256.
> > (_mm_fmsub_ps): Use __builtin_ia32_vfmsubps.
> > (_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256.
> > (_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3.
> > (_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3.
> > * config/i386/i386-builtin.def: Add
> > __builtin_ia32_vfmsubpd256_mask,
> > __builtin_ia32_vfmsubpd256_maskz,
> > __builtin_ia32_vfmsubpd128_mask,
> > __builtin_ia32_vfmsubpd128_maskz,
> > __builtin_ia32_vfmsubps256_mask,
> > __builtin_ia32_vfmsubps256_maskz,
> > __builtin_ia32_vfmsubps128_mask,
> > __builtin_ia32_vfmsubps128_maskz,
> > __builtin_ia32_vfmsubpd512_mask,
> > __builtin_ia32_vfmsubpd512_maskz,
> > __builtin_ia32_vfmsubps512_mask,
> > __builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3,
> > __builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps,
> > __builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and.
> > __builtin_ia32_vfmsubpd256.
> > * config/i386/sse.md (fma4i_fmsub_<mode>): New.
> > (<avx512>_fmsub_<mode>_maskz<round_expand_name>): Likewise.
> > (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
> > Likewise.
> > (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
> > Likewise.
> > (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
> > Likewise.
> > (fmai_vmfmsub_<mode><round_name>): Likewise.
> >
> > gcc/testsuite/
> >
> > PR target/72782
> > * gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise.
> > * gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise.
> > * gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise.
> > * gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise.
>
> LGTM.

LGTM for the whole patch series (all patches implement the same approach).

Thanks,
Uros.