[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 --- Comment #29 from Uroš Bizjak --- Fixed for gcc-12.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #28 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #18) > codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have > vec_setv16hf_0 and with vpblendw. True, some opportunities are missing from expand_vec_perm* functions, someone should go through these expanders and add corresponding VxHFmode near VxHImode handling.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #27 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #17) > (In reply to Hongtao.liu from comment #16) > > There're already testcases for vec_extract/vec_set/vec_duplicate, but those > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and > > test them w/o avx512fp16. > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to > avx512vl/f16c so that vect-float16-1.c could be vectorized. > > vect-float16-1.c > > void > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, > _Float16 *__restrict__ c) > { > for (int i = 0; i < 256; i++) > a[i] = b[i] + c[i]; > } This was recently fixed, for -O2 -ftree-vectorize -mfp16c I get: vpxor %xmm2, %xmm2, %xmm2 vpinsrw $0, (%rsi,%rax), %xmm2, %xmm0 vpinsrw $0, (%rdx,%rax), %xmm2, %xmm1 vcvtph2ps %xmm0, %xmm0 vcvtph2ps %xmm1, %xmm1 vaddss %xmm1, %xmm0, %xmm0 vinsertps $0xe, %xmm0, %xmm0, %xmm0 vcvtps2ph $4, %xmm0, %xmm0 vpextrw $0, %xmm0, (%rdi,%rax) addq$2, %rax cmpq$512, %rax jne .L2 ret While it would be nice to partially vectorize with vcvtph2ps/vcvtps2ph, the compiler doesn't reach that far.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #26 from CVS Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0 commit r12-6021-g271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0 Author: Uros Bizjak Date: Thu Dec 16 19:34:50 2021 +0100 i386: Enable VxHF vector modes lower ABI levels [PR103571] Enable VxHF vector modes for SSE2, AVX and AVX512F ABIs. 2021-12-16 Uroš Bizjak gcc/ChangeLog: PR target/103571 * config/i386/i386.h (VALID_AVX256_REG_MODE): Add V16HFmode. (VALID_AVX256_REG_OR_OI_VHF_MODE): Replace with ... (VALID_AVX256_REG_OR_OI_MODE): ... this. Remove V16HFmode. (VALID_AVX512F_SCALAR_MODE): Remove HImode and HFmode. (VALID_AVX512FP16_SCALAR_MODE): New. (VALID_AVX512F_REG_MODE): Add V32HFmode. (VALID_SSE2_REG_MODE): Add V8HFmode, V4HFmode and V2HFmode. (VALID_SSE2_REG_VHF_MODE): Remove. (VALID_INT_MODE_P): Add V2HFmode. * config/i386/i386.c (function_arg_advance_64): Remove explicit mention of V16HFmode and V32HFmode. (ix86_hard_regno_mode_ok): Remove explicit mention of XImode and V32HFmode, use VALID_AVX512F_REG_OR_XI_MODE instead. Use VALID_AVX512FP_SCALAR_MODE for TARGET_aVX512FP16. Use VALID_AVX256_REG_OR_OI_MODE instead of VALID_AVX256_REG_OR_OI_VHF_MODE and VALID_SSE2_REG_MODE instead of VALID_SSE2_REG_VHF_MODE. (ix86_set_reg_reg_cost): Remove usge of VALID_AVX512FP16_REG_MODE. (ix86_vector_mode_supported): Ditto. gcc/testsuite/ChangeLog: PR target/103571 * gcc.target/i386/pr102812.c (dg-final): Do not scan for movdqa.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #25 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #22) > Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached > patch looks fine, the condition should be binded to real instructions but > not mode. OK, will commit the patch to enable vector modes later today: - mavx512fp16 is unchanged - vactorizer middle end can be (will be) fixed as a follow-up (I'll open a PR). - it will be possible to test various ISA levels, possible ICE is relatively easy to fix by enabling/disabling various code paths in expanders.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Uroš Bizjak changed: What|Removed |Added Attachment #51950|0 |1 is obsolete|| --- Comment #24 from Uroš Bizjak --- Created attachment 52002 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52002=edit Current patch to enable vector VxHF modes for TARGET_SSE+ Current patch after preparation patch was committed.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #23 from CVS Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:7a54d3deecf967029f18aa5ed1fcbdb752e213b9 commit r12-5966-g7a54d3deecf967029f18aa5ed1fcbdb752e213b9 Author: Uros Bizjak Date: Tue Dec 14 18:27:22 2021 +0100 i386: Implement VxHF vector set/insert/extract with lower ABI levels This is a preparation patch that moves VxHF vector set/insert/extract expansions from AVX512FP16 ABI to lower ABIs. There are no functional changes for -mavx512fp16 and a follow-up patch is needed to actually enable VxHF vector modes for lower ABIs. 2021-12-14 Uroš Bizjak gcc/ChangeLog: PR target/103571 * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate) : Implement for TARGET_SSE2. : Implement for TARGET_AVX. : Implement for TARGET_AVX512F. (ix86_expand_vector_set_var): Handle V32HFmode without TARGET_AVX512BW. (ix86_expand_vector_extract) : Implement for TARGET_SSE2. : Implement for TARGET_AVX. : Implement for TARGET_AVX512BW. (expand_vec_perm_broadcast_1) : New. * config/i386/sse.md (VI12HF_AVX512VL): Remove TARGET_AVX512FP16 condition. (V): Ditto. (V_256_512): Ditto. (avx_vbroadcastf128_): Use V_256H mode iterator.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #22 from Hongtao.liu --- reply to Uroš Bizjak from comment #21) > (In reply to Hongtao.liu from comment #19) > > (In reply to Hongtao.liu from comment #17) > > > (In reply to Hongtao.liu from comment #16) > > > > There're already testcases for vec_extract/vec_set/vec_duplicate, but > > > > those > > > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them > > > > and > > > > test them w/o avx512fp16. > > > > > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to > > > avx512vl/f16c so that vect-float16-1.c could be vectorized. > > > > > > vect-float16-1.c > > > > > > void > > > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, > > > _Float16 *__restrict__ c) > > > { > > > for (int i = 0; i < 256; i++) > > > a[i] = b[i] + c[i]; > > > } > > > > Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf > > addition to scalar version. And the mismatch is vectorizer assume '+/-' is > > supported by default(w/o check optab, just cehck if v8hf is supported in > > vector_mode_supported_p), and then vectorize the loop, but veclower lower > > vector operation back to scalar which create much worse code than not > > vectorized version. > > I was under impression that autovectorizer won't vectorize if > TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the word_mode is also returned for 64-bit/32-bit vector, but they're vectorized.(In > documentation for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are > needed. > > So, it looks that middle end is somehow inconsistent here. Adding CC. > > > Could veclower try widen mode for addition, even veclower can, vNhfmode > > better be supported under avx512vl or f16c, orelse vectorized code is really > > bad, then why should we supported vector mode under generic target. > > We should use it for parameter passing, moves, inserts, extracts and > shuffles. In case of VxHF, we can reuse HImode insns for all these > operations. Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached patch looks fine, the condition should be binded to real instructions but not mode.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Uroš Bizjak changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #21 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #19) > (In reply to Hongtao.liu from comment #17) > > (In reply to Hongtao.liu from comment #16) > > > There're already testcases for vec_extract/vec_set/vec_duplicate, but > > > those > > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them > > > and > > > test them w/o avx512fp16. > > > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to > > avx512vl/f16c so that vect-float16-1.c could be vectorized. > > > > vect-float16-1.c > > > > void > > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, > > _Float16 *__restrict__ c) > > { > > for (int i = 0; i < 256; i++) > > a[i] = b[i] + c[i]; > > } > > Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf > addition to scalar version. And the mismatch is vectorizer assume '+/-' is > supported by default(w/o check optab, just cehck if v8hf is supported in > vector_mode_supported_p), and then vectorize the loop, but veclower lower > vector operation back to scalar which create much worse code than not > vectorized version. I was under impression that autovectorizer won't vectorize if TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the documentation for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are needed. So, it looks that middle end is somehow inconsistent here. Adding CC. > Could veclower try widen mode for addition, even veclower can, vNhfmode > better be supported under avx512vl or f16c, orelse vectorized code is really > bad, then why should we supported vector mode under generic target. We should use it for parameter passing, moves, inserts, extracts and shuffles. In case of VxHF, we can reuse HImode insns for all these operations.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #20 from Hongtao.liu --- V2HF/V4HF should also be restricted under AVX512FP16.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #19 from Hongtao.liu --- (In reply to Hongtao.liu from comment #17) > (In reply to Hongtao.liu from comment #16) > > There're already testcases for vec_extract/vec_set/vec_duplicate, but those > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and > > test them w/o avx512fp16. > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to > avx512vl/f16c so that vect-float16-1.c could be vectorized. > > vect-float16-1.c > > void > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, > _Float16 *__restrict__ c) > { > for (int i = 0; i < 256; i++) > a[i] = b[i] + c[i]; > } Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf addition to scalar version. And the mismatch is vectorizer assume '+/-' is supported by default(w/o check optab, just cehck if v8hf is supported in vector_mode_supported_p), and then vectorize the loop, but veclower lower vector operation back to scalar which create much worse code than not vectorized version. after loop vectorizer, dump is quite optimized: vect__4.6_27 = MEM [(_Float16 *)vectp_b.4_29]; vect__6.9_24 = MEM [(_Float16 *)vectp_c.7_26]; vect__8.10_23 = vect__4.6_27 + vect__6.9_24; MEM [(_Float16 *)vectp_a.11_22] = vect__8.10_23; vectp_b.4_28 = vectp_b.4_29 + 8; vectp_c.7_25 = vectp_c.7_26 + 8; vectp_a.11_21 = vectp_a.11_22 + 8; But after veclower vect__4.6_4 = MEM [(_Float16 *)b_12(D)]; vect__6.9_5 = MEM [(_Float16 *)c_13(D)]; _28 = BIT_FIELD_REF ; _25 = BIT_FIELD_REF ; _21 = _28 + _25; _15 = BIT_FIELD_REF ; _10 = BIT_FIELD_REF ; _17 = _15 + _10; _22 = BIT_FIELD_REF ; _26 = BIT_FIELD_REF ; _29 = _22 + _26; _20 = BIT_FIELD_REF ; _3 = BIT_FIELD_REF ; _2 = _20 + _3; vect__8.10_6 = {_21, _17, _29, _2}; MEM [(_Float16 *)a_14(D)] = vect__8.10_6; vectp_b.4_8 = b_12(D) + 8; vectp_c.7_16 = c_13(D) + 8; vectp_a.11_30 = a_14(D) + 8; vect__4.6_27 = MEM [(_Float16 *)vectp_b.4_8]; vect__6.9_24 = MEM [(_Float16 *)vectp_c.7_16]; _1 = BIT_FIELD_REF ; _19 = BIT_FIELD_REF ; _31 = _1 + _19; _9 = BIT_FIELD_REF ; _32 = BIT_FIELD_REF ; _33 = _9 + _32; _34 = BIT_FIELD_REF ; _35 = BIT_FIELD_REF ; _36 = _34 + _35; _37 = BIT_FIELD_REF ; _38 = BIT_FIELD_REF ; _39 = _37 + _38; vect__8.10_23 = {_31, _33, _36, _39}; MEM [(_Float16 *)vectp_a.11_30] = vect__8.10_23; Could veclower try widen mode for addition, even veclower can, vNhfmode better be supported under avx512vl or f16c, orelse vectorized code is really bad, then why should we supported vector mode under generic target.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #18 from Hongtao.liu --- codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have vec_setv16hf_0 and with vpblendw. typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32))); typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); __m256h __attribute__ ((noinline, noclone)) foo1 (_Float16 x) { return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; } __m256h __attribute__ ((noinline, noclone)) foo2 (_Float16 *x) { return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }; } foo1: .LFB0: .cfi_startproc vpxor %xmm1, %xmm1, %xmm1 vpbroadcastw%xmm0, %ymm0 vpblendw$1, %ymm0, %ymm1, %ymm0 vpblendd$15, %ymm0, %ymm1, %ymm1 vmovdqa %ymm1, %ymm0 ret .cfi_endproc .LFE0: .size foo1, .-foo1 .p2align 4 .globl foo2 .type foo2, @function foo2: .LFB1: .cfi_startproc vpbroadcastw(%rdi), %ymm1 vpxor %xmm0, %xmm0, %xmm0 vpblendw$1, %ymm1, %ymm0, %ymm1 vpblendd$15, %ymm1, %ymm0, %ymm0 ret .cfi_endproc
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #17 from Hongtao.liu --- (In reply to Hongtao.liu from comment #16) > There're already testcases for vec_extract/vec_set/vec_duplicate, but those > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and > test them w/o avx512fp16. Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to avx512vl/f16c so that vect-float16-1.c could be vectorized. vect-float16-1.c void foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b, _Float16 *__restrict__ c) { for (int i = 0; i < 256; i++) a[i] = b[i] + c[i]; }
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #16 from Hongtao.liu --- There're already testcases for vec_extract/vec_set/vec_duplicate, but those testcases are written under TARGET_AVX512FP16, i'll make a copy of them and test them w/o avx512fp16.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #15 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #12) > (In reply to Hongtao.liu from comment #10) > > > Sure. > Please find attached the complete patch that enables HF vector modes in > Comment #11. The patch survives bootstrap and regression test and works OK > for the following testcase: > > --cut here-- > typedef _Float16 vf64 __attribute__((vector_size(64))); > typedef _Float16 vf32 __attribute__((vector_size(32))); > typedef _Float16 vf16 __attribute__((vector_size(16))); > > #ifdef __AVX512F__ > vf64 bar64 (_Float16 a) > { > return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, > a, a, a, a, a, a, a, a, a, a, a, a, a }; > } > #endif > > #ifdef __AVX__ > vf32 bar32 (_Float16 a) > { > return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; > } > #endif > > #ifdef __SSE2__ > vf16 baz16 (_Float16 a) > { > return (vf16) { a, a, a, a, a, a, a, a }; > } > #endif > --cut here-- > > for -msse2, -mavx, -mavx512f and -mavx512bw. > > Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the > generic target code auto-detects them. Now the generic target code does not > assume that vector HF modes depend solely on TARGET_AVX512FP16. > > Hongtao, can you please review the patch and perhaps test it a bit more? Sure.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Uroš Bizjak changed: What|Removed |Added Attachment #51948|0 |1 is obsolete|| --- Comment #14 from Uroš Bizjak --- Created attachment 51950 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51950=edit Proposed patch to enable vector HF modes for TARGET_SSE2+ Updated patch, see Comment #13.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #13 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #12) > Hongtao, can you please review the patch and perhaps test it a bit more? This part is missing from ix86_expand_vector_set_var: --cut here @@ -15912,7 +15921,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx) /* 512-bits vector byte/word broadcast and comparison only available under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector when without TARGET_AVX512BW. */ - if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW) + if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode) + && !TARGET_AVX512BW) { gcc_assert (TARGET_AVX512F); rtx vhi, vlo, idx_hi; @@ -15926,6 +15936,12 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx) extract_hi = gen_vec_extract_hi_v32hi; extract_lo = gen_vec_extract_lo_v32hi; } + else if (mode == V32HFmode) + { + half_mode = V16HFmode; + extract_hi = gen_vec_extract_hi_v32hf; + extract_lo = gen_vec_extract_lo_v32hf; + } else { half_mode = V32QImode; @@ -15973,7 +15989,6 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx) case E_V16SFmode: cmp_mode = V16SImode; break; - /* TARGET_AVX512FP16 implies TARGET_AVX512BW. */ case E_V8HFmode: cmp_mode = V8HImode; break; --cut here--
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #12 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #10) > Sure. Please find attached the complete patch that enables HF vector modes in Comment #11. The patch survives bootstrap and regression test and works OK for the following testcase: --cut here-- typedef _Float16 vf64 __attribute__((vector_size(64))); typedef _Float16 vf32 __attribute__((vector_size(32))); typedef _Float16 vf16 __attribute__((vector_size(16))); #ifdef __AVX512F__ vf64 bar64 (_Float16 a) { return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; } #endif #ifdef __AVX__ vf32 bar32 (_Float16 a) { return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; } #endif #ifdef __SSE2__ vf16 baz16 (_Float16 a) { return (vf16) { a, a, a, a, a, a, a, a }; } #endif --cut here-- for -msse2, -mavx, -mavx512f and -mavx512bw. Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the generic target code auto-detects them. Now the generic target code does not assume that vector HF modes depend solely on TARGET_AVX512FP16. Hongtao, can you please review the patch and perhaps test it a bit more?
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Uroš Bizjak changed: What|Removed |Added Attachment #51941|0 |1 is obsolete|| --- Comment #11 from Uroš Bizjak --- Created attachment 51948 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51948=edit Proposed patch to enable vector HF modes for TARGET_SSE2+ Attached patch enables vector HF modes for TARGET_SSE2+. In addition to enabling vector modes for SSE2, AVX and AVX512F targets, it enables corresponding move insns in sse.md, redefines some mode iterators and moves a couple of patterns from TARGET_AVX512FP16 to lower ABIs. The patch also fixes ix86_expand_vector_init_duplicate, ix86_expand_vector_extract and expand_vec_perm_broadcast_1, as mentioned in Comment #8.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #10 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #9) > (In reply to Hongtao.liu from comment #8) > > (In reply to Uroš Bizjak from comment #6) > > > (In reply to Hongtao.liu from comment #5) > > > > > > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 > > > > for > > > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in > > > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first. > > > > > > These are of the type: > > > > > > use_vector_set = TARGET_AVX512FP16 && one_var == 0; > > > gen_vec_set_0 = gen_vec_setv8hf_0; > > > > > > So they look immune to the above change. > > > > This is ok. > > > > I mean in ix86_expand_vector_init_duplicate > > > > case E_V8HFmode: > > case E_V16HFmode: > > case E_V32HFmode: > > return ix86_vector_duplicate_value (mode, target, val); > > > > AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for > > V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode. > > > > Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW, > > other wise, vector_extract go through stack. > > > > case E_V32HFmode: > > tmp = gen_reg_rtx (V16HFmode); > > if (elt < 16) > > emit_insn (gen_vec_extract_lo_v32hf (tmp, vec)); > > else > > emit_insn (gen_vec_extract_hi_v32hf (tmp, vec)); > > ix86_expand_vector_extract (false, target, tmp, elt & 15); > > return; > > > > > > others seems to be ok. > > Please note that the change mainly affects moves between SSE and GP > registers. Expansion is done way before register allocation, and if we allow > these modes earlier, I'm not sure I understand how it affects expand. > > I propose we proceed with my patch and fix eventual fallout as a follow-up. Sure.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #9 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #8) > (In reply to Uroš Bizjak from comment #6) > > (In reply to Hongtao.liu from comment #5) > > > > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 > > > for > > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in > > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first. > > > > These are of the type: > > > > use_vector_set = TARGET_AVX512FP16 && one_var == 0; > > gen_vec_set_0 = gen_vec_setv8hf_0; > > > > So they look immune to the above change. > > This is ok. > > I mean in ix86_expand_vector_init_duplicate > > case E_V8HFmode: > case E_V16HFmode: > case E_V32HFmode: > return ix86_vector_duplicate_value (mode, target, val); > > AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for > V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode. > > Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW, > other wise, vector_extract go through stack. > > case E_V32HFmode: > tmp = gen_reg_rtx (V16HFmode); > if (elt < 16) > emit_insn (gen_vec_extract_lo_v32hf (tmp, vec)); > else > emit_insn (gen_vec_extract_hi_v32hf (tmp, vec)); > ix86_expand_vector_extract (false, target, tmp, elt & 15); > return; > > > others seems to be ok. Please note that the change mainly affects moves between SSE and GP registers. Expansion is done way before register allocation, and if we allow these modes earlier, I'm not sure I understand how it affects expand. I propose we proceed with my patch and fix eventual fallout as a follow-up.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #8 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #6) > (In reply to Hongtao.liu from comment #5) > > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first. > > These are of the type: > > use_vector_set = TARGET_AVX512FP16 && one_var == 0; > gen_vec_set_0 = gen_vec_setv8hf_0; > > So they look immune to the above change. This is ok. I mean in ix86_expand_vector_init_duplicate case E_V8HFmode: case E_V16HFmode: case E_V32HFmode: return ix86_vector_duplicate_value (mode, target, val); AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode. Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW, other wise, vector_extract go through stack. case E_V32HFmode: tmp = gen_reg_rtx (V16HFmode); if (elt < 16) emit_insn (gen_vec_extract_lo_v32hf (tmp, vec)); else emit_insn (gen_vec_extract_hi_v32hf (tmp, vec)); ix86_expand_vector_extract (false, target, tmp, elt & 15); return; others seems to be ok.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #7 from Uroš Bizjak --- Created attachment 51941 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51941=edit Proposed patch The patch moves put V2HF+V4HF+V8HF/V16HF/V32HF TO VALID_SSE2/AVX256/AVX512F_REG_MODE. Also, introduces VALID_AVX512FP16_SCALAR_MODE to simplify some code. (Probably we need to add V2HFmode to VALID_INT_MODE_P, but nevertheless the patch fixes all the issues from the description): 64-bit targets: foo: movl%esi, %eax ret bar: movaps %xmm1, %xmm0 ret baz: movdqa %xmm1, %xmm0 ret and for 32-bit targets: foo: movl8(%esp), %eax ret bar: movq%mm1, %mm0 ret baz: movdqa %xmm1, %xmm0 ret The patch "regresses" 32bit testsuite: FAIL: gcc.target/i386/pr102812.c scan-assembler movdqa but only due to better generated code: pxor%xmm0, %xmm0 pinsrw $0, 4(%esp), %xmm0 ret vs. the above demonstrated mess.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #6 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #5) > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first. These are of the type: use_vector_set = TARGET_AVX512FP16 && one_var == 0; gen_vec_set_0 = gen_vec_setv8hf_0; So they look immune to the above change.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #5 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #4) > (In reply to Hongyu Wang from comment #3) > > > So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want > > to modify those function_args and function_value stuff. > > We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for > VALID_SSE2_REG_MODE. There're several places in i386-expand.c which assume TARGET_AVX512FP16 for case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first. For insn patterns, it's ok since condition is binded to real instruction but not mode.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #4 from Uroš Bizjak --- (In reply to Hongyu Wang from comment #3) > So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want > to modify those function_args and function_value stuff. We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for VALID_SSE2_REG_MODE.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Hongyu Wang changed: What|Removed |Added CC||wwwhhhyyy333 at gmail dot com --- Comment #3 from Hongyu Wang --- (In reply to Hongtao.liu from comment #2) > > > > Also, baz iz highly un-optimal for 32bit targets. > > Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal > on 32-bit target, maybe related to vector_mode_supported_p, but then why > codegen for baz on 64-bit target is optimal w/o TARGET_AVX512FP16? For V8HFmode that is unsupported in VALID_SSE2_REG_MODE, function_value_32 has return gen_rtx_REG (orig_mode, regno); so the retval is (reg:BLK 20 xmm0). while function_value_64 uses construct_container and returns (parallel:BLK [ (expr_list:REG_DEP_TRUE (reg:V8HF 20 xmm0) (const_int 0 [0])) ]) This could be optimized to simple movaps finally. So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want to modify those function_args and function_value stuff.
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 --- Comment #2 from Hongtao.liu --- > > Also, baz iz highly un-optimal for 32bit targets. Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal on 32-bit target, maybe related to vector_mode_supported_p, but then why codegen for baz on 64-bit target is optimal w/o TARGET_AVX512FP16?
[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1 from Hongtao.liu --- I remember psABI does not specify how to pass the 32-bit vector, PR102197 have reported similar issue.