[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #29 from Uroš Bizjak  ---
Fixed for gcc-12.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #28 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #18)
> codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have
> vec_setv16hf_0 and with vpblendw.

True, some opportunities are missing from expand_vec_perm* functions, someone
should go through these expanders and add corresponding VxHFmode near VxHImode
handling.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #27 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > test them w/o avx512fp16.
> 
> Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> avx512vl/f16c so that vect-float16-1.c could be vectorized.
> 
> vect-float16-1.c
> 
> void
> foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
>  _Float16 *__restrict__ c)
> {
>   for (int i = 0; i < 256; i++)
> a[i] = b[i] + c[i];
> }

This was recently fixed, for -O2 -ftree-vectorize -mfp16c I get:

vpxor   %xmm2, %xmm2, %xmm2
vpinsrw $0, (%rsi,%rax), %xmm2, %xmm0
vpinsrw $0, (%rdx,%rax), %xmm2, %xmm1
vcvtph2ps   %xmm0, %xmm0
vcvtph2ps   %xmm1, %xmm1
vaddss  %xmm1, %xmm0, %xmm0
vinsertps   $0xe, %xmm0, %xmm0, %xmm0
vcvtps2ph   $4, %xmm0, %xmm0
vpextrw $0, %xmm0, (%rdi,%rax)
addq$2, %rax
cmpq$512, %rax
jne .L2
ret

While it would be nice to partially vectorize with vcvtph2ps/vcvtps2ph, the
compiler doesn't reach that far.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #26 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0

commit r12-6021-g271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0
Author: Uros Bizjak 
Date:   Thu Dec 16 19:34:50 2021 +0100

i386: Enable VxHF vector modes lower ABI levels [PR103571]

Enable VxHF vector modes for SSE2, AVX and AVX512F ABIs.

2021-12-16  Uroš Bizjak  

gcc/ChangeLog:

PR target/103571
* config/i386/i386.h (VALID_AVX256_REG_MODE): Add V16HFmode.
(VALID_AVX256_REG_OR_OI_VHF_MODE): Replace with ...
(VALID_AVX256_REG_OR_OI_MODE): ... this.  Remove V16HFmode.
(VALID_AVX512F_SCALAR_MODE): Remove HImode and HFmode.
(VALID_AVX512FP16_SCALAR_MODE): New.
(VALID_AVX512F_REG_MODE): Add V32HFmode.
(VALID_SSE2_REG_MODE): Add V8HFmode, V4HFmode and V2HFmode.
(VALID_SSE2_REG_VHF_MODE): Remove.
(VALID_INT_MODE_P): Add V2HFmode.
* config/i386/i386.c (function_arg_advance_64):
Remove explicit mention of V16HFmode and V32HFmode.
(ix86_hard_regno_mode_ok): Remove explicit mention of XImode
and V32HFmode, use VALID_AVX512F_REG_OR_XI_MODE instead.
Use VALID_AVX512FP_SCALAR_MODE for TARGET_aVX512FP16.
Use VALID_AVX256_REG_OR_OI_MODE instead of
VALID_AVX256_REG_OR_OI_VHF_MODE and VALID_SSE2_REG_MODE instead
of VALID_SSE2_REG_VHF_MODE.
(ix86_set_reg_reg_cost): Remove usge of VALID_AVX512FP16_REG_MODE.
(ix86_vector_mode_supported): Ditto.

gcc/testsuite/ChangeLog:

PR target/103571
* gcc.target/i386/pr102812.c (dg-final): Do not scan for movdqa.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #25 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #22)
> Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached
> patch looks fine, the condition should be binded to real instructions but
> not mode.

OK, will commit the patch to enable vector modes later today:

- mavx512fp16 is unchanged
- vactorizer middle end can be (will be) fixed as a follow-up (I'll open a PR).
- it will be possible to test various ISA levels, possible ICE is relatively
easy to fix by enabling/disabling various code paths in expanders.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak  changed:

   What|Removed |Added

  Attachment #51950|0   |1
is obsolete||

--- Comment #24 from Uroš Bizjak  ---
Created attachment 52002
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52002=edit
Current patch to enable vector VxHF modes for TARGET_SSE+

Current patch after preparation patch was committed.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #23 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:7a54d3deecf967029f18aa5ed1fcbdb752e213b9

commit r12-5966-g7a54d3deecf967029f18aa5ed1fcbdb752e213b9
Author: Uros Bizjak 
Date:   Tue Dec 14 18:27:22 2021 +0100

i386: Implement VxHF vector set/insert/extract with lower ABI levels

This is a preparation patch that moves VxHF vector set/insert/extract
expansions from AVX512FP16 ABI to lower ABIs.  There are no functional
changes for -mavx512fp16 and a follow-up patch is needed to actually
enable VxHF vector modes for lower ABIs.

2021-12-14  Uroš Bizjak  

gcc/ChangeLog:

PR target/103571
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate)
: Implement for TARGET_SSE2.
: Implement for TARGET_AVX.
: Implement for TARGET_AVX512F.
(ix86_expand_vector_set_var): Handle V32HFmode
without TARGET_AVX512BW.
(ix86_expand_vector_extract)
: Implement for TARGET_SSE2.
: Implement for TARGET_AVX.
: Implement for TARGET_AVX512BW.
(expand_vec_perm_broadcast_1) : New.
* config/i386/sse.md (VI12HF_AVX512VL): Remove
TARGET_AVX512FP16 condition.
(V): Ditto.
(V_256_512): Ditto.
(avx_vbroadcastf128_): Use V_256H mode iterator.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #22 from Hongtao.liu  ---

 reply to Uroš Bizjak from comment #21)
> (In reply to Hongtao.liu from comment #19)
> > (In reply to Hongtao.liu from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > > There're already testcases for vec_extract/vec_set/vec_duplicate, but 
> > > > those
> > > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them 
> > > > and
> > > > test them w/o avx512fp16.
> > > 
> > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> > > avx512vl/f16c so that vect-float16-1.c could be vectorized.
> > > 
> > > vect-float16-1.c
> > > 
> > > void
> > > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> > >  _Float16 *__restrict__ c)
> > > {
> > >   for (int i = 0; i < 256; i++)
> > > a[i] = b[i] + c[i];
> > > }
> > 
> > Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf
> > addition to scalar version. And the mismatch is vectorizer assume '+/-' is
> > supported by default(w/o check optab, just cehck if v8hf is supported in
> > vector_mode_supported_p), and then vectorize the loop, but veclower lower
> > vector operation back to scalar which create much worse code than not
> > vectorized version. 
> 
> I was under impression that autovectorizer won't vectorize if
> TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the
word_mode is also returned for 64-bit/32-bit vector, but they're vectorized.(In
> documentation for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are
> needed.
> 
> So, it looks that middle end is somehow inconsistent here. Adding CC.
> 
> > Could veclower try widen mode for addition, even veclower can, vNhfmode
> > better be supported under avx512vl or f16c, orelse vectorized code is really
> > bad, then why should we supported vector mode under generic target.
> 
> We should use it for parameter passing, moves, inserts, extracts and
> shuffles. In case of VxHF, we can reuse HImode insns for all these
> operations.

Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached patch
looks fine, the condition should be binded to real instructions but not mode.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-09 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #21 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #19)
> (In reply to Hongtao.liu from comment #17)
> > (In reply to Hongtao.liu from comment #16)
> > > There're already testcases for vec_extract/vec_set/vec_duplicate, but 
> > > those
> > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them 
> > > and
> > > test them w/o avx512fp16.
> > 
> > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> > avx512vl/f16c so that vect-float16-1.c could be vectorized.
> > 
> > vect-float16-1.c
> > 
> > void
> > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> >  _Float16 *__restrict__ c)
> > {
> >   for (int i = 0; i < 256; i++)
> > a[i] = b[i] + c[i];
> > }
> 
> Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf
> addition to scalar version. And the mismatch is vectorizer assume '+/-' is
> supported by default(w/o check optab, just cehck if v8hf is supported in
> vector_mode_supported_p), and then vectorize the loop, but veclower lower
> vector operation back to scalar which create much worse code than not
> vectorized version. 

I was under impression that autovectorizer won't vectorize if
TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the documentation
for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are needed.

So, it looks that middle end is somehow inconsistent here. Adding CC.

> Could veclower try widen mode for addition, even veclower can, vNhfmode
> better be supported under avx512vl or f16c, orelse vectorized code is really
> bad, then why should we supported vector mode under generic target.

We should use it for parameter passing, moves, inserts, extracts and shuffles.
In case of VxHF, we can reuse HImode insns for all these operations.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #20 from Hongtao.liu  ---
V2HF/V4HF should also be restricted under AVX512FP16.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #19 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > test them w/o avx512fp16.
> 
> Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> avx512vl/f16c so that vect-float16-1.c could be vectorized.
> 
> vect-float16-1.c
> 
> void
> foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
>  _Float16 *__restrict__ c)
> {
>   for (int i = 0; i < 256; i++)
> a[i] = b[i] + c[i];
> }

Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf addition
to scalar version. And the mismatch is vectorizer assume '+/-' is supported by
default(w/o check optab, just cehck if v8hf is supported in
vector_mode_supported_p), and then vectorize the loop, but veclower lower
vector operation back to scalar which create much worse code than not
vectorized version. 

after loop vectorizer, dump is quite optimized:
  vect__4.6_27 = MEM  [(_Float16 *)vectp_b.4_29];
  vect__6.9_24 = MEM  [(_Float16 *)vectp_c.7_26];
  vect__8.10_23 = vect__4.6_27 + vect__6.9_24;
  MEM  [(_Float16 *)vectp_a.11_22] = vect__8.10_23;
  vectp_b.4_28 = vectp_b.4_29 + 8;
  vectp_c.7_25 = vectp_c.7_26 + 8;
  vectp_a.11_21 = vectp_a.11_22 + 8;

But after veclower

  vect__4.6_4 = MEM  [(_Float16 *)b_12(D)];
  vect__6.9_5 = MEM  [(_Float16 *)c_13(D)];
  _28 = BIT_FIELD_REF ;
  _25 = BIT_FIELD_REF ;
  _21 = _28 + _25;
  _15 = BIT_FIELD_REF ;
  _10 = BIT_FIELD_REF ;
  _17 = _15 + _10;
  _22 = BIT_FIELD_REF ;
  _26 = BIT_FIELD_REF ;
  _29 = _22 + _26;
  _20 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  _2 = _20 + _3;
  vect__8.10_6 = {_21, _17, _29, _2};
  MEM  [(_Float16 *)a_14(D)] = vect__8.10_6;
  vectp_b.4_8 = b_12(D) + 8;
  vectp_c.7_16 = c_13(D) + 8;
  vectp_a.11_30 = a_14(D) + 8;
  vect__4.6_27 = MEM  [(_Float16 *)vectp_b.4_8];
  vect__6.9_24 = MEM  [(_Float16 *)vectp_c.7_16];
  _1 = BIT_FIELD_REF ;
  _19 = BIT_FIELD_REF ;
  _31 = _1 + _19;
  _9 = BIT_FIELD_REF ;
  _32 = BIT_FIELD_REF ;
  _33 = _9 + _32;
  _34 = BIT_FIELD_REF ;
  _35 = BIT_FIELD_REF ;
  _36 = _34 + _35;
  _37 = BIT_FIELD_REF ;
  _38 = BIT_FIELD_REF ;
  _39 = _37 + _38;
  vect__8.10_23 = {_31, _33, _36, _39};
  MEM  [(_Float16 *)vectp_a.11_30] = vect__8.10_23;


Could veclower try widen mode for addition, even veclower can, vNhfmode better
be supported under avx512vl or f16c, orelse vectorized code is really bad, then
why should we supported vector mode under generic target.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #18 from Hongtao.liu  ---
codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have
vec_setv16hf_0 and with vpblendw.

typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));

__m256h
__attribute__ ((noinline, noclone))
foo1 (_Float16 x)
{
  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f };
}

__m256h
__attribute__ ((noinline, noclone))
foo2 (_Float16 *x)
{
  return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f,
0.0f, 0.0f, 0.0f, 0.0f };
}


foo1:
.LFB0:
.cfi_startproc
vpxor   %xmm1, %xmm1, %xmm1
vpbroadcastw%xmm0, %ymm0
vpblendw$1, %ymm0, %ymm1, %ymm0
vpblendd$15, %ymm0, %ymm1, %ymm1
vmovdqa %ymm1, %ymm0
ret
.cfi_endproc
.LFE0:
.size   foo1, .-foo1
.p2align 4
.globl  foo2
.type   foo2, @function
foo2:
.LFB1:
.cfi_startproc
vpbroadcastw(%rdi), %ymm1
vpxor   %xmm0, %xmm0, %xmm0
vpblendw$1, %ymm1, %ymm0, %ymm1
vpblendd$15, %ymm1, %ymm0, %ymm0
ret
.cfi_endproc

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #17 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #16)
> There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> test them w/o avx512fp16.

Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
avx512vl/f16c so that vect-float16-1.c could be vectorized.

vect-float16-1.c

void
foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
 _Float16 *__restrict__ c)
{
  for (int i = 0; i < 256; i++)
a[i] = b[i] + c[i];
}

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #16 from Hongtao.liu  ---
There're already testcases for vec_extract/vec_set/vec_duplicate, but those
testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
test them w/o avx512fp16.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #15 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Hongtao.liu from comment #10)
> 
> > Sure.
> Please find attached the complete patch that enables HF vector modes in
> Comment #11. The patch survives bootstrap and regression test and works OK
> for the following testcase:
> 
> --cut here--
> typedef _Float16 vf64 __attribute__((vector_size(64)));
> typedef _Float16 vf32 __attribute__((vector_size(32)));
> typedef _Float16 vf16 __attribute__((vector_size(16)));
> 
> #ifdef __AVX512F__
> vf64 bar64 (_Float16 a)
> {
>   return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
> a, a, a, a, a, a, a, a, a, a, a, a, a };
> }
> #endif
> 
> #ifdef __AVX__
> vf32 bar32 (_Float16 a)
> {
>   return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
> }
> #endif
> 
> #ifdef __SSE2__
> vf16 baz16 (_Float16 a)
> {
>   return (vf16) { a, a, a, a, a, a, a, a };
> }
> #endif
> --cut here--
> 
> for -msse2, -mavx, -mavx512f and -mavx512bw.
> 
> Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the
> generic target code auto-detects them. Now the generic target code does not
> assume that vector HF modes depend solely on TARGET_AVX512FP16.
> 
> Hongtao, can you please review the patch and perhaps test it a bit more?

Sure.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak  changed:

   What|Removed |Added

  Attachment #51948|0   |1
is obsolete||

--- Comment #14 from Uroš Bizjak  ---
Created attachment 51950
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51950=edit
Proposed patch to enable vector HF modes for TARGET_SSE2+

Updated patch, see Comment #13.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #13 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #12)
> Hongtao, can you please review the patch and perhaps test it a bit more?

This part is missing from ix86_expand_vector_set_var:

--cut here
@@ -15912,7 +15921,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
   /* 512-bits vector byte/word broadcast and comparison only available
  under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
  when without TARGET_AVX512BW.  */
-  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+  if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode)
+  && !TARGET_AVX512BW)
 {
   gcc_assert (TARGET_AVX512F);
   rtx vhi, vlo, idx_hi;
@@ -15926,6 +15936,12 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
  extract_hi = gen_vec_extract_hi_v32hi;
  extract_lo = gen_vec_extract_lo_v32hi;
}
+  else if (mode == V32HFmode)
+   {
+ half_mode = V16HFmode;
+ extract_hi = gen_vec_extract_hi_v32hf;
+ extract_lo = gen_vec_extract_lo_v32hf;
+   }
   else
{
  half_mode = V32QImode;
@@ -15973,7 +15989,6 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
case E_V16SFmode:
  cmp_mode = V16SImode;
  break;
-   /* TARGET_AVX512FP16 implies TARGET_AVX512BW.  */
case E_V8HFmode:
  cmp_mode = V8HImode;
  break;
--cut here--

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #12 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #10)

> Sure.
Please find attached the complete patch that enables HF vector modes in Comment
#11. The patch survives bootstrap and regression test and works OK for the
following testcase:

--cut here--
typedef _Float16 vf64 __attribute__((vector_size(64)));
typedef _Float16 vf32 __attribute__((vector_size(32)));
typedef _Float16 vf16 __attribute__((vector_size(16)));

#ifdef __AVX512F__
vf64 bar64 (_Float16 a)
{
  return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
a, a, a, a, a, a, a, a, a, a, a, a };
}
#endif

#ifdef __AVX__
vf32 bar32 (_Float16 a)
{
  return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}
#endif

#ifdef __SSE2__
vf16 baz16 (_Float16 a)
{
  return (vf16) { a, a, a, a, a, a, a, a };
}
#endif
--cut here--

for -msse2, -mavx, -mavx512f and -mavx512bw.

Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the
generic target code auto-detects them. Now the generic target code does not
assume that vector HF modes depend solely on TARGET_AVX512FP16.

Hongtao, can you please review the patch and perhaps test it a bit more?

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak  changed:

   What|Removed |Added

  Attachment #51941|0   |1
is obsolete||

--- Comment #11 from Uroš Bizjak  ---
Created attachment 51948
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51948=edit
Proposed patch to enable vector HF modes for TARGET_SSE2+

Attached patch enables vector HF modes for TARGET_SSE2+. In addition to
enabling vector modes for SSE2, AVX and AVX512F targets, it enables
corresponding move insns in sse.md, redefines some mode iterators and moves a
couple of patterns  from TARGET_AVX512FP16 to lower ABIs.

The patch also fixes ix86_expand_vector_init_duplicate,
ix86_expand_vector_extract and expand_vec_perm_broadcast_1, as mentioned in
Comment #8.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #10 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Hongtao.liu from comment #8)
> > (In reply to Uroš Bizjak from comment #6)
> > > (In reply to Hongtao.liu from comment #5)
> > > 
> > > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 
> > > > for
> > > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> > > 
> > > These are of the type:
> > > 
> > >   use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> > >   gen_vec_set_0 = gen_vec_setv8hf_0;
> > > 
> > > So they look immune to the above change.
> > 
> > This is ok.
> > 
> > I mean in ix86_expand_vector_init_duplicate
> > 
> > case E_V8HFmode:
> > case E_V16HFmode:
> > case E_V32HFmode:
> >   return ix86_vector_duplicate_value (mode, target, val);
> > 
> > AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
> > V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.
> > 
> > Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
> > other wise, vector_extract go through stack.
> > 
> > case E_V32HFmode:
> >   tmp = gen_reg_rtx (V16HFmode);
> >   if (elt < 16)
> > emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
> >   else
> > emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
> >   ix86_expand_vector_extract (false, target, tmp, elt & 15);
> >   return;
> > 
> > 
> > others seems to be ok.
> 
> Please note that the change mainly affects moves between SSE and GP
> registers. Expansion is done way before register allocation, and if we allow
> these modes earlier, I'm not sure I understand how it affects expand.
> 
> I propose we proceed with my patch and fix eventual fallout as a follow-up.

Sure.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #9 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > 
> > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 
> > > for
> > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> > 
> > These are of the type:
> > 
> >   use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> >   gen_vec_set_0 = gen_vec_setv8hf_0;
> > 
> > So they look immune to the above change.
> 
> This is ok.
> 
> I mean in ix86_expand_vector_init_duplicate
> 
> case E_V8HFmode:
> case E_V16HFmode:
> case E_V32HFmode:
>   return ix86_vector_duplicate_value (mode, target, val);
> 
> AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
> V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.
> 
> Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
> other wise, vector_extract go through stack.
> 
> case E_V32HFmode:
>   tmp = gen_reg_rtx (V16HFmode);
>   if (elt < 16)
>   emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
>   else
>   emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
>   ix86_expand_vector_extract (false, target, tmp, elt & 15);
>   return;
> 
> 
> others seems to be ok.

Please note that the change mainly affects moves between SSE and GP registers.
Expansion is done way before register allocation, and if we allow these modes
earlier, I'm not sure I understand how it affects expand.

I propose we proceed with my patch and fix eventual fallout as a follow-up.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #8 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> 
> > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> 
> These are of the type:
> 
>   use_vector_set = TARGET_AVX512FP16 && one_var == 0;
>   gen_vec_set_0 = gen_vec_setv8hf_0;
> 
> So they look immune to the above change.

This is ok.

I mean in ix86_expand_vector_init_duplicate

case E_V8HFmode:
case E_V16HFmode:
case E_V32HFmode:
  return ix86_vector_duplicate_value (mode, target, val);

AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.

Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
other wise, vector_extract go through stack.

case E_V32HFmode:
  tmp = gen_reg_rtx (V16HFmode);
  if (elt < 16)
emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
  else
emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
  ix86_expand_vector_extract (false, target, tmp, elt & 15);
  return;


others seems to be ok.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #7 from Uroš Bizjak  ---
Created attachment 51941
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51941=edit
Proposed patch

The patch moves put V2HF+V4HF+V8HF/V16HF/V32HF TO
VALID_SSE2/AVX256/AVX512F_REG_MODE.

Also, introduces VALID_AVX512FP16_SCALAR_MODE to simplify some code.

(Probably we need to add V2HFmode to VALID_INT_MODE_P, but nevertheless the
patch fixes all the issues from the description):

64-bit targets:

foo:
movl%esi, %eax
ret

bar:
movaps  %xmm1, %xmm0
ret

baz:
movdqa  %xmm1, %xmm0
ret

and for 32-bit targets:

foo:
movl8(%esp), %eax
ret

bar:
movq%mm1, %mm0
ret

baz:
movdqa  %xmm1, %xmm0
ret

The patch "regresses" 32bit testsuite:

FAIL: gcc.target/i386/pr102812.c scan-assembler movdqa

but only due to better generated code:

pxor%xmm0, %xmm0
pinsrw  $0, 4(%esp), %xmm0
ret

vs. the above demonstrated mess.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #6 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #5)

> There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.

These are of the type:

  use_vector_set = TARGET_AVX512FP16 && one_var == 0;
  gen_vec_set_0 = gen_vec_setv8hf_0;

So they look immune to the above change.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #5 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongyu Wang from comment #3)
>  
> > So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> > to modify those function_args and function_value stuff.
> 
> We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for
> VALID_SSE2_REG_MODE.

There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.

For insn patterns, it's ok since condition is binded to real instruction but
not mode.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #4 from Uroš Bizjak  ---
(In reply to Hongyu Wang from comment #3)

> So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> to modify those function_args and function_value stuff.

We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for
VALID_SSE2_REG_MODE.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-06 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Hongyu Wang  changed:

   What|Removed |Added

 CC||wwwhhhyyy333 at gmail dot com

--- Comment #3 from Hongyu Wang  ---
(In reply to Hongtao.liu from comment #2)
> > 
> > Also, baz iz highly un-optimal for 32bit targets.
> 
> Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal
> on 32-bit target, maybe related to vector_mode_supported_p, but then why
> codegen for baz on 64-bit target is optimal w/o TARGET_AVX512FP16?

For V8HFmode that is unsupported in VALID_SSE2_REG_MODE, function_value_32 has

return gen_rtx_REG (orig_mode, regno); 

so the retval is (reg:BLK 20 xmm0).

while function_value_64 uses construct_container and returns

(parallel:BLK [   
(expr_list:REG_DEP_TRUE (reg:V8HF 20 xmm0)
(const_int 0 [0]))
])

This could be optimized to simple movaps finally.

So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want to
modify those function_args and function_value stuff.

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #2 from Hongtao.liu  ---

> 
> Also, baz iz highly un-optimal for 32bit targets.

Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal on
32-bit target, maybe related to vector_mode_supported_p, but then why codegen
for baz on 64-bit target is optimal w/o TARGET_AVX512FP16?

[Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues

2021-12-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #1 from Hongtao.liu  ---
I remember psABI does not specify how to pass the 32-bit vector, PR102197 have
reported similar issue.