Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
> int32_t x = (int32_t)0x1.0p32; > int32_t y = (int32_t)(int64_t)0x1.0p32; > > sets x to 2147483647 and y to 0. >>> >>> Hmm, good question. GENERIC has a direct truncation to unsigned char >>> for example, the C standard generally says if the integral part cannot >>> be represented then the behavior is undefined. So I think we should be >>> safe here (0x1.0p32 doesn't fit an int). >> >> We should be following Annex F (unspecified value plus "invalid" exception >> for out-of-range floating-to-integer conversions rather than undefined >> behavior). But we don't achieve that very well at present (see bug 93806 >> comments 27-29 for examples of how such conversions produce wobbly >> values). > > That would mean guarding this with !flag_trapping_math would be the > appropriate > thing to do. Follow-up on this: When we do a NARROW_DST multiple-step conversion we do not guard with !flag_trapping_math. Is this intentional and if so, why do we only require it for the NONE case? I was thinking of implementing an expander for double -> int16 conversion for RISC-V with multiple steps but that would just circumvent the !flag_trapping_math check. Then I wondered why we vectorize this using multiple steps on x86 even with trapping math and it turns out that the difference is the NARROW_DST modifier but we emit VEC_PACK_FIX_TRUNC_EXPR anyway. Is this "safe"? Regards Robin
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
Hello, liuhongt via Gcc-patches writes: > I notice there's some refactor in vectorizable_conversion > for code_helper,so I've adjusted my patch to that. > Here's the patch I'm going to commit. > > We have already use intermidate type in case WIDEN, but not for NONE, > this patch extended that. > > gcc/ChangeLog: > > PR target/110018 > * tree-vect-stmts.cc (vectorizable_conversion): Use > intermiediate integer type for float_expr/fix_trunc_expr when > direct optab is not existed. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr110018-1.c: New test. > --- > gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ > gcc/tree-vect-stmts.cc | 66 ++- > 2 files changed, 158 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c Our CI detected regressions on aarch64-linux-gnu with this commit, in some aarch64-sve and gfortran tests. I filed the following bug report with the details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371 Could you please check? -- Thiago
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Wed, Jun 21, 2023 at 10:39 PM Joseph Myers wrote: > > On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote: > > > > > int32_t x = (int32_t)0x1.0p32; > > > > int32_t y = (int32_t)(int64_t)0x1.0p32; > > > > > > > > sets x to 2147483647 and y to 0. > > > > Hmm, good question. GENERIC has a direct truncation to unsigned char > > for example, the C standard generally says if the integral part cannot > > be represented then the behavior is undefined. So I think we should be > > safe here (0x1.0p32 doesn't fit an int). > > We should be following Annex F (unspecified value plus "invalid" exception > for out-of-range floating-to-integer conversions rather than undefined > behavior). But we don't achieve that very well at present (see bug 93806 > comments 27-29 for examples of how such conversions produce wobbly > values). That would mean guarding this with !flag_trapping_math would be the appropriate thing to do. Richard. > -- > Joseph S. Myers > jos...@codesourcery.com
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote: > > > int32_t x = (int32_t)0x1.0p32; > > > int32_t y = (int32_t)(int64_t)0x1.0p32; > > > > > > sets x to 2147483647 and y to 0. > > Hmm, good question. GENERIC has a direct truncation to unsigned char > for example, the C standard generally says if the integral part cannot > be represented then the behavior is undefined. So I think we should be > safe here (0x1.0p32 doesn't fit an int). We should be following Annex F (unspecified value plus "invalid" exception for out-of-range floating-to-integer conversions rather than undefined behavior). But we don't achieve that very well at present (see bug 93806 comments 27-29 for examples of how such conversions produce wobbly values). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Wed, Jun 21, 2023 at 10:22 AM Richard Biener wrote: > > > + /* For conversions between float and smaller integer types try > > > whether we > > > +can use intermediate signed integer types to support the > > > +conversion. */ > > > > I'm trying to enhance testcase coverage with explicit signed/unsigned > > types (patch attached), and I have noticed that zero-extension is used > > for unsigned types. So, the above comment that mentions only signed > > integer types is not entirely correct. > > The comment says the intermediate sized vector types are always > signed (because float conversions to/from unsigned are always somewhat > awkward), but yes, if the original type was unsigned zero-extension is > used and if it was signed sign-extension. > > The testcase adjustments / additions look good to me btw. Thanks, pushed with the following ChangeLog: vect: Add testcases for unsigned conversions [PR110018] Also test conversions with unsigned types. PR target/110018 gcc/testsuite/ChangeLog: * gcc.target/i386/pr110018-1.c: Use explicit signed types. * gcc.target/i386/pr110018-2.c: New test. Tested on x86_64-linux-gnu {,-m32}. Uros.
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
Richard Biener writes: > On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford > wrote: >> >> Richard Sandiford writes: >> > Richard Biener via Gcc-patches writes: >> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches >> >> wrote: >> >>> >> >>> We have already use intermidate type in case WIDEN, but not for NONE, >> >>> this patch extended that. >> >>> >> >>> I didn't do that in pattern recog since we need to know whether the >> >>> stmt belongs to any slp_node to decide the vectype, the related optabs >> >>> are checked according to vectype_in and vectype_out. For non-slp case, >> >>> vec_pack/unpack are always used when lhs has different size from rhs, >> >>> for slp case, sometimes vec_pack/unpack is used, somethings >> >>> direct conversion is used. >> >>> >> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> >>> Ok for trunk? >> >>> >> >>> gcc/ChangeLog: >> >>> >> >>> PR target/110018 >> >>> * tree-vect-stmts.cc (vectorizable_conversion): Use >> >>> intermiediate integer type for float_expr/fix_trunc_expr when >> >>> direct optab is not existed. >> >>> >> >>> gcc/testsuite/ChangeLog: >> >>> >> >>> * gcc.target/i386/pr110018-1.c: New test. >> >>> --- >> >>> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ >> >>> gcc/tree-vect-stmts.cc | 56 - >> >>> 2 files changed, 149 insertions(+), 1 deletion(-) >> >>> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> >> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> new file mode 100644 >> >>> index 000..b1baffd7af1 >> >>> --- /dev/null >> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> @@ -0,0 +1,94 @@ >> >>> +/* { dg-do compile } */ >> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ >> >>> + >> >>> +void >> >>> +foo (double* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo1 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo2 (_Float16* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> + a[4] = b[4]; >> >>> + a[5] = b[5]; >> >>> + a[6] = b[6]; >> >>> + a[7] = b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo3 (double* __restrict a, short* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo4 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo5 (double* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo6 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo7 (_Float16* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> + a[4] = b[4]; >> >>> + a[5] = b[5]; >> >>> + a[6] = b[6]; >> >>> + a[7] = b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo8 (double* __restrict b, short* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo9 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> >>> index bd3b07a3aa1..1118c89686d 100644 >> >>> --- a/gcc/tree-vect-stmts.cc >> >>> +++ b/gcc/tree-vect-stmts.cc >> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, >> >>> return false; >> >>>if (supportable_convert_operation (code, vectype_out, vectype_in, >> >>> &code1)) >> >>> break; >> >> >> >> A comment would be nice here. Like >> >> >> >>/* For conversions between float and smaller integer types try whether >> >> we can >> >> use intermediate signed integer types to support the conversion. */ >> >> >> >>> + if ((code == FLOAT_EXPR >> >>> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) >> >>> + || (code == FIX_TRUNC_EXPR >> >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) >> > >> > Is the FIX_TRUNC_EXPR case safe without some flag? >> > >> > #include >> > int32_t x = (int32_t)0x1.0p32; >> > int32_t y = (int32_t)(int64_t)0x1.0p32; >> > >> > sets x to 2147483647 and y to 0. > > Hmm, good question. GENERIC has a direct truncation to unsigne
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford wrote: > > Richard Sandiford writes: > > Richard Biener via Gcc-patches writes: > >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches > >> wrote: > >>> > >>> We have already use intermidate type in case WIDEN, but not for NONE, > >>> this patch extended that. > >>> > >>> I didn't do that in pattern recog since we need to know whether the > >>> stmt belongs to any slp_node to decide the vectype, the related optabs > >>> are checked according to vectype_in and vectype_out. For non-slp case, > >>> vec_pack/unpack are always used when lhs has different size from rhs, > >>> for slp case, sometimes vec_pack/unpack is used, somethings > >>> direct conversion is used. > >>> > >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > >>> Ok for trunk? > >>> > >>> gcc/ChangeLog: > >>> > >>> PR target/110018 > >>> * tree-vect-stmts.cc (vectorizable_conversion): Use > >>> intermiediate integer type for float_expr/fix_trunc_expr when > >>> direct optab is not existed. > >>> > >>> gcc/testsuite/ChangeLog: > >>> > >>> * gcc.target/i386/pr110018-1.c: New test. > >>> --- > >>> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ > >>> gcc/tree-vect-stmts.cc | 56 - > >>> 2 files changed, 149 insertions(+), 1 deletion(-) > >>> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c > >>> > >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c > >>> b/gcc/testsuite/gcc.target/i386/pr110018-1.c > >>> new file mode 100644 > >>> index 000..b1baffd7af1 > >>> --- /dev/null > >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c > >>> @@ -0,0 +1,94 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ > >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ > >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ > >>> + > >>> +void > >>> +foo (double* __restrict a, char* b) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> +} > >>> + > >>> +void > >>> +foo1 (float* __restrict a, char* b) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> +} > >>> + > >>> +void > >>> +foo2 (_Float16* __restrict a, char* b) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> + a[4] = b[4]; > >>> + a[5] = b[5]; > >>> + a[6] = b[6]; > >>> + a[7] = b[7]; > >>> +} > >>> + > >>> +void > >>> +foo3 (double* __restrict a, short* b) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> +} > >>> + > >>> +void > >>> +foo4 (float* __restrict a, char* b) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> +} > >>> + > >>> +void > >>> +foo5 (double* __restrict b, char* a) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> +} > >>> + > >>> +void > >>> +foo6 (float* __restrict b, char* a) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> +} > >>> + > >>> +void > >>> +foo7 (_Float16* __restrict b, char* a) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> + a[4] = b[4]; > >>> + a[5] = b[5]; > >>> + a[6] = b[6]; > >>> + a[7] = b[7]; > >>> +} > >>> + > >>> +void > >>> +foo8 (double* __restrict b, short* a) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> +} > >>> + > >>> +void > >>> +foo9 (float* __restrict b, char* a) > >>> +{ > >>> + a[0] = b[0]; > >>> + a[1] = b[1]; > >>> + a[2] = b[2]; > >>> + a[3] = b[3]; > >>> +} > >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > >>> index bd3b07a3aa1..1118c89686d 100644 > >>> --- a/gcc/tree-vect-stmts.cc > >>> +++ b/gcc/tree-vect-stmts.cc > >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, > >>> return false; > >>>if (supportable_convert_operation (code, vectype_out, vectype_in, > >>> &code1)) > >>> break; > >> > >> A comment would be nice here. Like > >> > >>/* For conversions between float and smaller integer types try whether > >> we can > >> use intermediate signed integer types to support the conversion. */ > >> > >>> + if ((code == FLOAT_EXPR > >>> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) > >>> + || (code == FIX_TRUNC_EXPR > >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) > > > > Is the FIX_TRUNC_EXPR case safe without some flag? > > > > #include > > int32_t x = (int32_t)0x1.0p32; > > int32_t y = (int32_t)(int64_t)0x1.0p32; > > > > sets x to 2147483647 and y to 0. Hmm, good question. GENERIC has a direct truncation to unsigned char for example, the C standard generally says if the integral part cannot be represented then the behavior is undefined. So I think we should be safe here (0x1.0p32 doesn't fit an int). > A
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
Richard Sandiford writes: > Richard Biener via Gcc-patches writes: >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches >> wrote: >>> >>> We have already use intermidate type in case WIDEN, but not for NONE, >>> this patch extended that. >>> >>> I didn't do that in pattern recog since we need to know whether the >>> stmt belongs to any slp_node to decide the vectype, the related optabs >>> are checked according to vectype_in and vectype_out. For non-slp case, >>> vec_pack/unpack are always used when lhs has different size from rhs, >>> for slp case, sometimes vec_pack/unpack is used, somethings >>> direct conversion is used. >>> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >>> Ok for trunk? >>> >>> gcc/ChangeLog: >>> >>> PR target/110018 >>> * tree-vect-stmts.cc (vectorizable_conversion): Use >>> intermiediate integer type for float_expr/fix_trunc_expr when >>> direct optab is not existed. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/i386/pr110018-1.c: New test. >>> --- >>> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ >>> gcc/tree-vect-stmts.cc | 56 - >>> 2 files changed, 149 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c >>> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c >>> b/gcc/testsuite/gcc.target/i386/pr110018-1.c >>> new file mode 100644 >>> index 000..b1baffd7af1 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c >>> @@ -0,0 +1,94 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ >>> + >>> +void >>> +foo (double* __restrict a, char* b) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> +} >>> + >>> +void >>> +foo1 (float* __restrict a, char* b) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> +} >>> + >>> +void >>> +foo2 (_Float16* __restrict a, char* b) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> + a[4] = b[4]; >>> + a[5] = b[5]; >>> + a[6] = b[6]; >>> + a[7] = b[7]; >>> +} >>> + >>> +void >>> +foo3 (double* __restrict a, short* b) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> +} >>> + >>> +void >>> +foo4 (float* __restrict a, char* b) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> +} >>> + >>> +void >>> +foo5 (double* __restrict b, char* a) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> +} >>> + >>> +void >>> +foo6 (float* __restrict b, char* a) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> +} >>> + >>> +void >>> +foo7 (_Float16* __restrict b, char* a) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> + a[4] = b[4]; >>> + a[5] = b[5]; >>> + a[6] = b[6]; >>> + a[7] = b[7]; >>> +} >>> + >>> +void >>> +foo8 (double* __restrict b, short* a) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> +} >>> + >>> +void >>> +foo9 (float* __restrict b, char* a) >>> +{ >>> + a[0] = b[0]; >>> + a[1] = b[1]; >>> + a[2] = b[2]; >>> + a[3] = b[3]; >>> +} >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >>> index bd3b07a3aa1..1118c89686d 100644 >>> --- a/gcc/tree-vect-stmts.cc >>> +++ b/gcc/tree-vect-stmts.cc >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, >>> return false; >>>if (supportable_convert_operation (code, vectype_out, vectype_in, >>> &code1)) >>> break; >> >> A comment would be nice here. Like >> >>/* For conversions between float and smaller integer types try whether we >> can >> use intermediate signed integer types to support the conversion. */ >> >>> + if ((code == FLOAT_EXPR >>> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) >>> + || (code == FIX_TRUNC_EXPR >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) > > Is the FIX_TRUNC_EXPR case safe without some flag? > > #include > int32_t x = (int32_t)0x1.0p32; > int32_t y = (int32_t)(int64_t)0x1.0p32; > > sets x to 2147483647 and y to 0. Also, I think multi_step_cvt should influence the costs, since at the moment we cost one statement but generate two. This makes a difference for SVE with VECT_COMPARE_COSTS. Would changing it to: vect_model_simple_cost (vinfo, stmt_info, ncopies * (multi_step_cvt + 1), dt, ndts, slp_node, cost_vec); be OK? There again, I wonder if we should handle this using patterns instead. That makes both conversions explicit and therefore easier to cost. E.g. for SVE, an integer extension is free if the source is a load, and we do try
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
Richard Biener via Gcc-patches writes: > On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches > wrote: >> >> We have already use intermidate type in case WIDEN, but not for NONE, >> this patch extended that. >> >> I didn't do that in pattern recog since we need to know whether the >> stmt belongs to any slp_node to decide the vectype, the related optabs >> are checked according to vectype_in and vectype_out. For non-slp case, >> vec_pack/unpack are always used when lhs has different size from rhs, >> for slp case, sometimes vec_pack/unpack is used, somethings >> direct conversion is used. >> >> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> Ok for trunk? >> >> gcc/ChangeLog: >> >> PR target/110018 >> * tree-vect-stmts.cc (vectorizable_conversion): Use >> intermiediate integer type for float_expr/fix_trunc_expr when >> direct optab is not existed. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/i386/pr110018-1.c: New test. >> --- >> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ >> gcc/tree-vect-stmts.cc | 56 - >> 2 files changed, 149 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c >> b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> new file mode 100644 >> index 000..b1baffd7af1 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> @@ -0,0 +1,94 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ >> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ >> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ >> + >> +void >> +foo (double* __restrict a, char* b) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> +} >> + >> +void >> +foo1 (float* __restrict a, char* b) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> +} >> + >> +void >> +foo2 (_Float16* __restrict a, char* b) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> + a[4] = b[4]; >> + a[5] = b[5]; >> + a[6] = b[6]; >> + a[7] = b[7]; >> +} >> + >> +void >> +foo3 (double* __restrict a, short* b) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> +} >> + >> +void >> +foo4 (float* __restrict a, char* b) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> +} >> + >> +void >> +foo5 (double* __restrict b, char* a) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> +} >> + >> +void >> +foo6 (float* __restrict b, char* a) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> +} >> + >> +void >> +foo7 (_Float16* __restrict b, char* a) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> + a[4] = b[4]; >> + a[5] = b[5]; >> + a[6] = b[6]; >> + a[7] = b[7]; >> +} >> + >> +void >> +foo8 (double* __restrict b, short* a) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> +} >> + >> +void >> +foo9 (float* __restrict b, char* a) >> +{ >> + a[0] = b[0]; >> + a[1] = b[1]; >> + a[2] = b[2]; >> + a[3] = b[3]; >> +} >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index bd3b07a3aa1..1118c89686d 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, >> return false; >>if (supportable_convert_operation (code, vectype_out, vectype_in, >> &code1)) >> break; > > A comment would be nice here. Like > >/* For conversions between float and smaller integer types try whether we > can > use intermediate signed integer types to support the conversion. */ > >> + if ((code == FLOAT_EXPR >> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) >> + || (code == FIX_TRUNC_EXPR >> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) Is the FIX_TRUNC_EXPR case safe without some flag? #include int32_t x = (int32_t)0x1.0p32; int32_t y = (int32_t)(int64_t)0x1.0p32; sets x to 2147483647 and y to 0. Thanks, Richard >> + { >> + bool float_expr_p = code == FLOAT_EXPR; >> + scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode; >> + fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode); >> + code1 = float_expr_p ? code : NOP_EXPR; >> + codecvt1 = float_expr_p ? NOP_EXPR : code; >> + FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode) >> + { >> + imode = rhs_mode_iter.require (); >> + if (GET_MODE_SIZE (imode) > fltsz) >> + break; >> + >> + cvt_type >> + = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode), >> + 0); >> + cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type, >> + slp_
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Wed, Jun 21, 2023 at 9:50 AM Uros Bizjak via Gcc-patches wrote: > > On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches > wrote: > > > > I notice there's some refactor in vectorizable_conversion > > for code_helper,so I've adjusted my patch to that. > > Here's the patch I'm going to commit. > > > > We have already use intermidate type in case WIDEN, but not for NONE, > > this patch extended that. > > > > gcc/ChangeLog: > > > > PR target/110018 > > * tree-vect-stmts.cc (vectorizable_conversion): Use > > intermiediate integer type for float_expr/fix_trunc_expr when > > direct optab is not existed. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr110018-1.c: New test. > > > + > > + /* For conversions between float and smaller integer types try > > whether we > > +can use intermediate signed integer types to support the > > +conversion. */ > > I'm trying to enhance testcase coverage with explicit signed/unsigned > types (patch attached), and I have noticed that zero-extension is used > for unsigned types. So, the above comment that mentions only signed > integer types is not entirely correct. The comment says the intermediate sized vector types are always signed (because float conversions to/from unsigned are always somewhat awkward), but yes, if the original type was unsigned zero-extension is used and if it was signed sign-extension. The testcase adjustments / additions look good to me btw. Thanks, Richard. > > Uros.
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches wrote: > > I notice there's some refactor in vectorizable_conversion > for code_helper,so I've adjusted my patch to that. > Here's the patch I'm going to commit. > > We have already use intermidate type in case WIDEN, but not for NONE, > this patch extended that. > > gcc/ChangeLog: > > PR target/110018 > * tree-vect-stmts.cc (vectorizable_conversion): Use > intermiediate integer type for float_expr/fix_trunc_expr when > direct optab is not existed. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr110018-1.c: New test. > + > + /* For conversions between float and smaller integer types try whether > we > +can use intermediate signed integer types to support the > +conversion. */ I'm trying to enhance testcase coverage with explicit signed/unsigned types (patch attached), and I have noticed that zero-extension is used for unsigned types. So, the above comment that mentions only signed integer types is not entirely correct. Uros. diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c index b1baffd7af1..b6a3be7b7a2 100644 --- a/gcc/testsuite/gcc.target/i386/pr110018-1.c +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c @@ -4,14 +4,14 @@ /* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ void -foo (double* __restrict a, char* b) +foo (double* __restrict a, signed char* b) { a[0] = b[0]; a[1] = b[1]; } void -foo1 (float* __restrict a, char* b) +foo1 (float* __restrict a, signed char* b) { a[0] = b[0]; a[1] = b[1]; @@ -20,7 +20,7 @@ foo1 (float* __restrict a, char* b) } void -foo2 (_Float16* __restrict a, char* b) +foo2 (_Float16* __restrict a, signed char* b) { a[0] = b[0]; a[1] = b[1]; @@ -33,14 +33,14 @@ foo2 (_Float16* __restrict a, char* b) } void -foo3 (double* __restrict a, short* b) +foo3 (double* __restrict a, signed short* b) { a[0] = b[0]; a[1] = b[1]; } void -foo4 (float* __restrict a, char* b) +foo4 (float* __restrict a, signed char* b) { a[0] = b[0]; a[1] = b[1]; @@ -49,14 +49,14 @@ foo4 (float* __restrict a, char* b) } void -foo5 (double* __restrict b, char* a) +foo5 (double* __restrict b, signed char* a) { a[0] = b[0]; a[1] = b[1]; } void -foo6 (float* __restrict b, char* a) +foo6 (float* __restrict b, signed char* a) { a[0] = b[0]; a[1] = b[1]; @@ -65,7 +65,7 @@ foo6 (float* __restrict b, char* a) } void -foo7 (_Float16* __restrict b, char* a) +foo7 (_Float16* __restrict b, signed char* a) { a[0] = b[0]; a[1] = b[1]; @@ -78,14 +78,14 @@ foo7 (_Float16* __restrict b, char* a) } void -foo8 (double* __restrict b, short* a) +foo8 (double* __restrict b, signed short* a) { a[0] = b[0]; a[1] = b[1]; } void -foo9 (float* __restrict b, char* a) +foo9 (float* __restrict b, signed char* a) { a[0] = b[0]; a[1] = b[1]; diff --git a/gcc/testsuite/gcc.target/i386/pr110018-2.c b/gcc/testsuite/gcc.target/i386/pr110018-2.c new file mode 100644 index 000..a663e074698 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr110018-2.c @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ + +void +foo (double* __restrict a, unsigned char* b) +{ + a[0] = b[0]; + a[1] = b[1]; +} + +void +foo1 (float* __restrict a, unsigned char* b) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; +} + +void +foo2 (_Float16* __restrict a, unsigned char* b) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; + a[4] = b[4]; + a[5] = b[5]; + a[6] = b[6]; + a[7] = b[7]; +} + +void +foo3 (double* __restrict a, unsigned short* b) +{ + a[0] = b[0]; + a[1] = b[1]; +} + +void +foo4 (float* __restrict a, unsigned char* b) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; +} + +void +foo5 (double* __restrict b, unsigned char* a) +{ + a[0] = b[0]; + a[1] = b[1]; +} + +void +foo6 (float* __restrict b, unsigned char* a) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; +} + +void +foo7 (_Float16* __restrict b, unsigned char* a) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; + a[4] = b[4]; + a[5] = b[5]; + a[6] = b[6]; + a[7] = b[7]; +} + +void +foo8 (double* __restrict b, unsigned short* a) +{ + a[0] = b[0]; + a[1] = b[1]; +} + +void +foo9 (float* __restrict b, unsigned char* a) +{ + a[0] = b[0]; + a[1] = b[1]; + a[2] = b[2]; + a[3] = b[3]; +}
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Tue, Jun 20, 2023 at 11:02 AM Hongtao Liu wrote: > > On Tue, Jun 20, 2023 at 4:41 PM Richard Biener > wrote: > > > > On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches > > wrote: > > > > > > We have already use intermidate type in case WIDEN, but not for NONE, > > > this patch extended that. > > > > > > I didn't do that in pattern recog since we need to know whether the > > > stmt belongs to any slp_node to decide the vectype, the related optabs > > > are checked according to vectype_in and vectype_out. For non-slp case, > > > vec_pack/unpack are always used when lhs has different size from rhs, > > > for slp case, sometimes vec_pack/unpack is used, somethings > > > direct conversion is used. > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > > Ok for trunk? > > > > > > gcc/ChangeLog: > > > > > > PR target/110018 > > > * tree-vect-stmts.cc (vectorizable_conversion): Use > > > intermiediate integer type for float_expr/fix_trunc_expr when > > > direct optab is not existed. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/pr110018-1.c: New test. > > > --- > > > gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ > > > gcc/tree-vect-stmts.cc | 56 - > > > 2 files changed, 149 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c > > > b/gcc/testsuite/gcc.target/i386/pr110018-1.c > > > new file mode 100644 > > > index 000..b1baffd7af1 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c > > > @@ -0,0 +1,94 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ > > > +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ > > > +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ > > > + > > > +void > > > +foo (double* __restrict a, char* b) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > +} > > > + > > > +void > > > +foo1 (float* __restrict a, char* b) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > +} > > > + > > > +void > > > +foo2 (_Float16* __restrict a, char* b) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > + a[4] = b[4]; > > > + a[5] = b[5]; > > > + a[6] = b[6]; > > > + a[7] = b[7]; > > > +} > > > + > > > +void > > > +foo3 (double* __restrict a, short* b) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > +} > > > + > > > +void > > > +foo4 (float* __restrict a, char* b) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > +} > > > + > > > +void > > > +foo5 (double* __restrict b, char* a) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > +} > > > + > > > +void > > > +foo6 (float* __restrict b, char* a) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > +} > > > + > > > +void > > > +foo7 (_Float16* __restrict b, char* a) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > + a[4] = b[4]; > > > + a[5] = b[5]; > > > + a[6] = b[6]; > > > + a[7] = b[7]; > > > +} > > > + > > > +void > > > +foo8 (double* __restrict b, short* a) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > +} > > > + > > > +void > > > +foo9 (float* __restrict b, char* a) > > > +{ > > > + a[0] = b[0]; > > > + a[1] = b[1]; > > > + a[2] = b[2]; > > > + a[3] = b[3]; > > > +} > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > index bd3b07a3aa1..1118c89686d 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, > > > return false; > > >if (supportable_convert_operation (code, vectype_out, vectype_in, > > > &code1)) > > > break; > > > > A comment would be nice here. Like > > > >/* For conversions between float and smaller integer types try whether > > we can > > use intermediate signed integer types to support the conversion. */ > > > > > + if ((code == FLOAT_EXPR > > > + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) > > > + || (code == FIX_TRUNC_EXPR > > > + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) > > > + { > > > + bool float_expr_p = code == FLOAT_EXPR; > > > + scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode; > > > + fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode); > > > + code1 = float_expr_p ? code : NOP_EXPR; > > > + codecvt1 = float_expr_p ? NOP_EXPR : code; > > > + FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode) > > > + { > > > + imode = rhs_mode_iter.require (); > > > +
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Tue, Jun 20, 2023 at 4:41 PM Richard Biener wrote: > > On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches > wrote: > > > > We have already use intermidate type in case WIDEN, but not for NONE, > > this patch extended that. > > > > I didn't do that in pattern recog since we need to know whether the > > stmt belongs to any slp_node to decide the vectype, the related optabs > > are checked according to vectype_in and vectype_out. For non-slp case, > > vec_pack/unpack are always used when lhs has different size from rhs, > > for slp case, sometimes vec_pack/unpack is used, somethings > > direct conversion is used. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR target/110018 > > * tree-vect-stmts.cc (vectorizable_conversion): Use > > intermiediate integer type for float_expr/fix_trunc_expr when > > direct optab is not existed. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr110018-1.c: New test. > > --- > > gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ > > gcc/tree-vect-stmts.cc | 56 - > > 2 files changed, 149 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c > > b/gcc/testsuite/gcc.target/i386/pr110018-1.c > > new file mode 100644 > > index 000..b1baffd7af1 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c > > @@ -0,0 +1,94 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ > > +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ > > +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ > > + > > +void > > +foo (double* __restrict a, char* b) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > +} > > + > > +void > > +foo1 (float* __restrict a, char* b) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > +} > > + > > +void > > +foo2 (_Float16* __restrict a, char* b) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > + a[4] = b[4]; > > + a[5] = b[5]; > > + a[6] = b[6]; > > + a[7] = b[7]; > > +} > > + > > +void > > +foo3 (double* __restrict a, short* b) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > +} > > + > > +void > > +foo4 (float* __restrict a, char* b) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > +} > > + > > +void > > +foo5 (double* __restrict b, char* a) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > +} > > + > > +void > > +foo6 (float* __restrict b, char* a) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > +} > > + > > +void > > +foo7 (_Float16* __restrict b, char* a) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > + a[4] = b[4]; > > + a[5] = b[5]; > > + a[6] = b[6]; > > + a[7] = b[7]; > > +} > > + > > +void > > +foo8 (double* __restrict b, short* a) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > +} > > + > > +void > > +foo9 (float* __restrict b, char* a) > > +{ > > + a[0] = b[0]; > > + a[1] = b[1]; > > + a[2] = b[2]; > > + a[3] = b[3]; > > +} > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > index bd3b07a3aa1..1118c89686d 100644 > > --- a/gcc/tree-vect-stmts.cc > > +++ b/gcc/tree-vect-stmts.cc > > @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, > > return false; > >if (supportable_convert_operation (code, vectype_out, vectype_in, > > &code1)) > > break; > > A comment would be nice here. Like > >/* For conversions between float and smaller integer types try whether we > can > use intermediate signed integer types to support the conversion. */ > > > + if ((code == FLOAT_EXPR > > + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) > > + || (code == FIX_TRUNC_EXPR > > + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) > > + { > > + bool float_expr_p = code == FLOAT_EXPR; > > + scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode; > > + fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode); > > + code1 = float_expr_p ? code : NOP_EXPR; > > + codecvt1 = float_expr_p ? NOP_EXPR : code; > > + FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode) > > + { > > + imode = rhs_mode_iter.require (); > > + if (GET_MODE_SIZE (imode) > fltsz) > > + break; > > + > > + cvt_type > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode), > > + 0); > > + cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type, > > + slp_node); >
Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches wrote: > > We have already use intermidate type in case WIDEN, but not for NONE, > this patch extended that. > > I didn't do that in pattern recog since we need to know whether the > stmt belongs to any slp_node to decide the vectype, the related optabs > are checked according to vectype_in and vectype_out. For non-slp case, > vec_pack/unpack are always used when lhs has different size from rhs, > for slp case, sometimes vec_pack/unpack is used, somethings > direct conversion is used. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/110018 > * tree-vect-stmts.cc (vectorizable_conversion): Use > intermiediate integer type for float_expr/fix_trunc_expr when > direct optab is not existed. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr110018-1.c: New test. > --- > gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++ > gcc/tree-vect-stmts.cc | 56 - > 2 files changed, 149 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c > > diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c > b/gcc/testsuite/gcc.target/i386/pr110018-1.c > new file mode 100644 > index 000..b1baffd7af1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c > @@ -0,0 +1,94 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ > +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ > +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ > + > +void > +foo (double* __restrict a, char* b) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > +} > + > +void > +foo1 (float* __restrict a, char* b) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > +} > + > +void > +foo2 (_Float16* __restrict a, char* b) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > + a[4] = b[4]; > + a[5] = b[5]; > + a[6] = b[6]; > + a[7] = b[7]; > +} > + > +void > +foo3 (double* __restrict a, short* b) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > +} > + > +void > +foo4 (float* __restrict a, char* b) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > +} > + > +void > +foo5 (double* __restrict b, char* a) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > +} > + > +void > +foo6 (float* __restrict b, char* a) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > +} > + > +void > +foo7 (_Float16* __restrict b, char* a) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > + a[4] = b[4]; > + a[5] = b[5]; > + a[6] = b[6]; > + a[7] = b[7]; > +} > + > +void > +foo8 (double* __restrict b, short* a) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > +} > + > +void > +foo9 (float* __restrict b, char* a) > +{ > + a[0] = b[0]; > + a[1] = b[1]; > + a[2] = b[2]; > + a[3] = b[3]; > +} > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index bd3b07a3aa1..1118c89686d 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, > return false; >if (supportable_convert_operation (code, vectype_out, vectype_in, > &code1)) > break; A comment would be nice here. Like /* For conversions between float and smaller integer types try whether we can use intermediate signed integer types to support the conversion. */ > + if ((code == FLOAT_EXPR > + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) > + || (code == FIX_TRUNC_EXPR > + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) > + { > + bool float_expr_p = code == FLOAT_EXPR; > + scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode; > + fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode); > + code1 = float_expr_p ? code : NOP_EXPR; > + codecvt1 = float_expr_p ? NOP_EXPR : code; > + FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode) > + { > + imode = rhs_mode_iter.require (); > + if (GET_MODE_SIZE (imode) > fltsz) > + break; > + > + cvt_type > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode), > + 0); > + cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type, > + slp_node); > + /* This should only happened for SLP as long as loop vectorizer > +only supports same-sized vector. */ > + if (cvt_type == NULL_TREE > + || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in) > + || !supportable_convert_operation (code1, vectype_out, > +cvt_type, &