Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
Hi, Richard. Thanks for the comments.

>> If we use SELECT_VL to refer only to the target-independent ifn, I don't
>> see why this last bit is true.
Could you give me more details and information about this? I am not sure
whether I follow you.
You mean the current SELECT_VL is not an appropriate IFN?


>>Like I said in the previous message,
>>when it comes to determining the length of each control, the approach we
>>take for MIN_EXPR IVs should work for SELECT_VL IVs.  The point is that,
>>in both cases, any inactive lanes are always the last lanes.
>>E.g. suppose that, for one particular iteration, SELECT_VL decides that
>>6 lanes should be active in a loop with VF==8.  If there is a 2-control
>>rgroup with 4 lanes each, the first control must be 4 and the second
>>control must be 2, just as if a MIN_EXPR had decided that 6 lanes of
>>the final iteration are active.
>>What I don't understand is why this isn't also a problem with the
>>fallback MIN_EXPR approach.  That is, with the same example as above,
>>but using MIN_EXPR IVs, I would have expected:
>>  VF == 8
>>  1-control rgroup "A":
>>A set by MIN_EXPR IV
>>  2-control rgroup "B1", "B2":
>>B1 = MIN (A, 4)
>>B2 = A - B1
>>and so the vectors controlled by A, B1 and B2 would all have different
>>lengths.
>>Is the point that, when using MIN_EXPR, this only happens in the
>>final iteration?  And that you use a tail/epilogue loop for that,
>>so that the main loop body operates on full vectors only?
In general, I think your description is correct and comprehensive.
I'd like to share more of my understanding to make sure we are on the same page.

Take the example as you said:

For one particular iteration, SELECT_VL decides that 6 lanes should be active
in a loop with VF==8, with a 2-control rgroup of 4 lanes each, which means:

VF = 8;
each control VF = 4;

Total length = SELECT_VL (or MIN_EXPR) (remain, 8)
Then, IMHO, we have 3 solutions to deduce the lengths of the 2-control rgroup
based on the current flow we have already built.

Also, let me share the "vsetvl" ISA spec:
ceil(AVL / 2) ≤ vl ≤ VF if AVL < (2 * VF)
where "avl" is the number of elements we still need to process and "vl" is the
actual number of elements we will process in the current iteration.

Solution 1:

Total length = SELECT_VL (remain, 8) ===> suppose Total length value = 6

control 1 length = SELECT_VL (Total length, 4) ===> If we use the "vsetvl"
instruction to get the control 1 length, it can be 3 or 4, since the RVV ISA
allows ceil(AVL / 2) ≤ vl ≤ VF if AVL < (2 * VF), so the outcome of SELECT_VL
may be Total length / 2 = 3.
Depending on the hardware implementation of "vsetvli" (say some RVV CPU
prefers "even distribution"), the outcome = 3.

control 2 length = Total length - control 1 length ===> 6 - 3 = 3 (if control
1 = 3) or 6 - 4 = 2 (if control 1 = 4).

Since the RVV ISA gives a flexible definition of "vsetvli", we end up with
this nondeterministic deduction.

Solution 2:

Total length = SELECT_VL (remain, 8) ===> 6
control 1 length = MIN_EXPR (Total length, 4) ===> since we use MIN, always 4
control 2 length = Total length - control 1 length ===> 6 - 4 = 2

Solution 3 (Current flow):

Total length = MIN_EXPR (remain, 8) ===> 6 only when remain = 6 in the
tail/epilogue; otherwise it is always 8 in the loop body.
control 1 length = MIN_EXPR (Total length, 4) ===> since we use MIN, always 4
control 2 length = Total length - control 1 length ===> Total length - 4

I'd like to say all 3 solutions work for RVV.
However, RVV configures the length differently from IBM or ARM SVE, which use
a mask. (I would say mask and length are the same kind of thing: both control
each operation.)
For example, ARM SVE has 8 mask registers; whenever it generates a mask, that
mask can be included in the use list of an instruction, since ARM SVE uses the
encoding to specify the mask register.

For example:
If we were using solution 1 on a target where operations are controlled by a
length that is specified in general registers, we could simulate the codegen
as below.

max length = select_vl (vf=8)
length 1 = select_vl (vf=4)
length 2 = max length - length 1
...
load (...uses the general register storing length 1, say r0; r0 is specified
in the load encoding)
...
load (...uses the general register storing length 2, say r1; r1 is specified
in the load encoding)


However, for RVV, we don't specify the length in the instruction encoding.
Instead, we have only one VL register, and every time we want to change the
length, we need "vsetvli".

So for solution 1, we will have:

max length = vsetvli (vf=8)
length 1 = vsetvli (vf=4)
length 2 = max length - length 1
...
vsetvli zero, length 1 <== inserted by the "VSETVL" pass of the RISC-V backend
load
vsetvli zero, length 2 <== inserted by the "VSETVL" pass of the RISC-V backend
load

"vsetlvi" instruction is the instruction much more expensive than the general 
scalar instruction (for example "min" is much cheaper than "vsetvli").
So I am 100% sure that solution 3 (current MIN flow in GCC) is much better than 
above:

max length = min (vf=8) ===> replaced "vsetli" by "min"
length 1 = min (vf=4) ===> replaced "vsetli" by "min"
length 2 = max length = length 1
...
vsetvli zero, length 1 <=

Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_32)
Is this used in vfwcvt ? convert FP16 -> FP32, if yes, you should add ZVFHMIN 
or ZVFH require checking.
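For reference, a hedged sketch of what that gating could look like in riscv-vector-builtins-types.def (this matches how the v2 patch later in this thread encodes the requirement; whether ZVFH or ZVFHMIN is the right predicate is left to the author):

```c
/* Sketch: additionally require the FP16 extension for the
   FP16 -> FP32 widening-convert types.  */
DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32
                                  | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
```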


+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, 0)

same


+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, 0)

same

+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, 0)


same

+DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, 0)

same



Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 14:50
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API of FP16 ZVFH floating-point, i.e.,
SEW=16 for the instructions below:
 
vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt
 
Then users can leverage the intrinsic APIs to perform FP16-related
operations. Please note that not all the intrinsic APIs are covered by the
test files; we only picked some typical ones because there are too many. We
will test the FP16-related intrinsic APIs exhaustively soon.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
---
.../riscv/riscv-vector-builtins-types.def |  32 ++
gcc/config/riscv/vector-iterators.md  |  21 +
.../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
3 files changed, 471 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..348aa05dd91 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_32)
+
DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, 0)
+
DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_I_OPS (v

RE: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread Li, Pan2 via Gcc-patches
Thanks, make sense, will update V2 for this.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, June 5, 2023 3:30 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic 
API



Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. Thanks for the comments.
>
>>> If we use SELECT_VL to refer only to the target-independent ifn, I don't
>>> see why this last bit is true.
> Could you give me more details and information about this since I am not sure 
> whether I catch up with you. 
> You mean the current SELECT_VL is not an appropriate IFN?

No, I meant that the comment I quoted seemed to be saying that solution
3 wasn't possible.  The comment seemed to say that we would need to do
solution 1.

>>>Like I said in the previous message,
>>>when it comes to determining the length of each control, the approach we
>>>take for MIN_EXPR IVs should work for SELECT_VL IVs.  The point is that,
>>>in both cases, any inactive lanes are always the last lanes.
>>>E.g. suppose that, for one particular iteration, SELECT_VL decides that
>>>6 lanes should be active in a loop with VF==8.  If there is a 2-control
>>>rgroup with 4 lanes each, the first control must be 4 and the second
>>>control must be 2, just as if a MIN_EXPR had decided that 6 lanes of
>>>the final iteration are active.
>>>What I don't understand is why this isn't also a problem with the
>>>fallback MIN_EXPR approach.  That is, with the same example as above,
>>>but using MIN_EXPR IVs, I would have expected:
>>>  VF == 8
>>>  1-control rgroup "A":
>>>A set by MIN_EXPR IV
>>>  2-control rgroup "B1", "B2":
>>>B1 = MIN (A, 4)
>>>B2 = A - B1
>>>and so the vectors controlled by A, B1 and B2 would all have different
>>>lengths.
>>>Is the point that, when using MIN_EXPR, this only happens in the
>>>final iteration?  And that you use a tail/epilogue loop for that,
>>>so that the main loop body operates on full vectors only?
> In general, I think your description is correct and  comprehensive. 
> I'd like to share more my understanding to make sure we are on the same page.
>
> Take the example as you said:
>
> FOR one  particular iteration, SELECT_VL decides that 6 lanes should be 
> active in a loop with VF==8.
> and 2-control rgroup with 4 lanes each
> which means:
>
> VF = 8;
> each control VF = 4;
>
> Total length = SELECT_VL(or MIN_EXPR) (remain, 8)
> Then, IMHO, we can have 3 solutions to deduce the length of 2-control base on 
> current flow we already built
>
> Also, let me share "vsetvl" ISA spec:
> ceil(AVL / 2) ≤ vl ≤ VF if AVL < (2 * VF)
> where "avl" is the number of elements we still need to process and "vl" is
> the actual number of elements we will process in the current iteration
>
> Solution 1:
>
> Total length = SELECT_VL (remain, 8) ===> suppose Total length value = 6
>
> control 1 length = SELECT_VL (Total length, 4) ===> If we use "vsetvl" 
> intruction to get the control 1 length,
>  it can be 3 or 4, since RVV ISA: ceil(AVL / 2) ≤ vl ≤ VF if AVL < (2 * VF), 
> the outcome of SELECT_VL may be Total length / 2 = 3
> Depending on the hardware implementation of "vsetvli", Let's say some RVV CPU 
> likes "even distribution" the outcome = 3
>
> control 2 length = Total length - control 1 length  ===> 6 - 3 = 3 (if 
> control 1 = 3) or 6 - 4 = 2 (if control 1 = 4) .
>
> Since RVV ISA gives the flexible definition of "vsetvli", we will end up with 
> this deduction.

Yeah, this one wouldn't work, for reasons discussed previously.
I was thinking only about solutions 2 and 3.

> Solution 2:
>
> Total length = SELECT_VL (remain, 8)  > 6
> control 1 length = MIN_EXPR (Total length, 4)  > since use MIN, so always 
> 4
> control 2 length = Total length - control 1 length  ===> 6 - 4 = 2
>
> Solution 3 (Current flow):
>
> Total length = MIN_EXPR  (remain, 8)  > 6 only when the remain  = 6 in 
> tail/epilogue, otherwise, it always be 8 in loop body.
> control 1 length = MIN_EXPR (Total length, 4)  > since use MIN, so always 
> 4
> control 2 length = Total length - control 1 length  ===> Total length -  4
>
> I'd like to say these 3 solutions all work for RVV. 
> However, RVV length configuration unlike IBM or ARM SVE using a mask. (I 
> would like to say mask or length they are the same thing, use for control of 
> each operations).
> For example, ARM SVE has 8 mask registers, whenever it generate a mask, it 
> can be include in use list in the instructions, since ARM SVE use encoding to 
> specify the mask
> register.
>
> For example:
> If we are using solution 1 in a target that control by length and length is 
> specified in general registers, we can simulate the codegen as below.
>
> max length = select_vl (vf=8)
> length 1 = select_vl (vf=4)
> length 2 = max length - length 1
> ...
> load (...use general register which storing length 1 let's said r0, r0 is 
> specified in the load encoding)
> ...
> load (...use general register which storing length 2 let's said r1, r1 is 
> specified in the load encoding)
> 
>
> However, for RVV, we don't specify the length in the instructions encoding.
> Instead, we have only one VL register, and every time we want to change the 
> length, we need"vsetvli"
>
> So for solution 1, we will have:
>

Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> "juzhe.zh...@rivai.ai"  writes:
>> Hi, Richard. Thanks for the comments.
>>
 If we use SELECT_VL to refer only to the target-independent ifn, I don't
 see why this last bit is true.
>> Could you give me more details and information about this since I am not 
>> sure whether I catch up with you. 
>> You mean the current SELECT_VL is not an appropriate IFN?
>
> No, I meant that the comment I quoted seemed to be saying that solution
> 3 wasn't possible.  The comment seemed to say that we would need to do
> solution 1.

Sorry, I meant solution 2 rather than solution 3.


Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> No, I meant that the comment I quoted seemed to be saying that solution
>> 3 wasn't possible.  The comment seemed to say that we would need to do
>> solution 1.
I am so sorry that I didn't write the comments accurately.
Could you help me with the comments, based on what we have discussed above? (I
think we are on the same page now.)
Hmmm. I am not a native English speaker; I use Google Translate for the
comments :).

>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>> E.g. is "vsetvli zero" cheaper than "vsetvli "?


"vsetvli zero" is the same cost as "vsetvli gpr", 

I think for (b),  solution 2 and solution 3 should be almost the same.




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 15:57
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
Richard Sandiford  writes:
> "juzhe.zh...@rivai.ai"  writes:
>> Hi, Richard. Thanks for the comments.
>>
 If we use SELECT_VL to refer only to the target-independent ifn, I don't
 see why this last bit is true.
>> Could you give me more details and information about this since I am not 
>> sure whether I catch up with you. 
>> You mean the current SELECT_VL is not an appropriate IFN?
>
> No, I meant that the comment I quoted seemed to be saying that solution
> 3 wasn't possible.  The comment seemed to say that we would need to do
> solution 1.
 
Sorry, I meant solution 2 rather than solution 3.
 


Re: [PATCH] libiberty: On Windows pass a >32k cmdline through a response file.

2023-06-05 Thread Jonathan Yong via Gcc-patches

On 5/23/23 08:21, Jonathan Yong wrote:

On 5/22/23 13:25, Costas Argyris wrote:

Currently on Windows, when CreateProcess is called with a command-line
that exceeds the 32k Windows limit, we get a very bad error:

"CreateProcess: No such file or directory"

This patch detects the case where this would happen and writes the
long command-line to a temporary response file and calls CreateProcess
with @file instead.



Looks OK to me.

I will commit it around next week if there are no objections.



Done, pushed to master, thanks.



[PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

2023-06-05 Thread Li Xu
gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix 'REQUIREMENT' for machine_mode
'MODE'.
* config/riscv/vector.md
(@pred_indexed_store): Change
VNX16_QHSI to VNX16_QHSDI.
(@pred_indexed_store): Ditto.
---
 gcc/config/riscv/vector-iterators.md | 26 +-
 gcc/config/riscv/vector.md   |  6 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..42cbbb49894 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -148,7 +148,7 @@
 ])
 
 (define_mode_iterator VEEWEXT8 [
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
@@ -188,7 +188,7 @@
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -199,7 +199,7 @@
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 
128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -213,11 +213,11 @@
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI (VNx16QI 
"TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI (VNx8HI "TARGET_MIN_VLEN >= 
128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI (VNx4SI "TARGET_MIN_VLEN >= 128")
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64 && 
TARGET_MIN_VLEN >= 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
 ])
 
@@ -400,26 +400,26 @@
 
 (define_mode_iterator VNX1_QHSDI [
   (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI 
"TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
 ])
 
 (define_mode_iterator VNX2_QHSDI [
   VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
 ])
 
 (define_mode_iterator VNX4_QHSDI [
   VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
 ])
 
 (define_mode_iterator VNX8_QHSDI [
   VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
 ])
 
-(define_mode_iterator VNX16_QHSI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_MIN_VLEN 
>= 128")
+(define_mode_iterator VNX16_QHSDI [
+  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && 
TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX32_QHSI [
@@ -435,7 +435,7 @@
   (VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_MIN_VLEN >= 128")
+  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
@@ -463,7 +463,7 @@
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
"TARGET_MIN_VLEN >= 128")
   (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
"TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 

[PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch supports the intrinsic API of FP16 ZVFH floating-point, i.e.,
SEW=16 for the instructions below:

vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt

Then users can leverage the intrinsic APIs to perform FP16-related
operations. Please note that not all the intrinsic APIs are covered by the
test files; we only picked some typical ones because there are too many. We
will test the FP16-related intrinsic APIs exhaustively soon.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/riscv-vector-builtins-types.def |  32 ++
 gcc/config/riscv/vector-iterators.md  |  21 +
 .../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
 3 files changed, 471 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..1e2491de6d6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
 DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
 DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
 
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+
 DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
 DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
 
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
+
 DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
 DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
@@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
+
 DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
 DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
@@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
 DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
 
+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
+
 DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ELEN_64)

RE: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread Li, Pan2 via Gcc-patches
Updated the patch to V2 for the ZVFH requirement.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620636.html

Pan

From: Li, Pan2
Sent: Monday, June 5, 2023 3:37 PM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
Subject: RE: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic 
API

Thanks, make sense, will update V2 for this.

Pan

From: juzhe.zh...@rivai.ai 
mailto:juzhe.zh...@rivai.ai>>
Sent: Monday, June 5, 2023 3:30 PM
To: Li, Pan2 mailto:pan2...@intel.com>>; gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>
Cc: Kito.cheng mailto:kito.ch...@sifive.com>>; Li, Pan2 
mailto:pan2...@intel.com>>; Wang, Yanzhang 
mailto:yanzhang.w...@intel.com>>
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic 
API


+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)

+DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)

+DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)

+DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)

+DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_32)
Is this used in vfwcvt (converting FP16 -> FP32)? If yes, you should add a
ZVFHMIN or ZVFH requirement check.



+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)

+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 0)

+DEF_RVV_CONVERT_I_OPS (vint16m1_t, 0)

+DEF_RVV_CONVERT_I_OPS (vint16m2_t, 0)

+DEF_RVV_CONVERT_I_OPS (vint16m4_t, 0)

+DEF_RVV_CONVERT_I_OPS (vint16m8_t, 0)

same



+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)

+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, 0)

+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, 0)

+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, 0)

+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, 0)

+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, 0)

same


+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)

+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, 0)

+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, 0)

+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, 0)

+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, 0)


same


+DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)

+DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, 0)

+DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, 0)

+DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, 0)

+DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, 0)

same



Otherwise, LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-05 14:50
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang
Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li mailto:pan2...@intel.com>>

This patch supports the intrinsic API of FP16 ZVFH floating-point, aka
SEW=16, for the below instructions:

vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt

Then users can leverage the intrinsic APIs to perform the FP16-related
operations. Please note that not all of the intrinsic APIs are covered in
the test files; only some typical ones are picked, since there are too many.
We will test the FP16-related intrinsic APIs in full soon.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
---
.../riscv/riscv-vector-builtins-types.def |  32 ++
gcc/config/riscv/vector-iterators.md  |  21 +
.../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
3 files changed, 471 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..348aa05dd91 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def

Re: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

2023-06-05 Thread juzhe.zh...@rivai.ai
Thanks for catching this.
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-06-05 16:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in 
vector-iterators.md.
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Fix 'REQUIREMENT' for machine_mode 
'MODE'.
* config/riscv/vector.md 
(@pred_indexed_store): Change 
VNX16_QHSI to VNX16_QHSDI.
(@pred_indexed_store): Ditto.
---
gcc/config/riscv/vector-iterators.md | 26 +-
gcc/config/riscv/vector.md   |  6 +++---
2 files changed, 16 insertions(+), 16 deletions(-)
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..42cbbb49894 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -148,7 +148,7 @@
])
(define_mode_iterator VEEWEXT8 [
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
@@ -188,7 +188,7 @@
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -199,7 +199,7 @@
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 
128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -213,11 +213,11 @@
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI (VNx16QI 
"TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI (VNx8HI "TARGET_MIN_VLEN >= 
128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI (VNx4SI "TARGET_MIN_VLEN >= 128")
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64 && 
TARGET_MIN_VLEN >= 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
])
@@ -400,26 +400,26 @@
(define_mode_iterator VNX1_QHSDI [
   (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI 
"TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
])
(define_mode_iterator VNX2_QHSDI [
   VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
(define_mode_iterator VNX4_QHSDI [
   VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
(define_mode_iterator VNX8_QHSDI [
   VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
-(define_mode_iterator VNX16_QHSI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_MIN_VLEN 
>= 128")
+(define_mode_iterator VNX16_QHSDI [
+  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && 
TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX32_QHSI [
@@ -435,7 +435,7 @@
   (VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_MIN_VLEN >= 128")
+  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
@@ -463,7 +463,7 @@
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN >

Re: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
LGTM,



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 16:20
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API of FP16 ZVFH floating-point, aka
SEW=16, for the below instructions:
 
vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt
 
Then users can leverage the intrinsic APIs to perform the FP16-related
operations. Please note that not all of the intrinsic APIs are covered in
the test files; only some typical ones are picked, since there are too many.
We will test the FP16-related intrinsic APIs in full soon.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/riscv-vector-builtins-types.def |  32 ++
gcc/config/riscv/vector-iterators.md  |  21 +
.../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
3 files changed, 471 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..1e2491de6d6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+
DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
@@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
@@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
+
DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_ELEN_64)
DEF_RVV_WCONVE

Re: [PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-05 Thread Kito Cheng via Gcc-patches
Only a few minor comments, otherwise LGTM :)

But I guess we need to wait until binutils merges the Zc stuff.

> Zcmp can share the same logic as save-restore in stack allocation: 
> pre-allocation
> by cm.push, step 1 and step 2.
>
> Please note that cm.push pushes ra, s0-s11 in the reverse order of what 
> save-restore does.
> So an adaptation has been made in the .cfi directives in my patch.
>
> Signed-off-by: Fei Gao 
>
> gcc/ChangeLog:
>
> * config/riscv/iterators.md (-8): slot offset in bytes
> (-16): likewise
> (-24): likewise
> (-32): likewise
> (-40): likewise
> (-48): likewise
> (-56): likewise
> (-64): likewise
> (-72): likewise
> (-80): likewise
> (-88): likewise
> (-96): likewise
> (-104): likewise

Use slot0_offset...slot12_offset.

> @@ -422,6 +430,16 @@ static const struct riscv_tune_info 
> riscv_tune_info_table[] = {
>  #include "riscv-cores.def"
>  };
>
> +typedef enum
> +{
> +  PUSH_IDX = 0,
> +  POP_IDX,
> +  POPRET_IDX,
> +  ZCMP_OP_NUM
> +} op_idx;

op_idx -> riscv_zcmp_op_t
> @@ -5388,6 +5487,42 @@ riscv_adjust_libcall_cfi_prologue ()
>return dwarf;
>  }
>
> +static rtx
> +riscv_adjust_multi_push_cfi_prologue (int saved_size)
> +{
> +  rtx dwarf = NULL_RTX;
> +  rtx adjust_sp_rtx, reg, mem, insn;
> +  unsigned int mask = cfun->machine->frame.mask;
> +  int offset;
> +  int saved_cnt = 0;
> +
> +  if (mask & S10_MASK)
> +mask |= S11_MASK;
> +
> +  for (int regno = GP_REG_LAST; regno >= GP_REG_FIRST; regno--)
> +if (BITSET_P (mask & MULTI_PUSH_GPR_MASK, regno - GP_REG_FIRST))
> +  {
> +/* The save order is s11-s0, ra
> +   from high to low addr.  */
> +offset = saved_size - UNITS_PER_WORD * (++saved_cnt);
> +
> +reg = gen_rtx_REG (SImode, regno);

Should be Pmode rather than SImode, and seems
riscv_adjust_libcall_cfi_prologue has same issue...could you send a
separate patch to fix that?

> +mem = gen_frame_mem (SImode, plus_constant (Pmode,

Same here.

> +stack_pointer_rtx,
> +offset));
> +
> +insn = gen_rtx_SET (mem, reg);
> +dwarf = alloc_reg_note (REG_CFA_OFFSET, insn, dwarf);
> +  }
> +
> +  /* Debug info for adjust sp.  */
> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> +   plus_constant(Pmode, stack_pointer_rtx, 
> -saved_size));
> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
> +  dwarf);
> +  return dwarf;
> +}
> +
>  static void
>  riscv_emit_stack_tie (void)
>  {


> @@ -5493,6 +5697,32 @@ riscv_expand_prologue (void)
>  }
>  }
>
> +static rtx
> +riscv_adjust_multi_pop_cfi_epilogue (int saved_size)
> +{
> +  rtx dwarf = NULL_RTX;
> +  rtx adjust_sp_rtx, reg;
> +  unsigned int mask = cfun->machine->frame.mask;
> +
> +  if (mask & S10_MASK)
> +mask |= S11_MASK;
> +
> +  /* Debug info for adjust sp.  */
> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> +   plus_constant(Pmode, stack_pointer_rtx, 
> saved_size));
> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
> +  dwarf);
> +
> +  for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +if (BITSET_P (mask, regno - GP_REG_FIRST))
> +  {
> +reg = gen_rtx_REG (SImode, regno);

Pmode

> +dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
> +  }
> +
> +  return dwarf;
> +}
> +
>  static rtx
>  riscv_adjust_libcall_cfi_epilogue ()
>  {

> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
> new file mode 100644
> index 000..f2f2198598c
> --- /dev/null
> +++ b/gcc/config/riscv/zc.md
> @@ -0,0 +1,1042 @@
> +;; Machine description for RISC-V Zc extention.
> +;; Copyright (C) 2011-2023 Free Software Foundation, Inc.

2023 rather than 2011-2023


Re: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

2023-06-05 Thread Kito Cheng via Gcc-patches
LGTM

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zh...@rivai.ai
 wrote:
>
> Thanks for catching this.
> LGTM.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Li Xu
> Date: 2023-06-05 16:18
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; Li Xu
> Subject: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in 
> vector-iterators.md.
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Fix 'REQUIREMENT' for 
> machine_mode 'MODE'.
> * config/riscv/vector.md 
> (@pred_indexed_store): change 
> VNX16_QHSI to VNX16_QHSDI.
> (@pred_indexed_store): Ditto.
> ---
> gcc/config/riscv/vector-iterators.md | 26 +-
> gcc/config/riscv/vector.md   |  6 +++---
> 2 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 90743ed76c5..42cbbb49894 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -148,7 +148,7 @@
> ])
> (define_mode_iterator VEEWEXT8 [
> -  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64")
> +  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
>(VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
> "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_64")
> @@ -188,7 +188,7 @@
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx8SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> -  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
> +  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_64")
>(VNx4DF "TARGET_VECTOR_ELEN_FP_64")
>(VNx8DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> @@ -199,7 +199,7 @@
>(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI 
> "TARGET_MIN_VLEN >= 128")
>(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 
> 128")
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
> -  (VNx4DI "TARGET_VECTOR_ELEN_64")
> +  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32")
> @@ -213,11 +213,11 @@
>(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI (VNx16QI 
> "TARGET_MIN_VLEN >= 128")
>(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI (VNx8HI "TARGET_MIN_VLEN >= 
> 128")
>(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI (VNx4SI "TARGET_MIN_VLEN >= 128")
> -  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64 && 
> TARGET_MIN_VLEN >= 128")
> +  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> -  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
> +  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> ])
> @@ -400,26 +400,26 @@
> (define_mode_iterator VNX1_QHSDI [
>(VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI 
> "TARGET_MIN_VLEN < 128")
> -  (VNx1DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
> ])
> (define_mode_iterator VNX2_QHSDI [
>VNx2QI VNx2HI VNx2SI
> -  (VNx2DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> (define_mode_iterator VNX4_QHSDI [
>VNx4QI VNx4HI VNx4SI
> -  (VNx4DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> (define_mode_iterator VNX8_QHSDI [
>VNx8QI VNx8HI VNx8SI
> -  (VNx8DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> -(define_mode_iterator VNX16_QHSI [
> -  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_MIN_VLEN 
> >= 128")
> +(define_mode_iterator VNX16_QHSDI [
> +  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && 
> TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
> ])
> (define_mode_iterator VNX32_QHSI [
> @@ -435,7 +435,7 @@
>(VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
> "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
>(VNx1SI "TARGET_MIN_VLEN == 32") VNx2SI VNx4SI VNx8SI (VNx16SI 
> "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
> -  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
> "TARGET_MIN_VLEN >= 128")
> +  (VNx4DI "TARGET

Re: Re: [PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-05 Thread jiawei




 Sorry for the delay, I will send the binutils patch within this week.


- Original Message -
From: "Kito Cheng"
To: "Fei Gao"
Cc: gcc-patches@gcc.gnu.org, pal...@dabbelt.com, jeffreya...@gmail.com, 
sinan@linux.alibaba.com, jia...@iscas.ac.cn
Sent: Mon, 5 Jun 2023 16:31:29 +0800
Subject: Re: [PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

Only a few minor comments, otherwise LGTM :)

But I guess we need to wait until binutils merge zc stuff.

> Zcmp can share the same logic as save-restore in stack allocation: 
> pre-allocation
> by cm.push, step 1 and step 2.
>
> please be noted cm.push pushes ra, s0-s11 in reverse order than what 
> save-restore does.
> So adaption has been done in .cfi directives in my patch.
>
> Signed-off-by: Fei Gao
>
> gcc/ChangeLog:
>
> * config/riscv/iterators.md (-8): slot offset in bytes
> (-16): likewise
> (-24): likewise
> (-32): likewise
> (-40): likewise
> (-48): likewise
> (-56): likewise
> (-64): likewise
> (-72): likewise
> (-80): likewise
> (-88): likewise
> (-96): likewise
> (-104): likewise

Use slot0_offset...slot12_offset.

> @@ -422,6 +430,16 @@ static const struct riscv_tune_info 
> riscv_tune_info_table[] = {
> #include "riscv-cores.def"
> };
>
> +typedef enum
> +{
> + PUSH_IDX = 0,
> + POP_IDX,
> + POPRET_IDX,
> + ZCMP_OP_NUM
> +} op_idx;

op_idx -> riscv_zcmp_op_t
> @@ -5388,6 +5487,42 @@ riscv_adjust_libcall_cfi_prologue ()
> return dwarf;
> }
>
> +static rtx
> +riscv_adjust_multi_push_cfi_prologue (int saved_size)
> +{
> + rtx dwarf = NULL_RTX;
> + rtx adjust_sp_rtx, reg, mem, insn;
> + unsigned int mask = cfun->machine->frame.mask;
> + int offset;
> + int saved_cnt = 0;
> +
> + if (mask & S10_MASK)
> + mask |= S11_MASK;
> +
> + for (int regno = GP_REG_LAST; regno >= GP_REG_FIRST; regno--)
> + if (BITSET_P (mask & MULTI_PUSH_GPR_MASK, regno - GP_REG_FIRST))
> + {
> + /* The save order is s11-s0, ra
> + from high to low addr. */
> + offset = saved_size - UNITS_PER_WORD * (++saved_cnt);
> +
> + reg = gen_rtx_REG (SImode, regno);

Should be Pmode rather than SImode, and seems
riscv_adjust_libcall_cfi_prologue has same issue...could you send a
separate patch to fix that?

> + mem = gen_frame_mem (SImode, plus_constant (Pmode,

Same here.

> + stack_pointer_rtx,
> + offset));
> +
> + insn = gen_rtx_SET (mem, reg);
> + dwarf = alloc_reg_note (REG_CFA_OFFSET, insn, dwarf);
> + }
> +
> + /* Debug info for adjust sp. */
> + adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> + plus_constant(Pmode, stack_pointer_rtx, -saved_size));
> + dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
> + dwarf);
> + return dwarf;
> +}
> +
> static void
> riscv_emit_stack_tie (void)
> {


> @@ -5493,6 +5697,32 @@ riscv_expand_prologue (void)
> }
> }
>
> +static rtx
> +riscv_adjust_multi_pop_cfi_epilogue (int saved_size)
> +{
> + rtx dwarf = NULL_RTX;
> + rtx adjust_sp_rtx, reg;
> + unsigned int mask = cfun->machine->frame.mask;
> +
> + if (mask & S10_MASK)
> + mask |= S11_MASK;
> +
> + /* Debug info for adjust sp. */
> + adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> + plus_constant(Pmode, stack_pointer_rtx, saved_size));
> + dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
> + dwarf);
> +
> + for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> + if (BITSET_P (mask, regno - GP_REG_FIRST))
> + {
> + reg = gen_rtx_REG (SImode, regno);

Pmode

> + dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
> + }
> +
> + return dwarf;
> +}
> +
> static rtx
> riscv_adjust_libcall_cfi_epilogue ()
> {

> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
> new file mode 100644
> index 000..f2f2198598c
> --- /dev/null
> +++ b/gcc/config/riscv/zc.md
> @@ -0,0 +1,1042 @@
> +;; Machine description for RISC-V Zc extention.
> +;; Copyright (C) 2011-2023 Free Software Foundation, Inc.

2023 rather than 2011-2023




Re: Re: [PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-05 Thread Fei Gao
Thanks Kito. 
I will propose V4 and also make a separate patch to fix 
riscv_adjust_libcall_cfi_prologue. 

BR, 
Fei

On 2023-06-05 16:31  Kito Cheng  wrote:
>
>Only a few minor comments, otherwise LGTM :)
>
>But I guess we need to wait until binutils merge zc stuff.
>
>> Zcmp can share the same logic as save-restore in stack allocation: 
>> pre-allocation
>> by cm.push, step 1 and step 2.
>>
>> please be noted cm.push pushes ra, s0-s11 in reverse order than what 
>> save-restore does.
>> So adaption has been done in .cfi directives in my patch.
>>
>> Signed-off-by: Fei Gao 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/iterators.md (-8): slot offset in bytes
>> (-16): likewise
>> (-24): likewise
>> (-32): likewise
>> (-40): likewise
>> (-48): likewise
>> (-56): likewise
>> (-64): likewise
>> (-72): likewise
>> (-80): likewise
>> (-88): likewise
>> (-96): likewise
>> (-104): likewise
>
>Use slot0_offset...slot12_offset. 
>
>> @@ -422,6 +430,16 @@ static const struct riscv_tune_info 
>> riscv_tune_info_table[] = {
>>  #include "riscv-cores.def"
>>  };
>>
>> +typedef enum
>> +{
>> +  PUSH_IDX = 0,
>> +  POP_IDX,
>> +  POPRET_IDX,
>> +  ZCMP_OP_NUM
>> +} op_idx;
>
>op_idx -> riscv_zcmp_op_t 
>> @@ -5388,6 +5487,42 @@ riscv_adjust_libcall_cfi_prologue ()
>>    return dwarf;
>>  }
>>
>> +static rtx
>> +riscv_adjust_multi_push_cfi_prologue (int saved_size)
>> +{
>> +  rtx dwarf = NULL_RTX;
>> +  rtx adjust_sp_rtx, reg, mem, insn;
>> +  unsigned int mask = cfun->machine->frame.mask;
>> +  int offset;
>> +  int saved_cnt = 0;
>> +
>> +  if (mask & S10_MASK)
>> +    mask |= S11_MASK;
>> +
>> +  for (int regno = GP_REG_LAST; regno >= GP_REG_FIRST; regno--)
>> +    if (BITSET_P (mask & MULTI_PUSH_GPR_MASK, regno - GP_REG_FIRST))
>> +  {
>> +    /* The save order is s11-s0, ra
>> +   from high to low addr.  */
>> +    offset = saved_size - UNITS_PER_WORD * (++saved_cnt);
>> +
>> +    reg = gen_rtx_REG (SImode, regno);
>
>Should be Pmode rather than SImode, and seems
>riscv_adjust_libcall_cfi_prologue has same issue...could you send a
>separate patch to fix that? 
>
>> +    mem = gen_frame_mem (SImode, plus_constant (Pmode,
>
>Same here.
>
>> +    stack_pointer_rtx,
>> +    offset));
>> +
>> +    insn = gen_rtx_SET (mem, reg);
>> +    dwarf = alloc_reg_note (REG_CFA_OFFSET, insn, dwarf);
>> +  }
>> +
>> +  /* Debug info for adjust sp.  */
>> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
>> +   plus_constant(Pmode, stack_pointer_rtx, 
>> -saved_size));
>> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>> +  dwarf);
>> +  return dwarf;
>> +}
>> +
>>  static void
>>  riscv_emit_stack_tie (void)
>>  {
>
>
>> @@ -5493,6 +5697,32 @@ riscv_expand_prologue (void)
>>  }
>>  }
>>
>> +static rtx
>> +riscv_adjust_multi_pop_cfi_epilogue (int saved_size)
>> +{
>> +  rtx dwarf = NULL_RTX;
>> +  rtx adjust_sp_rtx, reg;
>> +  unsigned int mask = cfun->machine->frame.mask;
>> +
>> +  if (mask & S10_MASK)
>> +    mask |= S11_MASK;
>> +
>> +  /* Debug info for adjust sp.  */
>> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
>> +   plus_constant(Pmode, stack_pointer_rtx, 
>> saved_size));
>> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>> +  dwarf);
>> +
>> +  for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
>> +    if (BITSET_P (mask, regno - GP_REG_FIRST))
>> +  {
>> +    reg = gen_rtx_REG (SImode, regno);
>
>Pmode
>
>> +    dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>> +  }
>> +
>> +  return dwarf;
>> +}
>> +
>>  static rtx
>>  riscv_adjust_libcall_cfi_epilogue ()
>>  {
>
>> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>> new file mode 100644
>> index 000..f2f2198598c
>> --- /dev/null
>> +++ b/gcc/config/riscv/zc.md
>> @@ -0,0 +1,1042 @@
>> +;; Machine description for RISC-V Zc extention.
>> +;; Copyright (C) 2011-2023 Free Software Foundation, Inc.
>
>2023 rather than 2011-2023

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-05 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> GCC maintainers:
> 
> The following patch adds three builtins for inserting and extracting the
> exponent and significand of IEEE 128-bit floating-point values.
> The builtins are valid for Power 9 and Power 10.  

We already have:

unsigned long long int scalar_extract_exp (__ieee128 source);
unsigned __int128 scalar_extract_sig (__ieee128 source);
ieee_128 scalar_insert_exp (unsigned __int128 significand,
unsigned long long int exponent);
ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long int 
exponent);

You need to say something about the requirements or the justification for
adding more.  For this patch itself, some comments are inline below, thanks!

> 
> The patch has been tested on both Power 9 and Power 10.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> --
> From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00 2001
> From: Carl Love 
> Date: Wed, 12 Apr 2023 17:46:37 -0400
> Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values
> 
> Add support for the following builtins:
> 
>  __vector unsigned long long int __builtin_extractf128_exp (__ieee128);

Could you make the name similar to the existing one?  The existing one
  
  unsigned long long int scalar_extract_exp (__ieee128 source);

has nothing like f128 in its name; this variant just changes the
return type to a vector type, so how about scalar_extract_exp_to_vec?

>  __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);

Ditto.

>  __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
>__vector unsigned long long);

This one can just overload the existing scalar_insert_exp?

> 
> gcc/
>   * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
>__builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>   builtin definitions.
>   * config/rs6000.md (extractf128_exp_, insertf128_exp_,
>   extractf128_sig_): Add define_expand for new builtins.
>   (xsxexpqp_f128_, xsxsigqp_f128_, siexpqpf_f128_):
>   Add define_insn for new builtins.
>   * doc/extend.texi (__builtin_extractf128_exp, __builtin_extractf128_sig,
>   __builtin_insertf128_exp): Add documentation for new builtins.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
>   * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
>   * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |  9 +++
>  gcc/config/rs6000/vsx.md  | 66 ++-
>  gcc/doc/extend.texi   | 28 
>  .../powerpc/bfp/extract-exp-ieee128.c | 49 ++
>  .../powerpc/bfp/extract-sig-ieee128.c | 56 
>  .../powerpc/bfp/insert-exp-ieee128.c  | 58 
>  6 files changed, 265 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 638d0bc72ca..3247a7f7673 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2876,6 +2876,15 @@
>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>  XL_LEN_R xl_len_r {}
>  
> +  vull __builtin_extractf128_exp (_Float128);
> +EEXPKF extractf128_exp_kf {}
> +
> +  vuq __builtin_extractf128_sig (_Float128);
> +ESIGKF extractf128_sig_kf {}
> +
> +  _Float128 __builtin_insertf128_exp (vuq, vull);
> +IEXPKF_VULL insertf128_exp_kf {}
> +

Put them to be near its related ones like __builtin_vsx_scalar_extract_expq
etc.

>  
>  ; Builtins requiring hardware support for IEEE-128 floating-point.
>  [ieee128-hw]
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 7d845df5c2d..2a9f875ba57 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -369,7 +369,10 @@
> UNSPEC_XXSPLTI32DX
> UNSPEC_XXBLEND
> UNSPEC_XXPERMX
> -  ])
> +   UNSPEC_EXTRACTEXPIEEE
> +   UNSPEC_EXTRACTSIGIEEE
> +   UNSPEC_INSERTEXPIEEE

These are not necessary, just use the existing UNSPEC_VSX_SXEXPDP etc.

> +])
>  
>  (define_int_iterator XVCVBF16[UNSPEC_VSX_XVCVSPBF16
>UNSPEC_VSX_XVCVBF16SPN])
> @@ -4155,6 +4158,38 @@
>   "vinsrx %0,%1,%2"
>   [(set_attr "type" "vecsimple")])
>  
> +(define_expand "extractf128_exp_"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
> +  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand")]
> +   UNSPEC_EXTRACTEXPI

Re: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread Kito Cheng via Gcc-patches
LGTM too, thanks :)

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM,
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2023-06-05 16:20
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
> Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
> From: Pan Li 
>
> This patch supports the intrinsic APIs of FP16 ZVFH floating-point, i.e.
> SEW=16 for the instructions below:
>
> vfadd vfsub vfrsub vfwadd vfwsub
> vfmul vfdiv vfrdiv vfwmul
> vfmacc vfnmacc vfmsac vfnmsac vfmadd
> vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
> vfsqrt vfrsqrt7 vfrec7
> vfmin vfmax
> vfsgnj vfsgnjn vfsgnjx
> vmfeq vmfne vmflt vmfle vmfgt vmfge
> vfclass vfmerge
> vfmv
> vfcvt vfwcvt vfncvt
>
> Users can then leverage the intrinsic APIs to perform the FP16-related
> operations. Please note that not all the intrinsic APIs are covered in the
> test files; we only pick some typical ones because there are too many. We
> will test the FP16-related intrinsic APIs entirely soon.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
> (vfloat32m1_t): Ditto.
> (vfloat32m2_t): Ditto.
> (vfloat32m4_t): Ditto.
> (vfloat32m8_t): Ditto.
> (vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
> (vint16mf2_t): Ditto.
> (vint16m1_t): Ditto.
> (vint16m2_t): Ditto.
> (vint16m4_t): Ditto.
> (vint16m8_t): Ditto.
> (vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
> (vuint16mf2_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint16m2_t): Ditto.
> (vuint16m4_t): Ditto.
> (vuint16m8_t): Ditto.
> (vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
> (vint32m1_t): Ditto.
> (vint32m2_t): Ditto.
> (vint32m4_t): Ditto.
> (vint32m8_t): Ditto.
> (vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
> (vuint32m1_t): Ditto.
> (vuint32m2_t): Ditto.
> (vuint32m4_t): Ditto.
> (vuint32m8_t): Ditto.
> * config/riscv/vector-iterators.md: Add FP=16 support for V,
> VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
>
> Signed-off-by: Pan Li 
> ---
> .../riscv/riscv-vector-builtins-types.def |  32 ++
> gcc/config/riscv/vector-iterators.md  |  21 +
> .../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
> 3 files changed, 471 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index 9cb3aca992e..1e2491de6d6 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
> DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
> DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
> +DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | 
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +
> DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
> DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
> DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
> @@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
> DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
> @@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_WCONVERT_I_OPS (vint32mf

RE: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

-Original Message-
From: Kito Cheng  
Sent: Monday, June 5, 2023 4:47 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches ; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic 
API

LGTM too, thanks :)

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zh...@rivai.ai  
wrote:
>
> LGTM,
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2023-06-05 16:20
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
> Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point 
> intrinsic API
> From: Pan Li 
>
> This patch supports the intrinsic APIs of FP16 ZVFH floating-point, i.e.
> SEW=16 for the instructions below:
>
> vfadd vfsub vfrsub vfwadd vfwsub
> vfmul vfdiv vfrdiv vfwmul
> vfmacc vfnmacc vfmsac vfnmsac vfmadd
> vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac vfsqrt 
> vfrsqrt7 vfrec7 vfmin vfmax vfsgnj vfsgnjn vfsgnjx vmfeq vmfne vmflt 
> vmfle vmfgt vmfge vfclass vfmerge vfmv vfcvt vfwcvt vfncvt
>
> Users can then leverage the intrinsic APIs to perform the FP16-related
> operations. Please note that not all the intrinsic APIs are covered in the
> test files; we only pick some typical ones because there are too many. We
> will test the FP16-related intrinsic APIs entirely soon.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
> (vfloat32m1_t): Ditto.
> (vfloat32m2_t): Ditto.
> (vfloat32m4_t): Ditto.
> (vfloat32m8_t): Ditto.
> (vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
> (vint16mf2_t): Ditto.
> (vint16m1_t): Ditto.
> (vint16m2_t): Ditto.
> (vint16m4_t): Ditto.
> (vint16m8_t): Ditto.
> (vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
> (vuint16mf2_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint16m2_t): Ditto.
> (vuint16m4_t): Ditto.
> (vuint16m8_t): Ditto.
> (vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
> (vint32m1_t): Ditto.
> (vint32m2_t): Ditto.
> (vint32m4_t): Ditto.
> (vint32m8_t): Ditto.
> (vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
> (vuint32m1_t): Ditto.
> (vuint32m2_t): Ditto.
> (vuint32m4_t): Ditto.
> (vuint32m8_t): Ditto.
> * config/riscv/vector-iterators.md: Add FP=16 support for V, 
> VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
>
> Signed-off-by: Pan Li 
> ---
> .../riscv/riscv-vector-builtins-types.def |  32 ++
> gcc/config/riscv/vector-iterators.md  |  21 +
> .../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
> 3 files changed, 471 insertions(+)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index 9cb3aca992e..1e2491de6d6 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, 
> RVV_REQUIRE_FULL_V) DEF_RVV_FULL_V_U_OPS (vuint64m4_t, 
> RVV_REQUIRE_FULL_V) DEF_RVV_FULL_V_U_OPS (vuint64m8_t, 
> RVV_REQUIRE_FULL_V)
> +DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | 
> +RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64) DEF_RVV_WEXTF_OPS 
> +(vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32) 
> +DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | 
> +RVV_REQUIRE_ELEN_FP_32) DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH 
> +| RVV_REQUIRE_ELEN_FP_32) DEF_RVV_WEXTF_OPS (vfloat32m8_t, 
> +TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +
> DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64) 
> DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64) 
> DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64) 
> DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | 
> +RVV_REQUIRE_MIN_VLEN_64) DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 
> +TARGET_ZVFH) DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH) 
> +DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH) DEF_RVV_CONVERT_I_OPS 
> +(vint16m4_t, TARGET_ZVFH) DEF_RVV_CONVERT_I_OPS (vint16m8_t, 
> +TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64) 
> DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0) DEF_RVV_CONVERT_I_OPS 
> (vint32m2_t, 0) @@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS 
> (vint64m2_t, RVV_REQUIRE_ELEN_64) DEF_RVV_CONVERT_I_OPS (vint64m4_t, 
> RVV_REQUIRE_ELEN_64) DEF_RVV_CONVERT_I_OPS (vint64m8_t, 
> RVV_REQUIRE_ELEN_64)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | 
> +RVV_REQUIRE_MIN_VLEN_64) DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, 
> +TARGET_ZVFH) DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH) 
> +DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH) 
> +DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH) 
> +DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
> +
> DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64) 
> D

Re: [PATCH] libiberty: On Windows pass a >32k cmdline through a response file.

2023-06-05 Thread Costas Argyris via Gcc-patches
Thanks, here is the follow-up patch for a couple of typos in the same file.

On Mon, 5 Jun 2023 at 09:12, Jonathan Yong <10wa...@gmail.com> wrote:

> On 5/23/23 08:21, Jonathan Yong wrote:
> > On 5/22/23 13:25, Costas Argyris wrote:
> >> Currently on Windows, when CreateProcess is called with a command-line
> >> that exceeds the 32k Windows limit, we get a very bad error:
> >>
> >> "CreateProcess: No such file or directory"
> >>
> >> This patch detects the case where this would happen and writes the
> >> long command-line to a temporary response file and calls CreateProcess
> >> with @file instead.
> >>
> >
> > Looks OK to me.
> >
> > I will commit it around next week if there are no objections.
> >
>
> Done, pushed to master, thanks.
>
>
From 45c18cf113585aa0b03512a459e757c7aaef69ce Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Mon, 5 Jun 2023 10:03:11 +0100
Subject: [PATCH] libiberty: pex-win32.c: Fix some typos.

Signed-off-by: Costas Argyris 
---
 libiberty/pex-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libiberty/pex-win32.c b/libiberty/pex-win32.c
index 0fd8b38734c..f7fe306036b 100644
--- a/libiberty/pex-win32.c
+++ b/libiberty/pex-win32.c
@@ -351,7 +351,7 @@ argv_to_cmdline (char *const *argv)
 	 prevent wasting 2 chars per argument of the CreateProcess 32k char
 	 limit.  We need only escape embedded double-quotes and immediately
 	 preceeding backslash characters.  A sequence of backslach characters
-	 that is not follwed by a double quote character will not be
+	 that is not followed by a double quote character will not be
 	 escaped.  */
   needs_quotes = 0;
   for (j = 0; argv[i][j]; j++)
@@ -366,7 +366,7 @@ argv_to_cmdline (char *const *argv)
 	  /* Escape preceeding backslashes.  */
 	  for (k = j - 1; k >= 0 && argv[i][k] == '\\'; k--)
 		cmdline_len++;
-	  /* Escape the qote character.  */
+	  /* Escape the quote character.  */
 	  cmdline_len++;
 	}
 	}
-- 
2.30.2



Re: [PATCH] rs6000: Remove duplicate expression [PR106907]

2023-06-05 Thread Segher Boessenkool
Hi!

On Mon, Jun 05, 2023 at 12:11:42PM +0530, P Jeevitha wrote:
> PR106907 has a few warnings spotted by cppcheck. This patch addresses the
> duplicate-expression issue: the same expression is used twice in a logical
> AND (&&) operation, which yields the same result, so remove the duplicate.
> 
> 2023-06-05  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106907
>   * config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove
>   duplicate expression.
> 
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..d197c3f3289 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -28784,7 +28784,6 @@ vec_const_128bit_to_bytes (rtx op,
>  
>info->all_words_same
>  = (info->words[0] == info->words[1]
> -   && info->words[0] == info->words[1]
> && info->words[0] == info->words[2]
> && info->words[0] == info->words[3]);

Thanks!  Okay for trunk.  Also okay for all backports, no need to wait
if unexpected problems in trunk show up.  But still, backport to 13
first, then 12, then 11, only stop when it stops applying (or there are
no open release branches left) :-)


Segher


[PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch optimizes the boolean evaluation of EQ/NE against zero
by adding two insn_and_split patterns similar to SImode conditional
store:

"eq_zero":
op0 = (op1 == 0) ? 1 : 0;
op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */

"movsicc_ne0_reg_0":
op0 = (op1 != 0) ? op2 : 0;
op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */

/* example #1 */
int bool_eqSI(int x) {
  return x == 0;
}
int bool_neSI(int x) {
  return x != 0;
}

;; after (TARGET_NSA)
bool_eqSI:
nsau    a2, a2
srli    a2, a2, 5
ret.n
bool_neSI:
mov.n   a9, a2
movi.n  a2, 1
moveqz  a2, a9, a9
ret.n

These also work in SFmode by ignoring their sign bits; furthermore,
the branch on EQ/NE against zero in SFmode is also done in the same
manner.

The reasons for this optimization in SFmode are:

  - Only zero values (negative or non-negative) have no 1 bits in
    either the exponent or the mantissa.
  - EQ/NE comparisons involving NaNs produce no signal even if they
are signaling.
  - Even if the IEEE 754 single-precision floating-point coprocessor
    is configured for use (TARGET_HARD_FLOAT is true):
1. Load zero value to FP register
2. Possibly, additional FP move if the comparison target is
   an address register
3. FP equality check instruction
4. Read the boolean register containing the result, or take a
   conditional branch
As noted above, a considerable number of instructions are still
generated.

/* example #2 */
int bool_eqSF(float x) {
  return x == 0;
}
int bool_neSF(float x) {
  return x != 0;
}
int bool_ltSF(float x) {
  return x < 0;
}
extern void foo(void);
void cb_eqSF(float x) {
  if(x != 0)
foo();
}
void cb_neSF(float x) {
  if(x == 0)
foo();
}
void cb_geSF(float x) {
  if(x < 0)
foo();
}

;; after
;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
bool_eqSF:
add.n   a2, a2, a2
nsau    a2, a2
srli    a2, a2, 5
ret.n
bool_neSF:
add.n   a9, a2, a2
movi.n  a2, 1
moveqz  a2, a9, a9
ret.n
bool_ltSF:
movi.n  a9, 0
wfr f0, a2
wfr f1, a9
olt.s   b0, f0, f1
movi.n  a9, 0
movi.n  a2, 1
movf    a2, a9, b0
ret.n
cb_eqSF:
add.n   a2, a2, a2
beqz.n  a2, .L6
j.l foo, a9
.L6:
ret.n
cb_neSF:
add.n   a2, a2, a2
bnez.n  a2, .L8
j.l foo, a9
.L8:
ret.n
cb_geSF:
addi    sp, sp, -16
movi.n  a3, 0
s32i.n  a12, sp, 8
s32i.n  a0, sp, 12
mov.n   a12, a2
call0   __unordsf2
bnez.n  a2, .L10
movi.n  a3, 0
mov.n   a2, a12
call0   __gesf2
bnei    a2, -1, .L10
l32i.n  a0, sp, 12
l32i.n  a12, sp, 8
addi    sp, sp, 16
j.l foo, a9
.L10:
l32i.n  a0, sp, 12
l32i.n  a12, sp, 8
addi    sp, sp, 16
ret.n

gcc/ChangeLog:

* config/xtensa/predicates.md (const_float_0_operand):
Rename from obsolete "const_float_1_operand" and change the
constant to compare.
(cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
New.
* config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
Add code for EQ/NE comparison with constant zero in SFmode.
(xtensa_expand_scc): Added code to derive boolean evaluation
of EQ/NE with constant zero for comparison in SFmode.
(xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
zero inside "cbranchsf4" to 0.
* config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
Change "match_operator" and the third "match_operand" to the
ones mentioned above.
(movsicc_ne0_reg_zero, eq_zero): New.
---
 gcc/config/xtensa/predicates.md | 17 +--
 gcc/config/xtensa/xtensa.cc | 45 
 gcc/config/xtensa/xtensa.md | 53 +
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index a3575a68892..cfac3ad4936 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -155,11 +155,11 @@
&& CONSTANT_P (op)
&& GET_MODE_SIZE (mode) % UNITS_PER_WORD == 0")
 
-;; Accept the floating point constant 1 in the appropriate mode.
-(define_predicate "const_float_1_operand"
+;; Accept the floating point constant 0 in the appropriate mode.
+(define_predicate "const_float_0_operand"
   (match_code "const_double")
 {
-  return real_equal (CONST_DOUBLE_REAL_VALUE (op), &dconst1);
+  return real_equal (CONST_DOUBLE_REAL_VAL

Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible.  The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me with comments ? Base on what we have discussed above (I 
> think we are on same page now).

Yeah, I agree we seem to be on the same page.

>>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>>> E.g. is "vsetvli zero" cheaper than "vsetvli "?
>
>
> "vsetvli zero" is the same cost as "vsetvli gpr", 
>
> I think for (b),  solution 2 and solution 3 should be almost the same.

OK, thanks.  If we wanted to use solution 2 for (b), the condition
would be just:

  LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1

dropping the:

  LOOP_VINFO_LENS (loop_vinfo).length () == 1

But it would make the pointer IV updates more complex.  So let's
say that that's the reason for preferring solution 3.

So rather than:

+  /* If we're using decrement IV approach in loop control, we can use output of
+ SELECT_VL to adjust IV of loop control and data reference when it 
satisfies
+ the following checks:
+
+ (a) SELECT_VL is supported by the target.
+ (b) LOOP_VINFO is single-rgroup control.
+ (c) non-SLP.
+ (d) LOOP can not be unrolled.
+
+ Otherwise, we use MIN_EXPR approach.
+
+ 1. We only apply SELECT_VL on single-rgroup since:
+
+ (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+ updates by variable amounts.
+ (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+ (3). For decrement IV approach, we calculate the MAX length of the loop
+ and then deduce the length of each control from this MAX length.
+
+ Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
+ multiple-rgroup control, we need to generate multiple SELECT_VL to
+ carefully adjust length of each control. Such approach is very inefficient
+ and unprofitable for targets that are using a standalone instruction
+ to configure the length of each operation.
+ E.g. RISC-V vector use 'vsetvl' to configure the length of each operation.

how about:

  /* If a loop uses length controls and has a decrementing loop control IV,
 we will normally pass that IV through a MIN_EXPR to calculate the
 basis for the length controls.  E.g. in a loop that processes one
 element per scalar iteration, the number of elements would be
 MIN_EXPR <N, VF>, where N is the number of scalar iterations left.

 This MIN_EXPR approach allows us to use pointer IVs with an invariant
 step, since only the final iteration of the vector loop can have
 inactive lanes.

 However, some targets have a dedicated instruction for calculating the
 preferred length, given the total number of elements that still need to
 be processed.  This is encapsulated in the SELECT_VL internal function.

 If the target supports SELECT_VL, we can use it instead of MIN_EXPR
 to determine the basis for the length controls.  However, unlike the
 MIN_EXPR calculation, the SELECT_VL calculation can decide to make
 lanes inactive in any iteration of the vector loop, not just the last
 iteration.  This SELECT_VL approach therefore requires us to use pointer
 IVs with variable steps.

 Once we've decided how many elements should be processed by one
 iteration of the vector loop, we need to populate the rgroup controls.
 If a loop has multiple rgroups, we need to make sure that those rgroups
 "line up" (that is, they must be consistent about which elements are
 active and which aren't).  This is done by vect_adjust_loop_lens_control.

 In principle, it would be possible to use vect_adjust_loop_lens_control
 on either the result of a MIN_EXPR or the result of a SELECT_VL.
 However:

 (1) In practice, it only makes sense to use SELECT_VL when a vector
 operation will be controlled directly by the result.  It is not
 worth using SELECT_VL if it would only be the input to other
 calculations.

 (2) If we use SELECT_VL for an rgroup that has N controls, each associated
 pointer IV will need N updates by a variable amount (N-1 updates
 within the iteration and 1 update to move to the next iteration).
 
 Because of this, we prefer to use the MIN_EXPR approach whenever there
 is more than one length control.

 In addition, SELECT_VL always operates to a granularity of 1 unit.
 If we wanted to use it to control an SLP operation on N consecutive
 elements, we would need to make the SELECT_VL inputs measure scalar
 iterations (rather than elements) and then multiply the SELECT_VL
 result by N.  But using SELECT_VL this way is inefficient because
 of (1) above.

Thanks,
Richard


Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
>> But it would make the pointer IV updates more complex.  So let's
>> say that that's the reason for preferring solution 3.

Yes, I prefer solution 3, to avoid complex pointer IV updates; there is
no benefit in solution 2 (unlike the single-rgroup case).

I read your comments; they are more comprehensive than what I wrote.

I will send a V3 patch appending your comments.

Thank you so much!


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 18:09
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible.  The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me with comments ? Base on what we have discussed above (I 
> think we are on same page now).
 
Yeah, I agree we seem to be on the same page.
 
>>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>>> E.g. is "vsetvli zero" cheaper than "vsetvli "?
>
>
> "vsetvli zero" is the same cost as "vsetvli gpr", 
>
> I think for (b),  solution 2 and solution 3 should be almost the same.
 
OK, thanks.  If we wanted to use solution 2 for (b), the condition
would be just:
 
  LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
 
dropping the:
 
  LOOP_VINFO_LENS (loop_vinfo).length () == 1
 
But it would make the pointer IV updates more complex.  So let's
say that that's the reason for preferring solution 3.
 
So rather than:
 
+  /* If we're using decrement IV approach in loop control, we can use output of
+ SELECT_VL to adjust IV of loop control and data reference when it 
satisfies
+ the following checks:
+
+ (a) SELECT_VL is supported by the target.
+ (b) LOOP_VINFO is single-rgroup control.
+ (c) non-SLP.
+ (d) LOOP can not be unrolled.
+
+ Otherwise, we use MIN_EXPR approach.
+
+ 1. We only apply SELECT_VL on single-rgroup since:
+
+ (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+   updates by variable amounts.
+ (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+ (3). For decrement IV approach, we calculate the MAX length of the loop
+   and then deduce the length of each control from this MAX length.
+
+ Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
+ multiple-rgroup control, we need to generate multiple SELECT_VL to
+ carefully adjust length of each control. Such approach is very inefficient
+ and unprofitable for targets that are using a standalone instruction
+ to configure the length of each operation.
+ E.g. RISC-V vector use 'vsetvl' to configure the length of each operation.
 
how about:
 
  /* If a loop uses length controls and has a decrementing loop control IV,
 we will normally pass that IV through a MIN_EXPR to calculate the
 basis for the length controls.  E.g. in a loop that processes one
 element per scalar iteration, the number of elements would be
 MIN_EXPR <N, VF>, where N is the number of scalar iterations left.
 
 This MIN_EXPR approach allows us to use pointer IVs with an invariant
 step, since only the final iteration of the vector loop can have
 inactive lanes.
 
 However, some targets have a dedicated instruction for calculating the
 preferred length, given the total number of elements that still need to
 be processed.  This is encapsulated in the SELECT_VL internal function.
 
 If the target supports SELECT_VL, we can use it instead of MIN_EXPR
 to determine the basis for the length controls.  However, unlike the
 MIN_EXPR calculation, the SELECT_VL calculation can decide to make
 lanes inactive in any iteration of the vector loop, not just the last
 iteration.  This SELECT_VL approach therefore requires us to use pointer
 IVs with variable steps.
 
 Once we've decided how many elements should be processed by one
 iteration of the vector loop, we need to populate the rgroup controls.
 If a loop has multiple rgroups, we need to make sure that those rgroups
 "line up" (that is, they must be consistent about which elements are
 active and which aren't).  This is done by vect_adjust_loop_lens_control.
 
 In principle, it would be possible to use vect_adjust_loop_lens_control
 on either the result of a MIN_EXPR or the result of a SELECT_VL.
 However:
 
 (1) In practice, it only makes sense to use SELECT_VL when a vector
 operation will be controlled directly by the result.  It is not
 worth using SELECT_VL if it would only be the input to other
 calculations.
 
 (2) If we use SELECT_VL for an rgroup that has N controls, each associated
 pointer IV will need N updates by a variable amount (N-1 updates
 within the iteration and 1 update to move to the next iteration).
 

Re: [PATCH] New wi::bitreverse function.

2023-06-05 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> This patch provides a wide-int implementation of bitreverse, that
> implements both of Richard Sandiford's suggestions from the review at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html of an
> improved API (as a stand-alone function matching the bswap refactoring),
> and an implementation that works with any bit-width precision.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> (and a make check-gcc).  Ok for mainline?

OK, thanks.

> Are the remaining pieces
> of the above patch pre-approved (pending re-testing)?  The aim is that
> this new code will be thoroughly tested by the new *-2.c test cases in
> https://gcc.gnu.org/git/?p=gcc.git;h=c09471fbc7588db2480f036aa56a2403d3c03ae
> 5
> with a minor tweak to use the BITREVERSE rtx in the NVPTX back-end,
> followed by similar tests on other targets that provide bit-reverse
> built-ins (such as ARM and xstormy16), in advance of support for a
> backend-independent solution to PR middle-end/50481.

Ah, great.  Yeah, in that case, OK for the rest as well.

Richard


[PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford

This patch addresses comments from Richard and rebases onto trunk.

This patch adds SELECT_VL middle-end support to
allow targets to apply target-dependent optimizations to the
length calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:

1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows processing fewer than VF elements in a non-final iteration.

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
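The code example above is cut off in the archive; as a rough, hedged sketch (plain C rather than the vectorized RVV output, with `select_vl` standing in for the internal function), the strip-mined loop structure that SELECT_VL enables looks like this:

```c
#include <stddef.h>

/* Stand-in for .SELECT_VL: may return any value in [1, MIN (n, vf)];
   modelled here as the maximum.  */
static size_t
select_vl (size_t n, size_t vf)
{
  return n < vf ? n : vf;
}

/* vvaddint32, strip-mined with a variable-length step: both the loop
   control IV (n) and the data reference IVs (x, y, z) advance by the
   amount select_vl chose for that iteration, possibly fewer than VF
   elements even in non-final iterations.  */
void
vvaddint32 (size_t n, const int *x, const int *y, int *z)
{
  while (n > 0)
    {
      size_t vl = select_vl (n, 4);    /* VF == 4 in this sketch */
      for (size_t i = 0; i < vl; i++)  /* one .LEN_LOAD/.LEN_STORE's worth */
        z[i] = x[i] + y[i];
      x += vl;
      y += vl;
      z += vl;
      n -= vl;
    }
}
```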

---
 gcc/doc/md.texi | 22 
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 -
 gcc/tree-vect-loop.cc   | 72 +
 gcc/tree-vect-stmts.cc  | 66 ++
 gcc/tree-vectorizer.h   |  6 
 7 files changed, 191 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = gimple_build (header_seq, IFN_SELEC

Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.
Thanks for the help.
This patch passes bootstrap. Ok for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-05 18:30
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Add SELECT_VL support

Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
A further update: this has now also passed regression testing on x86.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-06-05 18:40
To: 钟居哲; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V3] VECT: Add SELECT_VL support

Re: [pushed] analyzer: implement various atomic builtins [PR109015]

2023-06-05 Thread Maxim Kuvyrkov via Gcc-patches
Hi David,

Hm, I'm seeing this failure only in pre-commit testing, but I don't see it in 
our post-commit testing of gcc:master.

Does this patch rely on your other patch committed just before this one?

--
Maxim Kuvyrkov
https://www.linaro.org




> On Jun 3, 2023, at 09:23, Maxim Kuvyrkov  wrote:
> 
> Hi David,
> 
> The new test ICEs the compiler on aarch64-linux-gnu [1].  Would you please 
> investigate?
> 
> Running gcc:gcc.dg/analyzer/analyzer.exp ...
> FAIL: gcc.dg/analyzer/atomic-builtins-qemu-sockets.c (internal compiler 
> error: in validate, at analyzer/store.cc:1329)
> FAIL: gcc.dg/analyzer/atomic-builtins-qemu-sockets.c (test for excess errors)
> 
> This is a simple native build on aarch64-linux-gnu.  Please let me know if 
> you need any help in reproducing this.
> 
> [1] 
> https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/82/artifact/artifacts/artifacts.precommit/06-check_regression/results.compare/*view*/
> 
> Thanks!
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> 
> 
> 
>> On Jun 2, 2023, at 17:32, David Malcolm via Gcc-patches 
>>  wrote:
>> 
>> This patch implements many of the __atomic_* builtins from
>> sync-builtins.def as known_function subclasses within the analyzer.
>> 
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>> Pushed to trunk as r14-1497-gef768035ae8090.
>> 
>> gcc/analyzer/ChangeLog:
>> PR analyzer/109015
>> * kf.cc (class kf_atomic_exchange): New.
>> (class kf_atomic_exchange_n): New.
>> (class kf_atomic_fetch_op): New.
>> (class kf_atomic_op_fetch): New.
>> (class kf_atomic_load): New.
>> (class kf_atomic_load_n): New.
>> (class kf_atomic_store_n): New.
>> (register_atomic_builtins): New function.
>> (register_known_functions): Call register_atomic_builtins.
>> 
>> gcc/testsuite/ChangeLog:
>> PR analyzer/109015
>> * gcc.dg/analyzer/atomic-builtins-1.c: New test.
>> * gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c: New test.
>> * gcc.dg/analyzer/atomic-builtins-qemu-sockets.c: New test.
>> * gcc.dg/analyzer/atomic-types-1.c: New test.
>> ---
>> gcc/analyzer/kf.cc| 355 
>> .../gcc.dg/analyzer/atomic-builtins-1.c   | 544 ++
>> .../analyzer/atomic-builtins-haproxy-proxy.c  |  55 ++
>> .../analyzer/atomic-builtins-qemu-sockets.c   |  18 +
>> .../gcc.dg/analyzer/atomic-types-1.c  |  11 +
>> 5 files changed, 983 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/analyzer/atomic-builtins-1.c
>> create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c
>> create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/atomic-builtins-qemu-sockets.c
>> create mode 100644 gcc/testsuite/gcc.dg/analyzer/atomic-types-1.c
>> 
>> diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
>> index 93c46630f36..104499e 100644
>> --- a/gcc/analyzer/kf.cc
>> +++ b/gcc/analyzer/kf.cc
>> @@ -69,6 +69,235 @@ kf_alloca::impl_call_pre (const call_details &cd) const
>>  cd.maybe_set_lhs (ptr_sval);
>> }
>> 
>> +/* Handler for:
>> +   void __atomic_exchange (type *ptr, type *val, type *ret, int memorder).  
>> */
>> +
>> +class kf_atomic_exchange : public internal_known_function
>> +{
>> +public:
>> +  /* This is effectively:
>> +   *RET = *PTR;
>> +   *PTR = *VAL;
>> +  */
>> +  void impl_call_pre (const call_details &cd) const final override
>> +  {
>> +const svalue *ptr_ptr_sval = cd.get_arg_svalue (0);
>> +tree ptr_ptr_tree = cd.get_arg_tree (0);
>> +const svalue *val_ptr_sval = cd.get_arg_svalue (1);
>> +tree val_ptr_tree = cd.get_arg_tree (1);
>> +const svalue *ret_ptr_sval = cd.get_arg_svalue (2);
>> +tree ret_ptr_tree = cd.get_arg_tree (2);
>> +/* Ignore the memorder param.  */
>> +
>> +region_model *model = cd.get_model ();
>> +region_model_context *ctxt = cd.get_ctxt ();
>> +
>> +const region *val_region
>> +  = model->deref_rvalue (val_ptr_sval, val_ptr_tree, ctxt);
>> +const svalue *star_val_sval = model->get_store_value (val_region, ctxt);
>> +const region *ptr_region
>> +  = model->deref_rvalue (ptr_ptr_sval, ptr_ptr_tree, ctxt);
>> +const svalue *star_ptr_sval = model->get_store_value (ptr_region, ctxt);
>> +const region *ret_region
>> +  = model->deref_rvalue (ret_ptr_sval, ret_ptr_tree, ctxt);
>> +model->set_value (ptr_region, star_val_sval, ctxt);
>> +model->set_value (ret_region, star_ptr_sval, ctxt);
>> +  }
>> +};
>> +
>> +/* Handler for:
>> +   __atomic_exchange_n (type *ptr, type val, int memorder).  */
>> +
>> +class kf_atomic_exchange_n : public internal_known_function
>> +{
>> +public:
>> +  /* This is effectively:
>> +   RET = *PTR;
>> +   *PTR = VAL;
>> +   return RET;
>> +  */
>> +  void impl_call_pre (const call_details &cd) const final override
>> +  {
>> +const svalue *ptr_sval = cd.get_arg_svalue (0);
>> +tree ptr_tree = cd.get_arg_tree (0);
>> +const svalue *set_sval = cd.get_arg_svalue (1);
>> +/* Ignore the memorder param.  

Re: [PATCH] libiberty: On Windows pass a >32k cmdline through a response file.

2023-06-05 Thread Jonathan Yong via Gcc-patches

On 6/5/23 09:22, Costas Argyris wrote:

Thanks, here is the follow-up patch fixing a couple of typos in the same file.



Thanks, pushed as obvious.




Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

2023-06-05 Thread Thomas Schwinge
Hi!

OK to push the attached
"Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 0d5095d8cd2d68113890a39a7fdb649198e576c1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 2 Jun 2023 23:11:00 +0200
Subject: [PATCH] Add
 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

	gcc/testsuite/
	* gfortran.fortran-torture/execute/math.f90: Enhance for optional
	OpenACC, OpenMP 'target' usage.
	libgomp/
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 .../gfortran.fortran-torture/execute/math.f90 | 23 +--
 .../fortran-torture_execute_math.f90  |  4 
 .../fortran-torture_execute_math.f90  |  5 
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90

diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
index 17cc78f7a10..e71f669304f 100644
--- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
+++ b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
@@ -1,9 +1,14 @@
 ! Program to test mathematical intrinsics
+
+! See also 'libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90'; thus the '!$omp' directives.
+! See also 'libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90'; thus the '!$acc' directives.
+
 subroutine dotest (n, val4, val8, known)
implicit none
real(kind=4) val4, known
real(kind=8) val8
integer n
+   !$acc routine seq
 
if (abs (val4 - known) .gt. 0.001) STOP 1
if (abs (real (val8, kind=4) - known) .gt. 0.001) STOP 2
@@ -14,17 +19,20 @@ subroutine dotestc (n, val4, val8, known)
complex(kind=4) val4, known
complex(kind=8) val8
integer n
+   !$acc routine seq
+
if (abs (val4 - known) .gt. 0.001) STOP 3
if (abs (cmplx (val8, kind=4) - known) .gt. 0.001) STOP 4
 end subroutine
 
-program testmath
+subroutine testmath
implicit none
real(kind=4) r, two4, half4
real(kind=8) q, two8, half8
complex(kind=4) cr
complex(kind=8) cq
external dotest, dotestc
+   !$acc routine seq
 
two4 = 2.0
two8 = 2.0_8
@@ -96,5 +104,16 @@ program testmath
cq = log ((-1.0_8, -1.0_8))
call dotestc (21, cr, cq, (0.3466, -2.3562))
 
-end program
+end subroutine
 
+program main
+   implicit none
+   external testmath
+
+   !$acc serial
+   !$omp target
+   call testmath
+   !$acc end serial
+   !$omp end target
+
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..3348a0bb3ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,4 @@
+! { dg-do run }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..1b2ac440762
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,5 @@
+! { dg-do run }
+!TODO { dg-prune-output {using 'vector_length \(32\)', ignoring 1} }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
-- 
2.34.1



driver: Forward '-lgfortran', '-lm' to offloading compilation

2023-06-05 Thread Thomas Schwinge
Hi!

OK to push the attached
"driver: Forward '-lgfortran', '-lm' to offloading compilation"?
(We didn't have a PR open for that, or did we?)


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 5d3cb866cad3bbcf47c5e66825e5710e86cc017e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 5 Jun 2023 11:26:37 +0200
Subject: [PATCH] driver: Forward '-lgfortran', '-lm' to offloading compilation

..., so that users don't manually need to specify
'-foffload-options=-lgfortran', '-foffload-options=-lm' in addition to
'-lgfortran', '-lm' (specified manually, or implicitly by the driver).

	gcc/
	* gcc.cc (driver_handle_option): Forward host '-lgfortran', '-lm'
	to offloading compilation.
	* config/gcn/mkoffload.cc (main): Adjust.
	* config/nvptx/mkoffload.cc (main): Likewise.
	* doc/invoke.texi (foffload-options): Update example.
	libgomp/
	* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Don't
	set.
	* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags):
	Likewise.
	* testsuite/libgomp.c/simd-math-1.c: Remove
	'-foffload-options=-lm'.
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 gcc/config/gcn/mkoffload.cc   | 12 
 gcc/config/nvptx/mkoffload.cc | 12 
 gcc/doc/invoke.texi   |  5 +-
 gcc/gcc.cc| 56 +++
 libgomp/testsuite/libgomp.c/simd-math-1.c |  1 -
 .../fortran-torture_execute_math.f90  |  1 -
 libgomp/testsuite/libgomp.fortran/fortran.exp |  2 -
 .../fortran-torture_execute_math.f90  |  1 -
 .../libgomp.oacc-fortran/fortran.exp  |  2 -
 9 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 988c12318fd..8b608bf024e 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -946,6 +946,18 @@ main (int argc, char **argv)
   else if (startswith (argv[i], STR))
 	gcn_stack_size = atoi (argv[i] + strlen (STR));
 #undef STR
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
 
   if (!(fopenacc ^ fopenmp))
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 6cdea45cffe..aaea9fb320d 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -649,6 +649,18 @@ main (int argc, char **argv)
   else if (strcmp (argv[i], "-dumpbase") == 0
 	   && i + 1 < argc)
 	dumppfx = argv[++i];
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
   if (!(fopenacc ^ fopenmp))
 fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> "
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2d639c92d4..7b3a2a74459 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2716,9 +2716,8 @@ the @option{-foffload-options=@var{target-list}=@var{options}} form.  The
 Typical command lines are
 
 @smallexample
--foffload-options=-lgfortran -foffload-options=-lm
--foffload-options="-lgfortran -lm" -foffload-options=nvptx-none=-latomic
--foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
+-foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
+-foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-O3
 @end smallexample
 
 @opindex fopenacc
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 2ccca00d603..15995206856 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -47,6 +47,9 @@ compilation is specified by a string called a "spec".  */
 #include "opts-jobserver.h"
 #include "common/common-target.h"
 
+#ifndef MATH_LIBRARY
+#define MATH_LIBRARY "m"
+#endif
 
 
 /* Manage the manipulation of env vars.
@@ -4117,6 +4120,48 @@ next_item:
 }
 }
 
+/* Forward certain options to offloading compilation.  */
+
+static void
+forward_offload_option (size_t opt_index, const char *arg, bool validated)
+{
+  switch (opt_index)
+{
+case OPT_l:
+  /* Use a '_GCC_' prefix and standard name ('-l_GCC_m' irrespective of the
+	 host's 'MATH_LIBRARY', for example), so that the 'mkoffload's can tell
+	 this ha

Re: [RFA] Improve strcmp expansion when one input is a constant string.

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 00:29, Richard Biener wrote:



But then for example x86 has smaller encoding for byte ops and while
widening is easily done later, truncation is not.

Which probably argues we need to be checking costs.



Btw, you failed to update the overall function comment which lists
the conversions applied.
ACK.  It occurred to me when I woke up today that I also failed to 
handle the case where word_mode is actually smaller than an int.




Note I would have expected to use the mode of the load so we truly
elide some extensions, using word_mode looks like just another
mode here?  The key to note is probably

   op0 = convert_modes (mode, unit_mode, op0, 1);
   op1 = convert_modes (mode, unit_mode, op1, 1);
   rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
   result, 1, OPTAB_WIDEN);

On many (most?) targets the loads can cheaply extend to word_mode.



which uses OPTAB_WIDEN - wouldn't it be better to pass in the
unconverted modes and leave the decision which mode to use
to OPTAB_WIDEN?  Should we somehow query the target for
the smallest mode from unit_mode it can do both the MINUS
and the compare?
I'll play with it.  My worry is that the optabs are going to leave 
things in SImode.  RV64 has 32 bit add/subtract which implicitly sign 
extends the result to 64 bits and Jivan's patch models that behavior 
with generally very good results.  But again, I'll play with it.


I do agree that we need to be looking at cost modeling more in here.  So 
I'll poke at that too.
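As a point of reference for the expansion strategy under discussion, here is a hedged scalar model in C (illustrative only, not the expander's RTL API; the function name is made up): each unit is zero-extended to a wider type before the subtraction, matching the `convert_modes`/`MINUS` sequence quoted above, so no truncations are needed afterwards.

```c
#include <stddef.h>

/* Scalar model of the expanded strcmp loop when one input is a constant
   string of length N: bytes are zero-extended (the convert_modes calls)
   and the subtraction and comparison both happen in the wide type.  */
long
strcmp_model (const unsigned char *s, const unsigned char *konst, size_t n)
{
  for (size_t i = 0; i <= n; i++)       /* include the terminating NUL */
    {
      long diff = (long) s[i] - (long) konst[i];  /* MINUS in wide mode */
      if (diff != 0)
        return diff;
      if (konst[i] == 0)
        return 0;
    }
  return 0;
}
```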



jeff


Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Andrew Stubbs

On 30/05/2023 07:26, Richard Biener wrote:

On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs  wrote:


Hi all,

I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.

Therefore, this patch adds minimal support for "complex vector int"
modes.  I have not attempted to provide any means to use these modes
from C, so they're really only useful for DIVMOD.  The actual libfunc
implementation will pack the data into wider vector modes manually.

A knock-on effect of this is that I needed to increase the range of
"mode_unit_size" (several of the vector modes supported by amdgcn exceed
the previous 255-byte limit).

Since this change would add a large number of new, unused modes to many
architectures, I have elected to *not* enable them, by default, in
machmode.def (where the other complex modes are created).  The new modes
are therefore inactive on all architectures but amdgcn, for now.

OK for mainline?  (I've not done a full test yet, but I will.)


I think it makes more sense to map vector CSImode to vector SImode with
the double number of lanes.  In fact since divmod is a libgcc function
I wonder where your vector variant would reside and how GCC decides to
emit calls to it?  That is, there's no way to OMP simd declare this function?


The divmod implementation lives in libgcc. It's not too difficult to 
write using vector extensions and some asm tricks. I did try an OMP simd 
declare implementation, but it didn't vectorize well, and that's a yak 
I don't wish to shave right now.


In any case, the OMP simd declare will not help us here, directly, 
because the DIVMOD transformation happens too late in the pass pipeline, 
long after ifcvt and vect. My implementation (not yet posted), uses a 
libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way. 
It just needs the complex vector modes to exist.


Using vectors twice the length is problematic also. If I create a new 
V128SImode that spans across two 64-lane vector registers then that will 
probably have the desired effect ("real" quotient in v8, "imaginary" 
remainder in v9), but if I use V64SImode to represent two V32SImode 
vectors then that's a one-register mode, and I'll have to use a 
permutation (a memory operation) to extract lanes 32-63 into lanes 0-31, 
and if we ever want to implement instructions that operate on these 
modes (as opposed to the odd/even add/sub complex patterns we have now) 
then the masking will be all broken and we'd need to constantly 
disassemble the double length vectors to operate on them.


The implementation I proposed is essentially a struct containing two 
vectors placed in consecutive registers. This is the natural 
representation for the architecture.


Anyway, you don't like this patch and I see that AArch64 is picking 
apart BLKmode to see if there's complex inside, so maybe I can make 
something like that work here? AArch64 doesn't seem to use 
TARGET_EXPAND_DIVMOD_LIBFUNC though, and I'm pretty sure the problem I 
was trying to solve was in the way the expand pass handles the BLKmode 
complex, outside the control of the backend hook (I'm still paging this 
stuff back in, post vacation).


Thanks

Andrew


[PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-05 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The idea behind this compact syntax is that it is often quite hard to
correlate the entries in the constraints list, the attributes, and the
instruction lists.

One has to count, and this is often tedious.  Additionally, when changing a
single line in the insn, multiple lines in a diff change, making it harder to
see what's going on.

This new syntax takes into account many of the common things that are done in
MD files.   It's also worth saying that this version is intended to deal with
the common case of string-based alternatives.   For C chunks we have some
ideas, but those are not intended to be addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  
r,  r,  r, w,r,w, w")
(match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %1
   #
   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
   ldr\\t%w0, %1
   ldr\\t%s0, %1
   str\\t%w1, %0
   str\\t%s1, %0
   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
   adr\\t%x0, %c1
   adrp\\t%x0, %A1
   fmov\\t%s0, %w1
   fmov\\t%w0, %s1
   fmov\\t%s0, %s1
   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   "{
   aarch64_expand_mov_immediate (operands[0], operands[1]);
   DONE;
}"
  ;; The "mov_imm" type for CNT is just a placeholder.
  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,

load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
   (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
   (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand")
(match_operand:SI 1 "aarch64_mov_operand"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  {@ [cons: =0, 1; attrs: type, arch, length]
 [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
 [k , r  ; mov_reg  , *   , 4] ^
 [r , k  ; mov_reg  , *   , 4] ^
 [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
 [r , n  ; mov_imm  , *   ,16] #
 /* The "mov_imm" type for CNT is just a placeholder.  */
 [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
 [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
 [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
 [m , rZ ; store_4  , *   , 4] str\t%w1, %0
 [m , w  ; store_4  , fp  , 4] str\t%s1, %0
 [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
 [r , Usa; adr  , *   , 4] adr\t%x0, %c1
 [r , Ush; adr  , *   , 4] adrp\t%x0, %A1
 [w , rZ ; f_mcr, fp  , 4] fmov\t%s0, %w1
 [r , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
 [w , w  ; fmov , fp  , 4] fmov\t%s0, %s1
 [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], SImode);
  }
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
  [(const_int 0)]
  {
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
  }
)

The patch contains some more rewritten examples for both Arm and AArch64.  I
have included them as examples in this patch, but the final version posted
will have these split out.

The main syntax rules are as follows (See docs for full rules):
  - Template must start with "{@" and end with "}" to use the new syntax.
  - "{@" is followed by a layout in parentheses which is "cons:" followed by
a list of match_operand/match_scratch IDs, then a semicolon, then the
same for attributes ("attrs:"). Both sections are optional (so you can
use only cons, or only attrs, or both), and cons must come before attrs
if present.
  - Each alternative begins with any amount of whitespace.
  - Following the whitespace is a comma-separated list of constraints and/or
attributes within brackets [], with sections separated by a semicolon.
  - Following the closing ']' is any amount of whitespace, and then the actual
asm output.
  - Spaces are allowed in the list (they will simply be removed).
  - All alternatives should be specified: a blank list should be
"[,,]", "[,,;,]" etc., n

Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-06-05 Thread Maxim Kuvyrkov via Gcc-patches
> On Jun 3, 2023, at 19:17, Jeff Law  wrote:
> 
> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:
>> This patch adds tracking of current testsuite "tool" and "exp"
>> to the processing of .sum files.  This avoids aliasing between
>> tests from different testsuites with same name+description.
>> E.g., this is necessary for testsuite/c-c++-common, which is run
>> for both gcc and g++ "tools".
>> This patch changes manifest format from ...
>> 
>> FAIL: gcc_test
>> FAIL: g++_test
>> 
>> ... to ...
>> 
>> === gcc tests ===
>> Running gcc/foo.exp ...
>> FAIL: gcc_test
>> === gcc Summary ==
>> === g++ tests ===
>> Running g++/bar.exp ...
>> FAIL: g++_test
>> === g++ Summary ==
>> .
>> The new format uses same formatting as DejaGnu's .sum files
>> to specify which "tool" and "exp" the test belongs to.
> I think the series is fine.  You're not likely to hear from Diego or Doug, I 
> suspect; I don't think either is involved in GNU stuff anymore.
> 

Thanks, Jeff.  I'll wait for a couple of days and will merge if there are no 
new comments.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org



[PATCH] libiberty: writeargv: Simplify function error mode.

2023-06-05 Thread Costas Argyris via Gcc-patches
writeargv can be simplified by getting rid of the error exit mode
that was only relevant many years ago when the function used
to open the file descriptor internally.
From 1271552baee5561fa61652f4ca7673c9667e4f8f Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Mon, 5 Jun 2023 15:02:06 +0100
Subject: [PATCH] libiberty: writeargv: Simplify function error mode.

The goto-based error mode was based on a previous version
of the function where it was responsible for opening the
file, so it had to close it upon any exit:

https://inbox.sourceware.org/gcc-patches/20070417200340.gm9...@sparrowhawk.codesourcery.com/

(thanks pinskia)

This is no longer the case though since now the function
takes the file descriptor as input, so the exit mode on
error can be just a simple return 1 statement.

Signed-off-by: Costas Argyris 
---
 libiberty/argv.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/libiberty/argv.c b/libiberty/argv.c
index a95a10e14ff..1a18b4d8866 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -289,8 +289,8 @@ char **buildargv (const char *input)
 @deftypefn Extension int writeargv (char * const *@var{argv}, FILE *@var{file})
 
 Write each member of ARGV, handling all necessary quoting, to the file
-named by FILE, separated by whitespace.  Return 0 on success, non-zero
-if an error occurred while writing to FILE.
+associated with FILE, separated by whitespace.  Return 0 on success,
+non-zero if an error occurred while writing to FILE.
 
 @end deftypefn
 
@@ -314,36 +314,25 @@ writeargv (char * const *argv, FILE *f)
 
   if (ISSPACE(c) || c == '\\' || c == '\'' || c == '"')
 if (EOF == fputc ('\\', f))
-  {
-status = 1;
-goto done;
-  }
+  return 1;
 
   if (EOF == fputc (c, f))
-{
-  status = 1;
-  goto done;
-}
+return 1;
+	  
   arg++;
 }
 
   /* Write out a pair of quotes for an empty argument.  */
   if (arg == *argv)
-	if (EOF == fputs ("\"\"", f))
-	  {
-	status = 1;
-	goto done;
-	  }
+if (EOF == fputs ("\"\"", f))
+  return 1;
 
   if (EOF == fputc ('\n', f))
-{
-  status = 1;
-  goto done;
-}
+return 1;
+  
   argv++;
 }
 
- done:
   return status;
 }
 
-- 
2.30.2



[PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch supports the intrinsic APIs of the FP16 ZVFH reduction
floating-point instructions, i.e., SEW=16 for the instructions below:

vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum

Then users can leverage the intrinsic APIs to perform the FP16-related
reduction operations. Please note that not all the intrinsic APIs are covered
in the test files; only some typical ones are picked, since there are too
many. We will test the FP16-related intrinsic APIs entirely soon.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.
---
 .../riscv/riscv-vector-builtins-types.def |  7 +++
 gcc/config/riscv/vector-iterators.md  | 12 
 .../riscv/rvv/base/zvfh-intrinsic.c   | 58 ++-
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 1e2491de6d6..bd3deae8340 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -634,6 +634,13 @@ DEF_RVV_WU_OPS (vuint32m2_t, 0)
 DEF_RVV_WU_OPS (vuint32m4_t, 0)
 DEF_RVV_WU_OPS (vuint32m8_t, 0)
 
+DEF_RVV_WF_OPS (vfloat16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WF_OPS (vfloat16mf2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m1_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m4_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m8_t, TARGET_ZVFH)
+
 DEF_RVV_WF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_WF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
 DEF_RVV_WF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e4f2ba90799..c338e3c9003 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
 ])
 
 (define_mode_iterator VWF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF 
"TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VWF_ZVE64 [
+  VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF
   VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF
 ])
 
@@ -1322,6 +1330,7 @@ (define_mode_attr VWLMUL1 [
   (VNx8HI "VNx4SI") (VNx16HI "VNx4SI") (VNx32HI "VNx4SI") (VNx64HI "VNx4SI")
   (VNx1SI "VNx2DI") (VNx2SI "VNx2DI") (VNx4SI "VNx2DI")
   (VNx8SI "VNx2DI") (VNx16SI "VNx2DI") (VNx32SI "VNx2DI")
+  (VNx1HF "VNx4SF") (VNx2HF "VNx4SF") (VNx4HF "VNx4SF") (VNx8HF "VNx4SF") 
(VNx16HF "VNx4SF") (VNx32HF "VNx4SF") (VNx64HF "VNx4SF")
   (VNx1SF "VNx2DF") (VNx2SF "VNx2DF")
   (VNx4SF "VNx2DF") (VNx8SF "VNx2DF") (VNx16SF "VNx2DF") (VNx32SF "VNx2DF")
 ])
@@ -1333,6 +1342,7 @@ (define_mode_attr VWLMUL1_ZVE64 [
   (VNx8HI "VNx2SI") (VNx16HI "VNx2SI") (VNx32HI "VNx2SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx1DI") (VNx4SI "VNx1DI")
   (VNx8SI "VNx1DI") (VNx16SI "VNx1DI")
+  (VNx1HF "VNx2SF") (VNx2HF "VNx2SF") (VNx4HF "VNx2SF") (VNx8HF "VNx2SF") 
(VNx16HF "VNx2SF") (VNx32HF "VNx2SF")
   (VNx1SF "VNx1DF") (VNx2SF "VNx1DF")
   (VNx4SF "VNx1DF") (VNx8SF "VNx1DF") (VNx16SF "VNx1DF")
 ])
@@ -1393,6 +1403,7 @@ (define_mode_attr vwlmul1 [
   (VNx8HI "vnx4si") (VNx16HI "vnx4si") (VNx32HI "vnx4si") (VNx64HI "vnx4si")
   (VNx1SI "vnx2di") (VNx2SI "vnx2di") (VNx4SI "vnx2di")
   (VNx8SI "vnx2di") (VNx16SI "vnx2di") (VNx32SI "vnx2di")
+  (VNx1HF "vnx4sf") (VNx2HF "vnx4sf") (VNx4HF "vnx4sf") (VNx8HF "vnx4sf") 
(VNx16HF "vnx4sf") (VNx32HF "vnx4sf") (VNx64HF "vnx4sf")
   (VNx1SF "vnx2df") (VNx2SF "vnx2df")
   (VNx4SF "vnx2df") (VNx8SF "vnx2df") (VNx16SF "vnx2df") (VNx32SF "vnx2df")
 ])
@@ -1404,6 +1415,7 @@ (define_mode_attr vwlmul1_zve64 [
   (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2SI")
   (VNx1SI "vnx1di") (VNx2SI "vnx1di") (VNx4SI "vnx1di")
   (VNx8SI "vnx1di") (VNx16SI "vnx1di")
+  (VNx1HF "vnx2sf") (VNx2HF "vnx2sf") (VNx4HF "vnx2sf") (VNx8HF "vnx2sf") 
(VNx16HF "vnx2sf") (VNx32HF "vnx2sf")
   (VNx1SF "vnx1df") (VNx2SF "vnx1df")
   (VNx4SF "vnx1df") (VNx8SF "vnx1df") (VNx16SF "vnx1df")
 ])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c 
b/gcc/testsuite/gcc.target

Re: [patch] Fix PR101188 wrong code from postreload

2023-06-05 Thread Georg-Johann Lay




Am 03.06.23 um 17:53 schrieb Jeff Law:



On 6/2/23 02:46, Georg-Johann Lay wrote:

There is the following bug in postreload that can be traced back
to v5 at least:

In postreload.cc::reload_cse_move2add() there is a loop over all
insns.  If it encounters a SET, the next insn is analyzed if it
is a single_set.

After next has been analyzed, it continues with

   if (success)
 delete_insn (insn);
   changed |= success;
   insn = next; // This effectively skips analysis of next.
   move2add_record_mode (reg);
   reg_offset[regno]
 = trunc_int_for_mode (added_offset + base_offset,
   mode);
   continue; // for continues with insn = NEXT_INSN (insn).

So it records the effect of next, but not the clobbers that
next might have.  This is a problem if next clobbers a GPR,
as can happen on avr.  What can then happen is that, in a

The patch records the effects of potential clobbers.

Bootstrapped and reg-tested on x86_64.  Also tested on avr where
the bug popped up.  The testcase discriminates on avr, and for now
I am not aware of any other target that's affected by the bug.

The change is not intrusive and fixes wrong code, so I'd like
to backport it.

Ok to apply?

Johann

rtl-optimization/101188: Don't bypass clobbers of some insns that are
optimized or are optimization candidates.

gcc/
 PR rtl-optimization/101188
 * postreload.cc (reload_cse_move2add): Record clobbers of next
 insn using move2add_note_store.

gcc/testsuite/
 PR rtl-optimization/101188
 * gcc.c-torture/execute/pr101188.c: New test.
If I understand the code correctly, isn't the core of the problem that 
we "continue" rather than executing the rest of the code in the loop?  In 
particular, the continue bypasses this chunk of code:



 for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
    {
  if (REG_NOTE_KIND (note) == REG_INC
  && REG_P (XEXP (note, 0)))
    {
  /* Reset the information about this register.  */
  int regno = REGNO (XEXP (note, 0));
  if (regno < FIRST_PSEUDO_REGISTER)
    {
  move2add_record_mode (XEXP (note, 0));
  reg_mode[regno] = VOIDmode;
    }
    }
    }

  /* There are no REG_INC notes for SP autoinc.  */
  subrtx_var_iterator::array_type array;
  FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (insn), NONCONST)
    {
  rtx mem = *iter;
  if (mem
  && MEM_P (mem)
  && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
    {
  if (XEXP (XEXP (mem, 0), 0) == stack_pointer_rtx)
    reg_mode[STACK_POINTER_REGNUM] = VOIDmode;
    }
    }

  note_stores (insn, move2add_note_store, insn);


Of particular importance for your case would be the note_stores call. 
But I could well see other targets needing the search for REG_INC notes 
as well as stack pushes.


If I'm right, then wouldn't it be better to factor that blob of code 
above into its own function, then use it before the "continue" rather 
than implementing a custom scan for CLOBBERs?


It also raises the question of whether the other case immediately above the code I 
quoted needs similar adjustment.  It doesn't do the insn = next, but it 
does bypass the search for autoinc memory references and the note_stores 
call.



Jeff


So if I understand you correctly, this means that my patch is declined?

Johann


[PATCH v1] RISC-V: Fix some typo in vector-iterators.md

2023-06-05 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to fix some typo in vector-iterators.md, aka:

[-"vnx1DI")-]{+"vnx1di")+}
[-"vnx2SI")-]{+"vnx2si")+}
[-"vnx1SI")-]{+"vnx1si")+}

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix typo in mode attr.
---
 gcc/config/riscv/vector-iterators.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e4f2ba90799..665a77eaf50 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1367,8 +1367,8 @@ (define_mode_attr vlmul1_zve64 [
   (VNx8HI "vnx4hi") (VNx16HI "vnx4hi") (VNx32HI "vnx4hi")
   (VNx1SI "vnx2si") (VNx2SI "vnx2si") (VNx4SI "vnx2si")
   (VNx8SI "vnx2si") (VNx16SI "vnx2si")
-  (VNx1DI "vnx1DI") (VNx2DI "vnx1DI")
-  (VNx4DI "vnx1DI") (VNx8DI "vnx1DI")
+  (VNx1DI "vnx1di") (VNx2DI "vnx1di")
+  (VNx4DI "vnx1di") (VNx8DI "vnx1di")
   (VNx1SF "vnx2sf") (VNx2SF "vnx2sf")
   (VNx4SF "vnx2sf") (VNx8SF "vnx2sf") (VNx16SF "vnx2sf")
   (VNx1DF "vnx1df") (VNx2DF "vnx1df")
@@ -1401,7 +1401,7 @@ (define_mode_attr vwlmul1_zve64 [
   (VNx1QI "vnx4hi") (VNx2QI "vnx4hi") (VNx4QI "vnx4hi")
   (VNx8QI "vnx4hi") (VNx16QI "vnx4hi") (VNx32QI "vnx4hi") (VNx64QI "vnx4hi")
   (VNx1HI "vnx2si") (VNx2HI "vnx2si") (VNx4HI "vnx2si")
-  (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2SI")
+  (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2si")
   (VNx1SI "vnx1di") (VNx2SI "vnx1di") (VNx4SI "vnx1di")
   (VNx8SI "vnx1di") (VNx16SI "vnx1di")
   (VNx1SF "vnx1df") (VNx2SF "vnx1df")
@@ -1412,7 +1412,7 @@ (define_mode_attr vwlmul1_zve32 [
   (VNx1QI "vnx2hi") (VNx2QI "vnx2hi") (VNx4QI "vnx2hi")
   (VNx8QI "vnx2hi") (VNx16QI "vnx2hi") (VNx32QI "vnx2hi")
   (VNx1HI "vnx1si") (VNx2HI "vnx1si") (VNx4HI "vnx1si")
-  (VNx8HI "vnx1si") (VNx16HI "vnx1SI")
+  (VNx8HI "vnx1si") (VNx16HI "vnx1si")
 ])
 
 (define_mode_attr VDEMOTE [
-- 
2.34.1



Ping: Fwd: [V9][PATCH 1/2] Handle component_ref to a structure/union field including flexible array member [PR101832]

2023-06-05 Thread Qing Zhao via Gcc-patches
Ping on this patch.

The C FE and Doc changes has been approved.
Please help to review and approve the Middle-end change.

Or provide guide on how to move this patch forward.

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao mailto:qing.z...@oracle.com>>
Subject: [V9][PATCH 1/2] Handle component_ref to a structure/union field 
including flexible array member [PR101832]
Date: May 30, 2023 at 2:30:28 PM EDT
To: jos...@codesourcery.com, 
richard.guent...@gmail.com, 
ja...@redhat.com, 
gcc-patches@gcc.gnu.org
Cc: keesc...@chromium.org, 
siddh...@gotplt.org, 
uec...@tugraz.at, Qing Zhao 
mailto:qing.z...@oracle.com>>

Richard or Jakub,

could you please review this patch and see whether it's Okay to commit?

thanks a lot.

Qing

===

As a GCC extension, a struct with a C99 flexible array member may be embedded
into another struct or union (possibly recursively) as the last field.
__builtin_object_size should treat such a struct as having flexible size.

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit 
type_includes_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_includes_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle 
structure/union type
when it has flexible size.
* tree-streamer-in.cc 
(unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc 
(pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.

change TYPE_INCLUDES_FLEXARRAY to TYPE_INCLUDES_FLEXARRAY
---
gcc/c/c-decl.cc   |  11 ++
gcc/lto/lto-common.cc |   5 +-
gcc/print-tree.cc |   5 +
.../gcc.dg/builtin-object-size-pr101832.c | 134 ++
gcc/tree-core.h   |   2 +
gcc/tree-object-size.cc   |  23 
++-
gcc/tree-streamer-in.cc   |   5 
+-
gcc/tree-streamer-out.cc  |   
5 +-
gcc/tree.h|   7 +-
9 files changed, 192 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc 
b/gcc/c/c-decl.cc
index b5b491cf2da..0c718151f6d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9282,6 +9282,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
  /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
  DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);

+  /* Set TYPE_INCLUDES_FLEXARRAY for the context of x, t.
+ when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+ TYPE_INCLUDES_FLEXARRAY (t)
+  = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDES_FLEXARRAY for the context of x, t
+ when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+ TYPE_INCLUDES_FLEXARRAY (t)
+  = is_last_field && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x));
+
  if (DECL_NAME (x)
 || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/lto/lto-common.cc 
b/gcc/lto/lto-common.cc
index 537570204b3..f6b85bbc6f7 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1275,7 +1275,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
  if (AGGREGATE_TYPE_P (t1))
compare_values (TYPE_TYPELESS_STORAGE);
  compare_values (TYPE_EMPTY_P);
-  compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (FUNC_OR_METHOD_TYPE_P (t1))
+ compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (RECORD_OR_UNION_T

Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Max Filippov via Gcc-patches
Hi Suwa-san,

On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
 wrote:
>
> This patch optimizes the boolean evaluation of EQ/NE against zero
> by adding two insn_and_split patterns similar to SImode conditional
> store:
>
> "eq_zero":
> op0 = (op1 == 0) ? 1 : 0;
> op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
>
> "movsicc_ne0_reg_0":
> op0 = (op1 != 0) ? op2 : 0;
> op0 = op2; if (op1 == 0) ? op0 = op1;  /* optimized */
>
> /* example #1 */
> int bool_eqSI(int x) {
>   return x == 0;
> }
> int bool_neSI(int x) {
>   return x != 0;
> }
>
> ;; after (TARGET_NSA)
> bool_eqSI:
> nsau    a2, a2
> srli    a2, a2, 5
> ret.n
> bool_neSI:
> mov.n   a9, a2
> movi.n  a2, 1
> moveqz  a2, a9, a9
> ret.n
>
> These also work in SFmode by ignoring their sign bits; furthermore,
> the branch if EQ/NE against zero in SFmode is also done in the
> same manner.
>
> The reasons for this optimization in SFmode are:
>
>   - Only zero values (negative or non-negative) contain no bits of 1
> with both the exponent and the mantissa.
>   - EQ/NE comparisons involving NaNs produce no signal even if they
> are signaling.
>   - Even if the use of IEEE 754 single-precision floating-point co-
> processor is configured (TARGET_HARD_FLOAT is true):
> 1. Load zero value to FP register
> 2. Possibly, additional FP move if the comparison target is
>an address register
> 3. FP equality check instruction
> 4. Read the boolean register containing the result, or condi-
>tional branch
> As noted above, a considerable number of instructions are still
> generated.
>
> /* example #2 */
> int bool_eqSF(float x) {
>   return x == 0;
> }
> int bool_neSF(float x) {
>   return x != 0;
> }
> int bool_ltSF(float x) {
>   return x < 0;
> }
> extern void foo(void);
> void cb_eqSF(float x) {
>   if(x != 0)
> foo();
> }
> void cb_neSF(float x) {
>   if(x == 0)
> foo();
> }
> void cb_geSF(float x) {
>   if(x < 0)
> foo();
> }
>
> ;; after
> ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> bool_eqSF:
> add.n   a2, a2, a2
> nsau    a2, a2
> srli    a2, a2, 5
> ret.n
> bool_neSF:
> add.n   a9, a2, a2
> movi.n  a2, 1
> moveqz  a2, a9, a9
> ret.n
> bool_ltSF:
> movi.n  a9, 0
> wfr f0, a2
> wfr f1, a9
> olt.s   b0, f0, f1
> movi.n  a9, 0
> movi.n  a2, 1
> movf    a2, a9, b0
> ret.n
> cb_eqSF:
> add.n   a2, a2, a2
> beqz.n  a2, .L6
> j.l foo, a9
> .L6:
> ret.n
> cb_neSF:
> add.n   a2, a2, a2
> bnez.n  a2, .L8
> j.l foo, a9
> .L8:
> ret.n
> cb_geSF:
> addi    sp, sp, -16
> movi.n  a3, 0
> s32i.n  a12, sp, 8
> s32i.n  a0, sp, 12
> mov.n   a12, a2
> call0   __unordsf2
> bnez.n  a2, .L10
> movi.n  a3, 0
> mov.n   a2, a12
> call0   __gesf2
> bnei    a2, -1, .L10
> l32i.n  a0, sp, 12
> l32i.n  a12, sp, 8
> addi    sp, sp, 16
> j.l foo, a9
> .L10:
> l32i.n  a0, sp, 12
> l32i.n  a12, sp, 8
> addi    sp, sp, 16
> ret.n
>
> gcc/ChangeLog:
>
> * config/xtensa/predicates.md (const_float_0_operand):
> Rename from obsolete "const_float_1_operand" and change the
> constant to compare.
> (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> New.
> * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> Add code for EQ/NE comparison with constant zero in SFmode.
> (xtensa_expand_scc): Added code to derive boolean evaluation
> of EQ/NE with constant zero for comparison in SFmode.
> (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> zero inside "cbranchsf4" to 0.
> * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> Change "match_operator" and the third "match_operand" to the
> ones mentioned above.
> (movsicc_ne0_reg_zero, eq_zero): New.
> ---
>  gcc/config/xtensa/predicates.md | 17 +--
>  gcc/config/xtensa/xtensa.cc | 45 
>  gcc/config/xtensa/xtensa.md | 53 +
>  3 files changed, 106 insertions(+), 9 deletions(-)

This version performs much better than v1, but there's still a new
testsuite failure in gcc.c-torture/execute/bitfld-3.c
and the following change in the generated code
from:

   l32i.n  a11, a7, 8
   l8ui    a9, a7, 12
   movi    a10, 0xff
   add.n   a9, a9, a10
   addi.n  a7, a11, -1
   movi.n  

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-05 Thread Alexander Monakov via Gcc-patches
Ping for the front-end maintainers' input.

On Mon, 22 May 2023, Richard Biener wrote:

> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
>  wrote:
> >
> > Implement -ffp-contract=on for C and C++ without changing default
> > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
> 
> The documentation changes mention the defaults are changed for
> standard modes; I suppose you want to remove that hunk.
> 
> > gcc/c-family/ChangeLog:
> >
> > * c-gimplify.cc (fma_supported_p): New helper.
> > (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
> > contraction.
> >
> > gcc/ChangeLog:
> >
> > * common.opt (fp_contract_mode) [on]: Remove fallback.
> > * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
> > * doc/invoke.texi (-ffp-contract): Update.
> > * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
> > ---
> >  gcc/c-family/c-gimplify.cc | 78 ++
> >  gcc/common.opt |  3 +-
> >  gcc/config/sh/sh.md|  2 +-
> >  gcc/doc/invoke.texi|  8 ++--
> >  gcc/trans-mem.cc   |  3 ++
> >  5 files changed, 88 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > index ef5c7d919f..f7635d3b0c 100644
> > --- a/gcc/c-family/c-gimplify.cc
> > +++ b/gcc/c-family/c-gimplify.cc
> > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "c-ubsan.h"
> >  #include "tree-nested.h"
> >  #include "context.h"
> > +#include "tree-pass.h"
> > +#include "internal-fn.h"
> >
> >  /*  The gimplification pass converts the language-dependent trees
> >  (ld-trees) emitted by the parser into language-independent trees
> > @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
> > body)
> >return bind;
> >  }
> >
> > +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
> > +
> > +static bool
> > +fma_supported_p (enum internal_fn fn, tree type)
> > +{
> > +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> > +}
> > +
> >  /* Gimplification of expression trees.  */
> >
> >  /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
> > @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
> > ATTRIBUTE_UNUSED,
> > break;
> >}
> >
> > +case PLUS_EXPR:
> > +case MINUS_EXPR:
> > +  {
> > +   tree type = TREE_TYPE (*expr_p);
> > +   /* For -ffp-contract=on we need to attempt FMA contraction only
> > +  during initial gimplification.  Late contraction across statement
> > +  boundaries would violate language semantics.  */
> > +   if (SCALAR_FLOAT_TYPE_P (type)
> > +   && flag_fp_contract_mode == FP_CONTRACT_ON
> > +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
> > +   && fma_supported_p (IFN_FMA, type))
> > + {
> > +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
> > +
> > +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
> > +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
> > +
> > +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
> > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
> > +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
> > + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
> > +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
> > + {
> > +   std::swap (op0_p, op1_p);
> > +   std::swap (neg_mul, neg_add);
> > + }
> > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
> > + {
> > +   op0_p = &TREE_OPERAND (*op0_p, 0);
> > +   neg_mul = !neg_mul;
> > + }
> > +   if (TREE_CODE (*op0_p) != MULT_EXPR)
> > + break;
> > +   auto_vec ops (3);
> > +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
> > +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
> > +   ops.quick_push (*op1_p);
> > +
> > +   enum internal_fn ifn = IFN_FMA;
> > +   if (neg_mul)
> > + {
> > +   if (fma_supported_p (IFN_FNMA, type))
> > + ifn = IFN_FNMA;
> > +   else
> > + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
> > + }
> > +   if (neg_add)
> > + {
> > +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS;
> > +   if (fma_supported_p (ifn2, type))
> > + ifn = ifn2;
> > +   else
> > + ops[2] = build1 (NEGATE_EXPR, type, ops[2]);
> > + }
> > +   for (auto &&op : ops)
> > + if (gimplify_expr (&op, pre_p, post_p, is_gimple_val, 
> > fb_rvalue)
> > + == GS_ERROR)
> > +   return GS_ERROR;
> > +
> > +   gcall *call = gimple_build_call_inter

[COMMITTED] reginfo: Change return type of predicate functions from int to bool

2023-06-05 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog:

* rtl.h (reg_classes_intersect_p): Change return type from int to bool.
(reg_class_subset_p): Ditto.
* reginfo.cc (reg_classes_intersect_p): Ditto.
(reg_class_subset_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros
diff --git a/gcc/reginfo.cc b/gcc/reginfo.cc
index 24f612bed59..d472a35946a 100644
--- a/gcc/reginfo.cc
+++ b/gcc/reginfo.cc
@@ -1134,9 +1134,9 @@ reg_scan_mark_refs (rtx x, rtx_insn *insn)
 }
 
 
-/* Return nonzero if C1 is a subset of C2, i.e., if every register in C1
+/* Return true if C1 is a subset of C2, i.e., if every register in C1
is also in C2.  */
-int
+bool
 reg_class_subset_p (reg_class_t c1, reg_class_t c2)
 {
   return (c1 == c2
@@ -1145,8 +1145,8 @@ reg_class_subset_p (reg_class_t c1, reg_class_t c2)
   reg_class_contents[(int) c2]));
 }
 
-/* Return nonzero if there is a register that is in both C1 and C2.  */
-int
+/* Return true if there is a register that is in both C1 and C2.  */
+bool
 reg_classes_intersect_p (reg_class_t c1, reg_class_t c2)
 {
   return (c1 == c2
diff --git a/gcc/rtl.h b/gcc/rtl.h
index af9fb882bf2..3f0af780634 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4292,8 +4292,8 @@ extern HARD_REG_SET eliminable_regset;
 extern void mark_elimination (int, int);
 
 /* In reginfo.cc */
-extern int reg_classes_intersect_p (reg_class_t, reg_class_t);
-extern int reg_class_subset_p (reg_class_t, reg_class_t);
+extern bool reg_classes_intersect_p (reg_class_t, reg_class_t);
+extern bool reg_class_subset_p (reg_class_t, reg_class_t);
 extern void globalize_reg (tree, int);
 extern void init_reg_modes_target (void);
 extern void init_regs (void);


[COMMITTED] print-rtl: Change return type of two print functions from int to void

2023-06-05 Thread Uros Bizjak via Gcc-patches
Also change one internal variable to bool.

gcc/ChangeLog:

* rtl.h (print_rtl_single): Change return type from int to void.
(print_rtl_single_with_indent): Ditto.
* print-rtl.h (class rtx_writer): Ditto.  Change m_sawclose to bool.
* print-rtl.cc (rtx_writer::rtx_writer): Update for m_sawclose change.
(rtx_writer::print_rtx_operand_code_0): Ditto.
(rtx_writer::print_rtx_operand_codes_E_and_V): Ditto.
(rtx_writer::print_rtx_operand_code_i): Ditto.
(rtx_writer::print_rtx_operand_code_u): Ditto.
(rtx_writer::print_rtx_operand): Ditto.
(rtx_writer::print_rtx): Ditto.
(rtx_writer::finish_directive): Ditto.
(print_rtl_single): Change return type from int to void
and adjust function body accordingly.
(rtx_writer::print_rtl_single_with_indent): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc
index f67bf868778..c59d9896edb 100644
--- a/gcc/print-rtl.cc
+++ b/gcc/print-rtl.cc
@@ -85,7 +85,7 @@ int flag_dump_unnumbered_links = 0;
 
 rtx_writer::rtx_writer (FILE *outf, int ind, bool simple, bool compact,
rtx_reuse_manager *reuse_manager ATTRIBUTE_UNUSED)
-: m_outfile (outf), m_sawclose (0), m_indent (ind),
+: m_outfile (outf), m_indent (ind), m_sawclose (false),
   m_in_call_function_usage (false), m_simple (simple), m_compact (compact)
 #ifndef GENERATOR_FILE
   , m_rtx_reuse_manager (reuse_manager)
@@ -237,13 +237,13 @@ rtx_writer::print_rtx_operand_code_0 (const_rtx in_rtx 
ATTRIBUTE_UNUSED,
fprintf (m_outfile, " #");
  else
fprintf (m_outfile, " %d", NOTE_EH_HANDLER (in_rtx));
- m_sawclose = 1;
+ m_sawclose = true;
  break;
 
case NOTE_INSN_BLOCK_BEG:
case NOTE_INSN_BLOCK_END:
  dump_addr (m_outfile, " ", NOTE_BLOCK (in_rtx));
- m_sawclose = 1;
+ m_sawclose = true;
  break;
 
case NOTE_INSN_BASIC_BLOCK:
@@ -370,7 +370,7 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
 {
   fprintf (m_outfile, "\n%s%*s",
   print_rtx_head, m_indent * 2, "");
-  m_sawclose = 0;
+  m_sawclose = false;
 }
   if (GET_CODE (in_rtx) == CONST_VECTOR
   && !GET_MODE_NUNITS (GET_MODE (in_rtx)).is_constant ()
@@ -381,7 +381,7 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
 {
   m_indent += 2;
   if (XVECLEN (in_rtx, idx))
-   m_sawclose = 1;
+   m_sawclose = true;
 
   int barrier = XVECLEN (in_rtx, idx);
   if (GET_CODE (in_rtx) == CONST_VECTOR
@@ -431,7 +431,7 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
 fprintf (m_outfile, "\n%s%*s", print_rtx_head, m_indent * 2, "");
 
   fputs ("]", m_outfile);
-  m_sawclose = 1;
+  m_sawclose = true;
   m_indent -= 2;
 }
 
@@ -510,7 +510,7 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int 
idx)
   /* Don't print INSN_CODEs in compact mode.  */
   if (m_compact && is_insn && &INSN_CODE (in_rtx) == &XINT (in_rtx, idx))
{
- m_sawclose = 0;
+ m_sawclose = false;
  return;
}
 
@@ -524,7 +524,7 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int 
idx)
  && XINT (in_rtx, idx) >= 0
  && (name = get_insn_name (XINT (in_rtx, idx))) != NULL)
fprintf (m_outfile, " {%s}", name);
-  m_sawclose = 0;
+  m_sawclose = false;
 }
 }
 
@@ -619,7 +619,7 @@ rtx_writer::print_rtx_operand_code_u (const_rtx in_rtx, int 
idx)
fprintf (m_outfile, " [# deleted]");
  else
fprintf (m_outfile, " [%d deleted]", INSN_UID (sub));
- m_sawclose = 0;
+ m_sawclose = false;
  return;
}
 
@@ -640,7 +640,7 @@ rtx_writer::print_rtx_operand_code_u (const_rtx in_rtx, int 
idx)
 }
   else
 fputs (" 0", m_outfile);
-  m_sawclose = 0;
+  m_sawclose = false;
 }
 
 /* Subroutine of print_rtx.   Print operand IDX of IN_RTX.  */
@@ -667,7 +667,7 @@ rtx_writer::print_rtx_operand (const_rtx in_rtx, int idx)
fputs (" (nil)", m_outfile);
   else
fprintf (m_outfile, " (\"%s\")", str);
-  m_sawclose = 1;
+  m_sawclose = true;
   break;
 
 case '0':
@@ -709,7 +709,7 @@ rtx_writer::print_rtx_operand (const_rtx in_rtx, int idx)
 
 case 'n':
   fprintf (m_outfile, " %s", GET_NOTE_INSN_NAME (XINT (in_rtx, idx)));
-  m_sawclose = 0;
+  m_sawclose = false;
   break;
 
 case 'u':
@@ -729,7 +729,7 @@ rtx_writer::print_rtx_operand (const_rtx in_rtx, int idx)
 
 case '*':
   fputs (" Unknown", m_outfile);
-  m_sawclose = 0;
+  m_sawclose = false;
   break;
 
 case 'B':
@@ -797,20 +797,20 @@ rtx_writer::print_rtx (const_rtx in_rtx)
fputc (' ', m_outfile);
   else
fprintf (m_outfile, "\n%s%*s", print_rtx_head, m_indent * 2, "");
-  m_sawclose = 0;

Re: [patch] Fix PR101188 wrong code from postreload

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 09:06, Georg-Johann Lay wrote:



Am 03.06.23 um 17:53 schrieb Jeff Law:



On 6/2/23 02:46, Georg-Johann Lay wrote:

There is the following bug in postreload that can be traced back
to v5 at least:

In postreload.cc::reload_cse_move2add() there is a loop over all
insns.  If it encounters a SET, the next insn is analyzed to see
whether it is a single_set.

After next has been analyzed, it continues with

   if (success)
 delete_insn (insn);
   changed |= success;
   insn = next; // This effectively skips analysis of next.
   move2add_record_mode (reg);
   reg_offset[regno]
 = trunc_int_for_mode (added_offset + base_offset,
   mode);
   continue; // for continues with insn = NEXT_INSN (insn).

So it records the effect of next, but not the clobbers that
next might have.  This is a problem if next clobbers a GPR, as can
happen on avr.  In a later round, the pass may then use a value
from a (partially) clobbered reg.

The patch records the effects of potential clobbers.
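
As a toy model (deliberately not GCC code, and a loose analogy only) of why the skipped bookkeeping matters: move2add keeps per-register "known value" facts while scanning, and skipping an insn's clobbers leaves a stale fact behind.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Insn: a toy instruction that may set one register to a known constant
// and may additionally clobber other registers.
struct Insn {
  std::string set_reg;                // register given a known constant ("" = none)
  int value = 0;
  std::vector<std::string> clobbers;  // registers whose value becomes unknown
};

// scan(): walk the insns keeping "register -> known constant" facts, the
// way move2add does.  With record_clobbers == false it mimics the bug:
// an insn's clobbers are skipped, so a stale fact survives the scan and
// a later transformation would trust a value the insn destroyed.
std::map<std::string, int> scan(const std::vector<Insn> &insns,
                                bool record_clobbers) {
  std::map<std::string, int> known;
  for (const Insn &i : insns) {
    if (!i.set_reg.empty())
      known[i.set_reg] = i.value;
    if (record_clobbers)
      for (const std::string &r : i.clobbers)
        known.erase(r);
  }
  return known;
}
```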

Bootstrapped and reg-tested on x86_64.  Also tested on avr where
the bug popped up.  The testcase discriminates on avr, and for now
I am not aware of any other target that's affected by the bug.

The change is not intrusive and fixes wrong code, so I'd like
to backport it.

Ok to apply?

Johann

rtl-optimization/101188: Don't bypass clobbers of some insns that are
optimized or are optimization candidates.

gcc/
 PR rtl-optimization/101188
 * postreload.cc (reload_cse_move2add): Record clobbers of next
 insn using move2add_note_store.

gcc/testsuite/
 PR rtl-optimization/101188
 * gcc.c-torture/execute/pr101188.c: New test.
If I understand the code correctly, isn't the core of the problem that 
we "continue" rather than executing the rest of the code in the loop? 
In particular the continue bypasses this chunk of code:



 for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
    {
  if (REG_NOTE_KIND (note) == REG_INC
  && REG_P (XEXP (note, 0)))
    {
  /* Reset the information about this register.  */
  int regno = REGNO (XEXP (note, 0));
  if (regno < FIRST_PSEUDO_REGISTER)
    {
  move2add_record_mode (XEXP (note, 0));
  reg_mode[regno] = VOIDmode;
    }
    }
    }

  /* There are no REG_INC notes for SP autoinc.  */
  subrtx_var_iterator::array_type array;
  FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (insn), NONCONST)
    {
  rtx mem = *iter;
  if (mem
  && MEM_P (mem)
  && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == 
RTX_AUTOINC)

    {
  if (XEXP (XEXP (mem, 0), 0) == stack_pointer_rtx)
    reg_mode[STACK_POINTER_REGNUM] = VOIDmode;
    }
    }

  note_stores (insn, move2add_note_store, insn);


Of particular importance for your case would be the note_stores call. 
But I could well see other targets needing the search for REG_INC 
notes as well as stack pushes.


If I'm right, then wouldn't it be better to factor that blob of code 
above into its own function, then use it before the "continue" rather 
than implementing a custom scan for CLOBBERS?


It also begs the question if the other case immediately above the code 
I quoted needs similar adjustment.  It doesn't do the insn = next, but 
it does bypass the search for autoinc memory references and the 
note_stores call.



Jeff


So if I understand you correctly, this means that my patch is declined?
I wouldn't go that far.  I need to review your questions/comments in 
detail and decide if we want to fix the problem more generally or go 
with a more targeted fix.


jeff


[committed] d: Warn when declared size of a special enum does not match its intrinsic type.

2023-06-05 Thread Iain Buclaw via Gcc-patches
Hi,

All special enums have declarations in the D runtime library, but the
compiler will recognize and treat them specially if declared in any
module.  When the underlying base type of a special enum is a different
size to its matched intrinsic, then this can cause undefined behavior at
runtime.  Detect and warn when such a mismatch occurs.

This was found when merging the D front-end with the v2.103.1 release,
splitting this out of the merge patch into its own standalone change.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline and backported to the releases/gcc-13 branch.

Regards,
Iain.

---
gcc/d/ChangeLog:

* gdc.texi (Warnings): Document -Wextra and -Wmismatched-special-enum.
* implement-d.texi (Special Enums): Add reference to warning option
-Wmismatched-special-enum.
* lang.opt: Add -Wextra and -Wmismatched-special-enum.
* types.cc (TypeVisitor::visit (TypeEnum *)): Warn when declared
special enum size mismatches its intrinsic type.

gcc/testsuite/ChangeLog:

* gdc.dg/Wmismatched_enum.d: New test.
---
 gcc/d/gdc.texi  | 17 +
 gcc/d/implement-d.texi  |  5 +
 gcc/d/lang.opt  |  8 
 gcc/d/types.cc  | 15 +++
 gcc/testsuite/gdc.dg/Wmismatched_enum.d |  4 
 5 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/Wmismatched_enum.d

diff --git a/gcc/d/gdc.texi b/gcc/d/gdc.texi
index 24b6ee00478..6f81967a83d 100644
--- a/gcc/d/gdc.texi
+++ b/gcc/d/gdc.texi
@@ -699,6 +699,23 @@ Do not warn about usage of deprecated features and symbols 
with
 @item -Werror
 Turns all warnings into errors.
 
+@opindex Wextra
+@opindex Wno-extra
+@item -Wextra
+This enables some extra warning flags that are not enabled by
+@option{-Wall}.
+
+@gccoptlist{-Waddress
+-Wcast-result
+-Wmismatched-special-enum
+-Wunknown-pragmas}
+
+@opindex Wmismatched-special-enum
+@opindex Wno-mismatched-special-enum
+@item -Wmismatched-special-enum
+Warn when an enum the compiler recognizes as special is declared with a
+different size to the built-in type it is representing.
+
 @opindex Wspeculative
 @opindex Wno-speculative
 @item -Wspeculative
diff --git a/gcc/d/implement-d.texi b/gcc/d/implement-d.texi
index 039e5fbd24e..6f33bc192fe 100644
--- a/gcc/d/implement-d.texi
+++ b/gcc/d/implement-d.texi
@@ -2085,6 +2085,11 @@ for convenience: @code{c_complex_double}, 
@code{c_complex_float},
 @code{c_complex_real}, @code{cpp_long}, @code{cpp_longlong},
 @code{c_long_double}, @code{cpp_ulong}, @code{cpp_ulonglong}.
 
+It may cause undefined behavior at runtime if a special enum is declared with a
+base type that has a different size to the target C/C++ type it is
+representing.  The GNU D compiler will catch such declarations and emit a
+warning when the @option{-Wmismatched-special-enum} option is seen on the
+command-line.
 
 @c 
 
diff --git a/gcc/d/lang.opt b/gcc/d/lang.opt
index bb0a3dcc911..26ca92c4c17 100644
--- a/gcc/d/lang.opt
+++ b/gcc/d/lang.opt
@@ -134,6 +134,14 @@ Werror
 D
 ; Documented in common.opt
 
+Wextra
+D Warning
+; Documented in common.opt
+
+Wmismatched-special-enum
+D Warning Var(warn_mismatched_special_enum) LangEnabledBy(D, Wextra)
+Warn when a special enum is declared with the wrong base type.
+
 Wpsabi
 D
 ; Documented in C
diff --git a/gcc/d/types.cc b/gcc/d/types.cc
index beaf2a61af9..a4c05bfb75f 100644
--- a/gcc/d/types.cc
+++ b/gcc/d/types.cc
@@ -1067,6 +1067,21 @@ public:
gcc_assert (underlying != NULL);
 
t->ctype = build_variant_type_copy (build_ctype (underlying));
+
+   /* When the size of the declared enum base type doesn't match the target
+  C type that this enum is being used as a placeholder for, we can't
+  use the generated underlying type as it'll conflict with all sizes
+  the front-end has computed during semantic.  */
+   if (TYPE_SIZE (t->ctype) != TYPE_SIZE (basetype))
+ {
+   warning_at (make_location_t (t->sym->loc),
+   OPT_Wmismatched_special_enum,
+   "size of %qs (%wd) differ from its declared size (%wd)",
+   t->sym->ident->toChars (), int_size_in_bytes (t->ctype),
+   int_size_in_bytes (basetype));
+   t->ctype = basetype;
+ }
+
build_type_decl (t->ctype, t->sym);
   }
 else if (t->sym->ident == NULL
diff --git a/gcc/testsuite/gdc.dg/Wmismatched_enum.d 
b/gcc/testsuite/gdc.dg/Wmismatched_enum.d
new file mode 100644
index 000..54f47988c2b
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/Wmismatched_enum.d
@@ -0,0 +1,4 @@
+// { dg-do compile }
+// { dg-options "-Wmismatched-special-enum" }
+
+enum __c_longlong : byte; // { dg-warning "differ from its declared size" }
-- 
2.39.2



Re: [RFA] Improve strcmp expansion when one input is a constant string.

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 00:29, Richard Biener wrote:



But then for example x86 has smaller encoding for byte ops and while
widening is easily done later, truncation is not.
Sadly, the x86 costing looks totally bogus here.  We actually emit the 
exact same code for a QImode load vs a zero-extending load from QI to 
SI.  But the costing is different and would tend to prefer QImode.  That 
in turn is going to force an extension at the end of the sequence which 
would be a regression relative to the current code.  Additionally we may 
get partial register stalls for the byte ops to implement the comparison 
steps.


The net result is that querying the backend's costs would do the exact 
opposite of what I think we want on x86.  One could argue the x86 
maintainers should improve this situation...




Note I would have expected to use the mode of the load so we truly
elide some extensions, using word_mode looks like just another
mode here?  The key to note is probably

   op0 = convert_modes (mode, unit_mode, op0, 1);
   op1 = convert_modes (mode, unit_mode, op1, 1);
   rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
   result, 1, OPTAB_WIDEN);

which uses OPTAB_WIDEN - wouldn't it be better to pass in the
unconverted modes and leave the decision which mode to use
to OPTAB_WIDEN?  Should we somehow query the target for
the smallest mode from unit_mode it can do both the MINUS
and the compare?
And avoiding OPTAB_WIDEN isn't going to help rv64 at all.  The core 
issue being that we do define 32bit ops.  With Jivan's patch those 32bit 
ops expose the sign extending nature.  So a 32bit add would look 
something like


(set (temp:DI) (sign_extend:DI (plus:SI (op:SI) (op:SI))))
(set (res:SI) (subreg:SI (temp:DI) 0))

Where we mark the subreg with SUBREG_PROMOTED_VAR_P.


I'm not sure the best way to proceed now.  I could just put this on the 
back-burner as it's RISC-V specific and the gains elsewhere dwarf this 
issue.
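
To restate the expansion strategy under discussion in portable terms — a hand sketch, not the expander's actual output — comparing against a constant string proceeds byte by byte, widening each byte before the subtract so the final result needs no extra extension:

```cpp
#include <cassert>

// strcmp_const_foo(): manually "expanded" comparison of s against the
// constant "foo".  Each byte is loaded, zero-extended to int (the wider
// mode), and subtracted; the first nonzero difference, or the difference
// at the constant's NUL, is the result.
int strcmp_const_foo(const char *s) {
  static const char k[] = "foo";
  for (int i = 0;; i++) {
    int diff = (unsigned char)s[i] - (unsigned char)k[i];
    if (diff != 0 || k[i] == '\0')
      return diff;
  }
}
```

The mode question in the thread is exactly where that widening should happen: per byte load, per subtract, or deferred to OPTAB_WIDEN.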



jeff


Re: [PATCH v1] RISC-V: Fix some typo in vector-iterators.md

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 09:07, Pan Li via Gcc-patches wrote:

From: Pan Li 

This patch would like to fix some typos in vector-iterators.md, aka:

[-"vnx1DI")-]{+"vnx1di")+}
[-"vnx2SI")-]{+"vnx2si")+}
[-"vnx1SI")-]{+"vnx1si")+}

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix typo in mode attr.

OK
jeff


Re: PING Re: [PATCH RFA (tree-eh)] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-06-05 Thread Jason Merrill via Gcc-patches

On 6/5/23 02:09, Richard Biener wrote:

On Fri, Jun 2, 2023 at 6:57 PM Jason Merrill via Gcc-patches
 wrote:


Since Jonathan approved the library change, I'm looking for middle-end
approval for the tree-eh change, even without advice on the potential
follow-up.

On 5/24/23 14:55, Jason Merrill wrote:

Middle-end folks: any thoughts about how best to make the change described in
the last paragraph below?

Library folks: any thoughts on the changes to __cxa_call_terminate?

-- 8< --

[except.handle]/7 says that when we enter std::terminate due to a throw,
that is considered an active handler.  We already implemented that properly
for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch
before std::terminate) and the case of finding a callsite with no landing
pad (the personality function calls __cxa_call_terminate which calls
__cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept
function, we were emitting a cleanup that calls std::terminate directly
without ever calling __cxa_begin_catch to handle the exception.
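
A minimal illustration of the scenario — a hypothetical demo, not the patch's testcase — is a throw inside a try/catch in a noexcept function where the thrown type matches no handler, so the noexcept barrier must reach std::terminate:

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// f(): the case described above — a try/catch inside a noexcept
// function, but the thrown int matches no handler, so the exception
// hits the noexcept barrier.
void f() noexcept {
  try {
    throw 1;                 // int: no matching handler below
  } catch (const char *) {   // wrong type; std::terminate is reached
  }
}

// run_demo(): installs a terminate handler so reaching std::terminate
// is observable, then triggers it.  The handler exits 0, since a
// terminate handler must not return.
[[noreturn]] void run_demo() {
  std::set_terminate([] {
    std::puts("terminate reached");
    std::_Exit(0);
  });
  f();
  std::_Exit(1);             // not reached
}
```

The patch's concern is what happens on the way to that handler: per [except.handle]/7 the exception should already count as handled (i.e. __cxa_begin_catch should have run) by the time std::terminate is entered.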

A straightforward way to fix this seems to be calling __cxa_call_terminate
instead.  However, that requires exporting it from libstdc++, which we have
not previously done.  Despite the name, it isn't actually part of the ABI
standard.  Nor is __cxa_call_unexpected, as far as I can tell, but that one
is also used by clang.  For this case they use __clang_call_terminate; it
seems reasonable to me for us to stick with __cxa_call_terminate.

I also change __cxa_call_terminate to take void* for simplicity in the front
end (and consistency with __cxa_call_unexpected) but that isn't necessary if
it's undesirable for some reason.

This patch does not fix the issue that representing the noexcept as a
cleanup is wrong, and confuses the handler search; since it looks like a
cleanup in the EH tables, the unwinder keeps looking until it finds the
catch in main(), which it should never have gotten to.  Without the
try/catch in main, the unwinder would reach the end of the stack and say no
handler was found.  The noexcept is a handler, and should be treated as one,
as it is when the landing pad is omitted.

The best fix for that issue seems to me to be to represent an
ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an
ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification).  The
actual code generation shouldn't need to change (apart from the change made
by this patch), only the action table entry.

   PR c++/97720

gcc/cp/ChangeLog:

   * cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN.
   (call_terminate_fn): New macro.
   * cp-gimplify.cc (gimplify_must_not_throw_expr): Use it.
   * except.cc (init_exception_processing): Set it.
   (cp_protect_cleanup_actions): Return it.

gcc/ChangeLog:

   * tree-eh.cc (lower_resx): Pass the exception pointer to the
   failure_decl.
   * except.h: Tweak comment.

libstdc++-v3/ChangeLog:

   * libsupc++/eh_call.cc (__cxa_call_terminate): Take void*.
   * config/abi/pre/gnu.ver: Add it.

gcc/testsuite/ChangeLog:

   * g++.dg/eh/terminate2.C: New test.
---
   gcc/cp/cp-tree.h |  2 ++
   gcc/except.h |  2 +-
   gcc/cp/cp-gimplify.cc|  2 +-
   gcc/cp/except.cc |  5 -
   gcc/testsuite/g++.dg/eh/terminate2.C | 30 
   gcc/tree-eh.cc   | 16 ++-
   libstdc++-v3/libsupc++/eh_call.cc|  4 +++-
   libstdc++-v3/config/abi/pre/gnu.ver  |  7 +++
   8 files changed, 63 insertions(+), 5 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/eh/terminate2.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index a1b882f11fe..a8465a988b5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -217,6 +217,7 @@ enum cp_tree_index
  definitions.  */
   CPTI_ALIGN_TYPE,
   CPTI_TERMINATE_FN,
+CPTI_CALL_TERMINATE_FN,
   CPTI_CALL_UNEXPECTED_FN,

   /* These are lazily inited.  */
@@ -358,6 +359,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   /* Exception handling function declarations.  */
   #define terminate_fn
cp_global_trees[CPTI_TERMINATE_FN]
   #define call_unexpected_fn  cp_global_trees[CPTI_CALL_UNEXPECTED_FN]
+#define call_terminate_fncp_global_trees[CPTI_CALL_TERMINATE_FN]
   #define get_exception_ptr_fn
cp_global_trees[CPTI_GET_EXCEPTION_PTR_FN]
   #define begin_catch_fn  
cp_global_trees[CPTI_BEGIN_CATCH_FN]
   #define end_catch_fn
cp_global_trees[CPTI_END_CATCH_FN]
diff --git a/gcc/except.h b/gcc/except.h
index 5ecdbc0d1dc..378a9e4cb77 100644
--- a/gcc/except.h
+++ b/gcc/except.h
@@ -155,7 +155,7 @@ struct GTY(()) eh_region_d
   struct eh_region_u_must_not_throw {
 /* A function decl to be invoked if this region is actually reachable
from within

Re: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-05 Thread Richard Sandiford via Gcc-patches
Looks good!  Just some minor comments:

Tamar Christina  writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 6a435eb44610960513e9739ac9ac1e8a27182c10..1437ab55b260ab5c876e92d59ba39d24bffc6276
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -27,6 +27,7 @@ See the next chapter for information on the C header file.
>  from such an insn.
>  * Output Statement::For more generality, write C code to output
>  the assembler code.
> +* Compact Syntax::  Compact syntax for writing Machine descriptors.

s/Machine/machine/

>  * Predicates::  Controlling what kinds of operands can be used
>  for an insn.
>  * Constraints:: Fine-tuning operand selection.
> @@ -713,6 +714,213 @@ you can use @samp{*} inside of a @samp{@@} 
> multi-alternative template:
>  @end group
>  @end smallexample
>  
> +@node Compact Syntax
> +@section Compact Syntax
> +@cindex compact syntax
> +
> +In cases where the number of alternatives in a @code{define_insn} or
> +@code{define_insn_and_split} are large then it may be beneficial to use the
> +compact syntax when specifying alternatives.
> +
> +This syntax puts the constraints and attributes on the same horizontal line 
> as
> +the instruction assembly template.
> +
> +As an example
> +
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r")
> + (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv"))]
> +  ""
> +  "@@
> +   mov\\t%w0, %w1
> +   mov\\t%w0, %w1
> +   mov\\t%w0, %w1
> +   mov\\t%w0, %1
> +   #
> +   * return aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]);"
> +  "&& true"
> +   [(const_int 0)]
> +  @{
> + aarch64_expand_mov_immediate (operands[0], operands[1]);
> + DONE;
> +  @}
> +  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm")
> +   (set_attr "arch"   "*,*,*,*,*,sve")
> +   (set_attr "length" "4,4,4,4,*,  4")
> +]
> +)
> +@end group
> +@end smallexample
> +
> +can be better expressed as:
> +
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> + (match_operand:SI 1 "aarch64_mov_operand"))]
> +  ""
> +  @{@@ [cons: =0, 1; attrs: type, arch, length]
> + [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
> + [k , r  ; mov_reg  , *   , 4] ^
> + [r , k  ; mov_reg  , *   , 4] ^
> + [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
> + [r , n  ; mov_imm  , *   , *] #
> + [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate 
> ("cnt", "%x0", operands[1]);
> +  @}
> +  "&& true"
> +  [(const_int 0)]
> +  @{
> +aarch64_expand_mov_immediate (operands[0], operands[1]);
> +DONE;
> +  @}
> +)
> +@end group
> +@end smallexample
> +
> +The syntax rules are as follows:
> +@itemize @bullet
> +@item
> +Template must start with "@{@@" to use the new syntax.

s/Template/Templates/ or s/Template/The template/

@{@@ should be quoted using @samp{...} rather than "...".  Same for later
instances.

> +
> +@item
> +"@{@@" is followed by a layout in parentheses which is @samp{"cons:"} 
> followed by
> +a list of @code{match_operand}/@code{match_scratch} comma operand numbers, 
> then a
> +semicolon, followed by the same for attributes (@samp{"attrs:"}).  Operand

No "..." needed for cons: and attrs: (@samp is enough)

> +modifiers can be placed in this section group as well.  Both sections
> +are optional (so you can use only @samp{cons}, or only @samp{attrs}, or 
> both),
> +and @samp{cons} must come before @samp{attrs} if present.
> +
> +@item
> +Each alternative begins with any amount of whitespace.
> +
> +@item
> +Following the whitespace is a comma-separated list of @samp{constraints} 
> and/or
> +@samp{attributes} within brackets @code{[]}, with sections separated by a

I think "constraints" and "attributes" should be unquoted here, rather
than @samp.

> +semicolon.
> +
> +@item
> +Should you want to copy the previous asm line, the symbol @code{^} can be 
> used.
> +This allows less copy pasting between alternatives and reduces the number of
> +lines to update on changes.
> +
> +@item
> +When using C functions for output, the idiom @code{* return <function>;} can
> be

I think this should be @samp rather than @code, since it's quoting
a sample rather than a single entity (but I don't know texinfo well,
so could be wrong).

s/<function>/@var{function}/

> +replaced with the shorthand @code{<< <function>;}.
> +
> +@item
> +Following the closing ']' is any amount of whitespace, and then the actual 
> asm

@samp or @code here too

> +output.
> +
> +@item
> +Spaces are allowed in the list (they will simply be removed).
> +
> +@item
> +All alternatives should be specified: a blank list should be "[,,]", "[,,;,]"
> +etc., not "[]" or "".

@samp for these too.  I don't think @samp{} prints anything, so ""
probably needs to be described in words.

Maybe s/All alternatives/All constraint alternative

Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Takayuki 'January June' Suwa via Gcc-patches
On 2023/06/06 0:15, Max Filippov wrote:
> Hi Suwa-san,
Hi!  Thanks for running the regression tests every time.

> 
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
>  wrote:
>>
>> This patch optimizes the boolean evaluation of EQ/NE against zero
>> by adding two insn_and_split patterns similar to SImode conditional
>> store:
>>
>> "eq_zero":
>> op0 = (op1 == 0) ? 1 : 0;
>> op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
>>
>> "movsicc_ne0_reg_0":
>> op0 = (op1 != 0) ? op2 : 0;
>> op0 = op2; if (op1 == 0) ? op0 = op1;  /* optimized */
>>
>> /* example #1 */
>> int bool_eqSI(int x) {
>>   return x == 0;
>> }
>> int bool_neSI(int x) {
>>   return x != 0;
>> }
>>
>> ;; after (TARGET_NSA)
>> bool_eqSI:
>> nsaua2, a2
>> srlia2, a2, 5
>> ret.n
>> bool_neSI:
>> mov.n   a9, a2
>> movi.n  a2, 1
>> moveqz  a2, a9, a9
>> ret.n
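
The NSA trick above can be restated in portable C++ — a sketch, where NSAU is modeled with __builtin_clz (a GCC/Clang built-in), since NSAU, unlike __builtin_clz, is defined to return 32 for an input of 0:

```cpp
#include <cassert>
#include <cstdint>

// clz32(): portable stand-in for Xtensa NSAU; returns 32 for 0, a case
// __builtin_clz leaves undefined.
unsigned clz32(uint32_t x) {
  return x ? (unsigned)__builtin_clz(x) : 32u;
}

// bool_eq_zero(): the "eq_zero" pattern in C++.  Only clz(0) == 32 has
// bit 5 set; every nonzero input gives a count <= 31, so shifting right
// by 5 yields exactly the boolean (x == 0).
uint32_t bool_eq_zero(uint32_t x) {
  return clz32(x) >> 5;
}
```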
>>
>> These also work in SFmode by ignoring their sign bits, and
>> furthermore, the branch if EQ/NE against zero in SFmode is also done
>> in the same manner.
>>
>> The reasons for this optimization in SFmode are:
>>
>>   - Only zero values (negative or non-negative) contain no bits of 1
>> with both the exponent and the mantissa.
>>   - EQ/NE comparisons involving NaNs produce no signal even if they
>> are signaling.
>>   - Even if the use of the IEEE 754 single-precision floating-point
>> coprocessor is configured (TARGET_HARD_FLOAT is true):
>> 1. Load zero value to FP register
>> 2. Possibly, additional FP move if the comparison target is
>>an address register
>> 3. FP equality check instruction
>> 4. Read the boolean register containing the result, or condi-
>>tional branch
>> As noted above, a considerable number of instructions are still
>> generated.
>>
>> /* example #2 */
>> int bool_eqSF(float x) {
>>   return x == 0;
>> }
>> int bool_neSF(float x) {
>>   return x != 0;
>> }
>> int bool_ltSF(float x) {
>>   return x < 0;
>> }
>> extern void foo(void);
>> void cb_eqSF(float x) {
>>   if(x != 0)
>> foo();
>> }
>> void cb_neSF(float x) {
>>   if(x == 0)
>> foo();
>> }
>> void cb_geSF(float x) {
>>   if(x < 0)
>> foo();
>> }
>>
>> ;; after
>> ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
>> bool_eqSF:
>> add.n   a2, a2, a2
>> nsaua2, a2
>> srlia2, a2, 5
>> ret.n
>> bool_neSF:
>> add.n   a9, a2, a2
>> movi.n  a2, 1
>> moveqz  a2, a9, a9
>> ret.n
>> bool_ltSF:
>> movi.n  a9, 0
>> wfr f0, a2
>> wfr f1, a9
>> olt.s   b0, f0, f1
>> movi.n  a9, 0
>> movi.n  a2, 1
>> movfa2, a9, b0
>> ret.n
>> cb_eqSF:
>> add.n   a2, a2, a2
>> beqz.n  a2, .L6
>> j.l foo, a9
>> .L6:
>> ret.n
>> cb_neSF:
>> add.n   a2, a2, a2
>> bnez.n  a2, .L8
>> j.l foo, a9
>> .L8:
>> ret.n
>> cb_geSF:
>> addi    sp, sp, -16
>> movi.n  a3, 0
>> s32i.n  a12, sp, 8
>> s32i.n  a0, sp, 12
>> mov.n   a12, a2
>> call0   __unordsf2
>> bnez.n  a2, .L10
>> movi.n  a3, 0
>> mov.n   a2, a12
>> call0   __gesf2
>> bnei    a2, -1, .L10
>> l32i.n  a0, sp, 12
>> l32i.n  a12, sp, 8
>> addi    sp, sp, 16
>> j.l foo, a9
>> .L10:
>> l32i.n  a0, sp, 12
>> l32i.n  a12, sp, 8
>> addi    sp, sp, 16
>> ret.n
>>
>> gcc/ChangeLog:
>>
>> * config/xtensa/predicates.md (const_float_0_operand):
>> Rename from obsolete "const_float_1_operand" and change the
>> constant to compare.
>> (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
>> New.
>> * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
>> Add code for EQ/NE comparison with constant zero in SFmode.
>> (xtensa_expand_scc): Add code to derive boolean evaluation
>> of EQ/NE with constant zero for comparison in SFmode.
>> (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
>> zero inside "cbranchsf4" to 0.
>> * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
>> Change "match_operator" and the third "match_operand" to the
>> ones mentioned above.
>> (movsicc_ne0_reg_zero, eq_zero): New.
>> ---
>>  gcc/config/xtensa/predicates.md | 17 +--
>>  gcc/config/xtensa/xtensa.cc | 45 
>>  gcc/config/xtensa/xtensa.md | 53 +
>>  3 files changed, 106 insertions(+), 9 deletions(-)
> 
> This version performs much better than v1, but there's still new
> testsuit

[PATCH] c++: extend lookup_template_class shortcut [PR110122]

2023-06-05 Thread Patrick Palka via Gcc-patches
Here when substituting the injected class name A during regeneration of
the lambda, we find ourselves in lookup_template_class for A with
V=_ZTAXtl3BarEE (i.e. the template parameter object for Foo{}).  The
call to coerce_template_parms within it then undesirably tries to make a copy
of this class NTTP argument, which fails because its type is not copyable.

Sidestepping the question of when precisely a class NTTP should be
copied (which is the subject of PR104577), it seems clear that this
testcase shouldn't require copyability of Foo.

lookup_template_class has a shortcut for looking up the current class
scope, which would avoid the problematic coerce_template_parms call, but
the shortcut doesn't trigger because it only considers the innermost
class scope, which in this case is the lambda type.  So this patch fixes
this by extending the lookup_template_class shortcut to consider outer
class scopes too (and skipping over lambda types since they are never
instantiated from lookup_template_class IIUC).  We also need to avoid
calling coerce_template_parms when looking up a templated non-template
class, for the sake of the A::B example.  The call should be unnecessary
because the innermost arguments belong to the context and so should have
already been coerced.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  To fix the 13 regression in that PR, it should suffice to do a
minimal backport of this change that just replaces current_class_type
with current_nonlambda_class_type in the shortcut.

PR c++/110122
PR c++/104577

gcc/cp/ChangeLog:

* pt.cc (lookup_template_class): Extend shortcut for looking
up the current class scope to consider outer class scopes too,
and use current_nonlambda_class_type instead of current_class_type.
Only call coerce_template_parms when looking up a true class
template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class57.C: New test.
---
 gcc/cp/pt.cc | 19 +++
 gcc/testsuite/g++.dg/cpp2a/nontype-class57.C | 25 
 2 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class57.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 244b0b03454..29ad9ba4072 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9928,16 +9928,27 @@ lookup_template_class (tree d1, tree arglist, tree 
in_decl, tree context,
 template.  */
 
   /* Shortcut looking up the current class scope again.  */
-  if (current_class_type)
-   if (tree ti = CLASSTYPE_TEMPLATE_INFO (current_class_type))
+  for (tree scope = current_nonlambda_class_type ();
+  scope != NULL_TREE;
+  scope = TYPE_P (scope) ? TYPE_CONTEXT (scope) : DECL_CONTEXT (scope))
+   {
+ if (!CLASS_TYPE_P (scope))
+   continue;
+
+ tree ti = CLASSTYPE_TEMPLATE_INFO (scope);
+ if (!ti || TMPL_ARGS_DEPTH (TI_ARGS (ti)) < arg_depth)
+   break;
+
  if (gen_tmpl == most_general_template (TI_TEMPLATE (ti))
  && comp_template_args (arglist, TI_ARGS (ti)))
-   return current_class_type;
+   return scope;
+   }
 
   /* Calculate the BOUND_ARGS.  These will be the args that are
 actually tsubst'd into the definition to create the
 instantiation.  */
-  arglist = coerce_template_parms (parmlist, arglist, gen_tmpl, complain);
+  if (PRIMARY_TEMPLATE_P (gen_tmpl))
+   arglist = coerce_template_parms (parmlist, arglist, gen_tmpl, complain);
 
   if (arglist == error_mark_node)
/* We were unable to bind the arguments.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class57.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class57.C
new file mode 100644
index 000..88ebdc1b3ff
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class57.C
@@ -0,0 +1,25 @@
+// PR c++/110122
+// { dg-do compile { target c++20 } }
+
+struct Foo {
+  constexpr Foo() = default;
+  Foo(const Foo&) = delete;
+};
+
+template
+struct A {
+  A() {
+[] { A a; }();
+[this] { this; }();
+  }
+
+  struct B {
+B() {
+  [] { A a; }();
+  [this] { this; }();
+}
+  };
+};
+
+A a;
+A::B b;
-- 
2.41.0.rc1.10.g9e49351c30



Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Max Filippov via Gcc-patches
On Mon, Jun 5, 2023 at 8:15 AM Max Filippov  wrote:
>
> Hi Suwa-san,
>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
>  wrote:
> >
> > This patch optimizes the boolean evaluation of EQ/NE against zero
> > by adding two insn_and_split patterns similar to SImode conditional
> > store:
> >
> > "eq_zero":
> > op0 = (op1 == 0) ? 1 : 0;
> > op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
> >
> > "movsicc_ne0_reg_zero":
> > op0 = (op1 != 0) ? op2 : 0;
> > op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
> >
> > /* example #1 */
> > int bool_eqSI(int x) {
> >   return x == 0;
> > }
> > int bool_neSI(int x) {
> >   return x != 0;
> > }
> >
> > ;; after (TARGET_NSA)
> > bool_eqSI:
> > nsau    a2, a2
> > srli    a2, a2, 5
> > ret.n
> > bool_neSI:
> > mov.n   a9, a2
> > movi.n  a2, 1
> > moveqz  a2, a9, a9
> > ret.n
> >
> > These also work in SFmode by ignoring the sign bits; furthermore,
> > branching on EQ/NE against zero in SFmode is done in the same manner.
> >
> > The reasons for this optimization in SFmode are:
> >
> >   - Only zero values (positive or negative) have no 1 bits in either
> > the exponent or the mantissa.
> >   - EQ/NE comparisons involving NaNs produce no signal even if they
> > are signaling.
> >   - Even if the use of the IEEE 754 single-precision floating-point
> > coprocessor is configured (TARGET_HARD_FLOAT is true):
> > 1. Load zero value to FP register
> > 2. Possibly, additional FP move if the comparison target is
> >an address register
> > 3. FP equality check instruction
> > 4. Read the boolean register containing the result, or do a
> >tional branch
> > As noted above, a considerable number of instructions are still
> > generated.
> >
> > /* example #2 */
> > int bool_eqSF(float x) {
> >   return x == 0;
> > }
> > int bool_neSF(float x) {
> >   return x != 0;
> > }
> > int bool_ltSF(float x) {
> >   return x < 0;
> > }
> > extern void foo(void);
> > void cb_eqSF(float x) {
> >   if(x != 0)
> > foo();
> > }
> > void cb_neSF(float x) {
> >   if(x == 0)
> > foo();
> > }
> > void cb_geSF(float x) {
> >   if(x < 0)
> > foo();
> > }
> >
> > ;; after
> > ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> > bool_eqSF:
> > add.n   a2, a2, a2
> > nsau    a2, a2
> > srli    a2, a2, 5
> > ret.n
> > bool_neSF:
> > add.n   a9, a2, a2
> > movi.n  a2, 1
> > moveqz  a2, a9, a9
> > ret.n
> > bool_ltSF:
> > movi.n  a9, 0
> > wfr f0, a2
> > wfr f1, a9
> > olt.s   b0, f0, f1
> > movi.n  a9, 0
> > movi.n  a2, 1
> > movf    a2, a9, b0
> > ret.n
> > cb_eqSF:
> > add.n   a2, a2, a2
> > beqz.n  a2, .L6
> > j.l foo, a9
> > .L6:
> > ret.n
> > cb_neSF:
> > add.n   a2, a2, a2
> > bnez.n  a2, .L8
> > j.l foo, a9
> > .L8:
> > ret.n
> > cb_geSF:
> > addi    sp, sp, -16
> > movi.n  a3, 0
> > s32i.n  a12, sp, 8
> > s32i.n  a0, sp, 12
> > mov.n   a12, a2
> > call0   __unordsf2
> > bnez.n  a2, .L10
> > movi.n  a3, 0
> > mov.n   a2, a12
> > call0   __gesf2
> > bnei    a2, -1, .L10
> > l32i.n  a0, sp, 12
> > l32i.n  a12, sp, 8
> > addi    sp, sp, 16
> > j.l foo, a9
> > .L10:
> > l32i.n  a0, sp, 12
> > l32i.n  a12, sp, 8
> > addi    sp, sp, 16
> > ret.n
> >
> > gcc/ChangeLog:
> >
> > * config/xtensa/predicates.md (const_float_0_operand):
> > Rename from obsolete "const_float_1_operand" and change the
> > constant to compare.
> > (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> > New.
> > * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> > Add code for EQ/NE comparison with constant zero in SFmode.
> > (xtensa_expand_scc): Add code to derive boolean evaluation
> > of EQ/NE with constant zero for comparison in SFmode.
> > (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> > zero inside "cbranchsf4" to 0.
> > * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> > Change "match_operator" and the third "match_operand" to the
> > ones mentioned above.
> > (movsicc_ne0_reg_zero, eq_zero): New.
> > ---
> >  gcc/config/xtensa/predicates.md | 17 +--
> >  gcc/config/xtensa/xtensa.cc | 45 
> >  gcc/config/xtensa/xtensa.md | 53 +
> >  3 file

Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 22:49
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API of the FP16 ZVFH reduction floating-point
operations, aka SEW=16 for the below instructions:
 
vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum
 
Then users can leverage the intrinsic APIs to perform the FP16-related
reduction operations. Please note that not all the intrinsic APIs are covered
in the test files; only some typical ones are picked, since there are too many.
We will test the FP16-related intrinsic APIs entirely soon.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.
---
.../riscv/riscv-vector-builtins-types.def |  7 +++
gcc/config/riscv/vector-iterators.md  | 12 
.../riscv/rvv/base/zvfh-intrinsic.c   | 58 ++-
3 files changed, 75 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 1e2491de6d6..bd3deae8340 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -634,6 +634,13 @@ DEF_RVV_WU_OPS (vuint32m2_t, 0)
DEF_RVV_WU_OPS (vuint32m4_t, 0)
DEF_RVV_WU_OPS (vuint32m8_t, 0)
+DEF_RVV_WF_OPS (vfloat16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WF_OPS (vfloat16mf2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m1_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m4_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m8_t, TARGET_ZVFH)
+
DEF_RVV_WF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_WF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_WF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e4f2ba90799..c338e3c9003 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
])
(define_mode_iterator VWF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF 
"TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VWF_ZVE64 [
+  VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF
   VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF
])
@@ -1322,6 +1330,7 @@ (define_mode_attr VWLMUL1 [
   (VNx8HI "VNx4SI") (VNx16HI "VNx4SI") (VNx32HI "VNx4SI") (VNx64HI "VNx4SI")
   (VNx1SI "VNx2DI") (VNx2SI "VNx2DI") (VNx4SI "VNx2DI")
   (VNx8SI "VNx2DI") (VNx16SI "VNx2DI") (VNx32SI "VNx2DI")
+  (VNx1HF "VNx4SF") (VNx2HF "VNx4SF") (VNx4HF "VNx4SF") (VNx8HF "VNx4SF") 
(VNx16HF "VNx4SF") (VNx32HF "VNx4SF") (VNx64HF "VNx4SF")
   (VNx1SF "VNx2DF") (VNx2SF "VNx2DF")
   (VNx4SF "VNx2DF") (VNx8SF "VNx2DF") (VNx16SF "VNx2DF") (VNx32SF "VNx2DF")
])
@@ -1333,6 +1342,7 @@ (define_mode_attr VWLMUL1_ZVE64 [
   (VNx8HI "VNx2SI") (VNx16HI "VNx2SI") (VNx32HI "VNx2SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx1DI") (VNx4SI "VNx1DI")
   (VNx8SI "VNx1DI") (VNx16SI "VNx1DI")
+  (VNx1HF "VNx2SF") (VNx2HF "VNx2SF") (VNx4HF "VNx2SF") (VNx8HF "VNx2SF") 
(VNx16HF "VNx2SF") (VNx32HF "VNx2SF")
   (VNx1SF "VNx1DF") (VNx2SF "VNx1DF")
   (VNx4SF "VNx1DF") (VNx8SF "VNx1DF") (VNx16SF "VNx1DF")
])
@@ -1393,6 +1403,7 @@ (define_mode_attr vwlmul1 [
   (VNx8HI "vnx4si") (VNx16HI "vnx4si") (VNx32HI "vnx4si") (VNx64HI "vnx4si")
   (VNx1SI "vnx2di") (VNx2SI "vnx2di") (VNx4SI "vnx2di")
   (VNx8SI "vnx2di") (VNx16SI "vnx2di") (VNx32SI "vnx2di")
+  (VNx1HF "vnx4sf") (VNx2HF "vnx4sf") (VNx4HF "vnx4sf") (VNx8HF "vnx4sf") 
(VNx16HF "vnx4sf") (VNx32HF "vnx4sf") (VNx64HF "vnx4sf")
   (VNx1SF "vnx2df") (VNx2SF "vnx2df")
   (VNx4SF "vnx2df") (VNx8SF "vnx2df") (VNx16SF "vnx2df") (VNx32SF "vnx2df")
])
@@ -1404,6 +1415,7 @@ (define_mode_attr vwlmul1_zve64 [
   (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2SI")
   (VNx1SI "vnx1di") (VNx2SI "vnx1di") (VNx4SI "vnx1di")
   (VNx8SI "vnx1di") (VNx16SI "vnx1di")
+  (VNx1HF "vnx2sf") (VNx2HF "vnx2sf") (VNx4HF "vnx2sf") (VNx8HF "vnx2sf") 
(VNx16HF "vnx2sf") (VNx32HF "vnx2sf")
   (VNx1SF "vnx1df") (VNx2SF "vnx1df")
   (VNx4SF "vnx1df") (VNx8SF "

RE: [PATCH v1] RISC-V: Fix some typo in vector-iterators.md

2023-06-05 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, June 6, 2023 3:01 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Fix some typo in vector-iterators.md



On 6/5/23 09:07, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> This patch would like to fix some typos in vector-iterators.md, aka:
> 
> [-"vnx1DI")-]{+"vnx1di")+}
> [-"vnx2SI")-]{+"vnx2si")+}
> [-"vnx1SI")-]{+"vnx1si")+}
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/vector-iterators.md: Fix typo in mode attr.
OK
jeff


RE: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

2023-06-05 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Monday, June 5, 2023 4:39 PM
To: juzhe.zh...@rivai.ai
Cc: Li Xu ; gcc-patches ; 
palmer 
Subject: Re: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in 
vector-iterators.md.

LGTM

On Mon, Jun 5, 2023 at 4:27 PM juzhe.zh...@rivai.ai  
wrote:
>
> Thanks for catching this.
> LGTM.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Li Xu
> Date: 2023-06-05 16:18
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; Li Xu
> Subject: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in 
> vector-iterators.md.
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Fix 'REQUIREMENT' for 
> machine_mode 'MODE'.
> * config/riscv/vector.md 
> (@pred_indexed_store): change 
> VNX16_QHSI to VNX16_QHSDI.
> (@pred_indexed_store): Ditto.
> ---
> gcc/config/riscv/vector-iterators.md | 26 +-
> gcc/config/riscv/vector.md   |  6 +++---
> 2 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 90743ed76c5..42cbbb49894 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -148,7 +148,7 @@
> ])
> (define_mode_iterator VEEWEXT8 [
> -  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64")
> +  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> + "TARGET_VECTOR_ELEN_64")
>(VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
> "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_64") @@ -188,7 +188,7 @@
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx8SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> -  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
> +  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_64")
>(VNx4DF "TARGET_VECTOR_ELEN_FP_64")
>(VNx8DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128") @@ 
> -199,7 +199,7 @@
>(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI 
> "TARGET_MIN_VLEN >= 128")
>(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 
> 128")
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> "TARGET_VECTOR_ELEN_64")
> -  (VNx4DI "TARGET_VECTOR_ELEN_64")
> +  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32") @@ -213,11 +213,11 @@
>(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI (VNx16QI 
> "TARGET_MIN_VLEN >= 128")
>(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI (VNx8HI "TARGET_MIN_VLEN >= 
> 128")
>(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI (VNx4SI "TARGET_MIN_VLEN >= 
> 128")
> -  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64 && 
> TARGET_MIN_VLEN >= 128")
> +  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
> + "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> -  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
> +  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
>(VNx2DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
> ])
> @@ -400,26 +400,26 @@
> (define_mode_iterator VNX1_QHSDI [
>(VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") 
> (VNx1SI "TARGET_MIN_VLEN < 128")
> -  (VNx1DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 
> + 128")
> ])
> (define_mode_iterator VNX2_QHSDI [
>VNx2QI VNx2HI VNx2SI
> -  (VNx2DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> (define_mode_iterator VNX4_QHSDI [
>VNx4QI VNx4HI VNx4SI
> -  (VNx4DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> (define_mode_iterator VNX8_QHSDI [
>VNx8QI VNx8HI VNx8SI
> -  (VNx8DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
> +  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
> ])
> -(define_mode_iterator VNX16_QHSI [
> -  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI 
> "TARGET_MIN_VLEN >= 128")
> +(define_mode_iterator VNX16_QHSDI [
> +  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI 
> +"TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
> ])
> (define_mode_iterator VNX32_QHSI [
> @@ -435,7 +435,7 @@
>(VNx2HI "TARGET_MIN_VLEN == 32") VNx4HI VNx8HI VNx16HI (VNx32HI 
> "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
>(VNx1SI "TARGET_MIN_VLE

Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread Kito Cheng via Gcc-patches
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")

I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here;
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does it mean that zvfhmin also enables reduction?

I also have the same concern for V and VF in the last patch[1].

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/

To give a more practical example of my concern:

We've been using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16;
that should not be what we expect to do, I think?


Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
Oh. YES. Thanks for catching this.
VF will be used in autovec, for example: vfadd.
When zvfhmin is specified, the vfadd autovec will be enabled unexpectedly.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-06 09:32
To: juzhe.zh...@rivai.ai
CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here;
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does it mean that zvfhmin also enables reduction?

I also have the same concern for V and VF in the last patch[1].
 
[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/
 
To give a more practical example of my concern:
 
We've been using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16;
that should not be what we expect to do, I think?
 


RE: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread Li, Pan2 via Gcc-patches
I see. I restricted ZVFH/ZVFHMIN in riscv-vector-builtins-types.def for the
ops definition, but lacked consideration of the autovec part.

Do you prefer to leave this PATCH as-is and fix this issue entirely in another
PATCH, OR update this PATCH to V2 for the predicate and send another PATCH for
the previous one?

Both work for me.



Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, June 6, 2023 9:39 AM
To: kito.cheng 
Cc: Li, Pan2 ; gcc-patches ; 
Kito.cheng ; Wang, Yanzhang 
Subject: Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction 
floating-point intrinsic API

Oh. YES. Thanks for catching this.
VF will be used in autovec, for example: vfadd.
When zvfhmin is specified, the vfadd autovec will be enabled unexpectedly.


juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-06-06 09:32
To: juzhe.zh...@rivai.ai
CC: pan2.li; 
gcc-patches; 
Kito.cheng; 
yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")

I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here;
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does it mean that zvfhmin also enables reduction?

I also have the same concern for V and VF in the last patch[1].

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/

To give a more practical example of my concern:

We've been using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16;
that should not be what we expect to do, I think?



Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
I think we should split out the instruction patterns that belong to ZVFHMIN,
and add ZVFH gating into all the original iterators, for example: VF, VWF, etc.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-06 09:32
To: juzhe.zh...@rivai.ai
CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here;
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does it mean that zvfhmin also enables reduction?

I also have the same concern for V and VF in the last patch[1].
 
[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/
 
To give a more practical example of my concern:
 
We've been using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16;
that should not be what we expect to do, I think?
 


Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread Kito Cheng via Gcc-patches
OK for landing this patch first, and fix by follow up patches.

On Tue, Jun 6, 2023 at 9:41 AM juzhe.zh...@rivai.ai
 wrote:
>
> I think we should split out the instruction patterns that belong to ZVFHMIN,
> and add ZVFH gating into all the original iterators, for example: VF, VWF, etc.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-06-06 09:32
> To: juzhe.zh...@rivai.ai
> CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
> Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction 
> floating-point intrinsic API
> > diff --git a/gcc/config/riscv/vector-iterators.md 
> > b/gcc/config/riscv/vector-iterators.md
> > index e4f2ba90799..c338e3c9003 100644
> > --- a/gcc/config/riscv/vector-iterators.md
> > +++ b/gcc/config/riscv/vector-iterators.md
> > @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> > ])
> > (define_mode_iterator VWF [
> > +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> > +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> > +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> > +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> > +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> > +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> > +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
>
> I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here;
> zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
> so does it mean that zvfhmin also enables reduction?
>
> I also have the same concern for V and VF in the last patch[1].
>
> [1] 
> https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/
>
> To give a more practical example of my concern:
>
> We've been using the V and VF iterators in autovec.md, and zvfhmin will set
> MASK_VECTOR_ELEN_FP_16,
> which means zvfhmin WILL enable most autovec patterns with fp16;
> that should not be what we expect to do, I think?
>


[PATCH] [RISC-V] correct machine mode in save-restore cfi RTL.

2023-06-05 Thread Fei Gao
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Use Pmode
for the CFI reg/mem machine mode.
(riscv_adjust_libcall_cfi_epilogue): Use Pmode for the CFI reg machine mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-cfi-2.c: New test to check the machine
mode for CFI reg/mem.
---
 gcc/config/riscv/riscv.cc|  6 +++---
 .../gcc.target/riscv/save-restore-cfi-2.c| 16 
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index caa7858b864..9eafd281260 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5370,8 +5370,8 @@ riscv_adjust_libcall_cfi_prologue ()
else
  offset = saved_size - ((regno - S2_REGNUM + 4) * UNITS_PER_WORD);
 
-   reg = gen_rtx_REG (SImode, regno);
-   mem = gen_frame_mem (SImode, plus_constant (Pmode,
+   reg = gen_rtx_REG (Pmode, regno);
+   mem = gen_frame_mem (Pmode, plus_constant (Pmode,
stack_pointer_rtx,
offset));
 
@@ -5510,7 +5510,7 @@ riscv_adjust_libcall_cfi_epilogue ()
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
   {
-   reg = gen_rtx_REG (SImode, regno);
+   reg = gen_rtx_REG (Pmode, regno);
dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
   }
 
diff --git a/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c 
b/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c
new file mode 100644
index 000..44d805b4de8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-pro_and_epilogue -O2 -march=rv64gc -mabi=lp64d 
-msave-restore -mcmodel=medany" } */
+/* { dg-skip-if "" { *-*-* } {"-Os" "-O1" "-O0" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { scan-rtl-dump {expr_list:REG_CFA_OFFSET \(set \(mem/c:DI} 
"pro_and_epilogue" } } */
+/* { dg-final { scan-rtl-dump {expr_list:REG_CFA_RESTORE \(reg:DI 8 s0\)} 
"pro_and_epilogue" } } */
+
+char my_getchar();
+float getf();
+
+int foo()
+{
+  int s0 = my_getchar();
+  float f0 = getf();
+  int b = my_getchar();
+  return f0 + s0 + b;
+}
-- 
2.17.1



Re: [PATCH] [RISC-V] correct machine mode in save-restore cfi RTL.

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 19:57, Fei Gao wrote:

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): use Pmode 
for cfi reg/mem machmode
 (riscv_adjust_libcall_cfi_epilogue): use Pmode for cfi reg machmode

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode 
for cfi reg/mem.
I rewrapped the ChangeLog to 80 columns and reindented the arguments to 
the gen_frame_mem call to match our formatting guidelines.


Pushed to the trunk.

Thanks,
jeff


RE: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe, will fix the issue we discussed soon.

Pan

-Original Message-
From: Kito Cheng  
Sent: Tuesday, June 6, 2023 9:48 AM
To: juzhe.zh...@rivai.ai
Cc: kito.cheng ; Li, Pan2 ; 
gcc-patches ; Wang, Yanzhang 
Subject: Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction 
floating-point intrinsic API

OK for landing this patch first, and fix by follow up patches.

On Tue, Jun 6, 2023 at 9:41 AM juzhe.zh...@rivai.ai  
wrote:
>
> I think we should split out the instruction patterns that belong to ZVFHMIN,
> and add ZVFH gating to all the original iterators, for example VF, VWF, etc.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-06-06 09:32
> To: juzhe.zh...@rivai.ai
> CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
> Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction 
> floating-point intrinsic API
> > diff --git a/gcc/config/riscv/vector-iterators.md 
> > b/gcc/config/riscv/vector-iterators.md
> > index e4f2ba90799..c338e3c9003 100644
> > --- a/gcc/config/riscv/vector-iterators.md
> > +++ b/gcc/config/riscv/vector-iterators.md
> > @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> > ])
> > (define_mode_iterator VWF [
> > +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")  
> > + (VNx2HF "TARGET_VECTOR_ELEN_FP_16")  (VNx4HF 
> > + "TARGET_VECTOR_ELEN_FP_16")  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")  
> > + (VNx16HF "TARGET_VECTOR_ELEN_FP_16")  (VNx32HF 
> > + "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")  (VNx64HF 
> > + "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
>
> I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the
> predicate here: zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag, so
> does that mean zvfhmin also enables reduction?
>
> and also has the same concern for V and VF in the last patch[1] too.
>
> [1] 
> https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707
> 158-1-pan2...@intel.com/
>
> To give a more practical example of my concern:
>
> We are using the V and VF iterators in autovec.md, and zvfhmin will set
> MASK_VECTOR_ELEN_FP_16, which means zvfhmin WILL enable most autovec
> patterns with fp16; that is not what we expect, I think?
>


Re: [PATCH] [RISC-V] add TC for save-restore cfi directives.

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 00:07, Fei Gao wrote:

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/save-restore-cfi.c: New test to check save-restore 
cfi directives.

Wrapped the ChangeLog to 80 columns and pushed to the trunk.

Thanks,
Jeff


[committed] bootstrap rtl-checking: Fix XVEC vs XVECEXP in postreload.cc

2023-06-05 Thread Hans-Peter Nilsson via Gcc-patches
Oops.  Sorry.  Committed as obvious.  A bootstrap with
--enable-checking=yes,extra,rtl (same as the reporter, but not the
default) completed with the patch, where a bootstrap without it failed.

-- >8 --
PR bootstrap/110120
* postreload.cc (reload_cse_move2add, move2add_use_add2_insn): Use
XVECEXP, not XEXP, to access first item of a PARALLEL.
---
 gcc/postreload.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/postreload.cc b/gcc/postreload.cc
index b479d4b921ba..20e138b4fa8b 100644
--- a/gcc/postreload.cc
+++ b/gcc/postreload.cc
@@ -1801,7 +1801,7 @@ move2add_use_add2_insn (scalar_int_mode mode, rtx reg, 
rtx sym, rtx off,
 naked SET, or else its single_set is the first element
 in a PARALLEL.  */
  rtx *setloc = GET_CODE (PATTERN (insn)) == PARALLEL
-   ? &XEXP (PATTERN (insn), 0) : &PATTERN (insn);
+   ? &XVECEXP (PATTERN (insn), 0, 0) : &PATTERN (insn);
  if (*setloc == set && costs_lt_p (&newcst, &oldcst, speed))
{
  changed = validate_change (insn, setloc, new_set, 0);
@@ -2027,7 +2027,7 @@ reload_cse_move2add (rtx_insn *first)
  costs_add_n_insns (&oldcst, 1);
 
  rtx *setloc = GET_CODE (PATTERN (next)) == PARALLEL
-   ? &XEXP (PATTERN (next), 0) : &PATTERN (next);
+   ? &XVECEXP (PATTERN (next), 0, 0) : &PATTERN (next);
  if (*setloc == set
  && costs_lt_p (&newcst, &oldcst, speed)
  && have_add2_insn (reg, new_src))
-- 
2.30.2



RE: [EXTERNAL] [PATCH] Update perf auto profile script

2023-06-05 Thread Eugene Rozenfeld via Gcc-patches
Ok for trunk. Thank you for updating this!

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Andi Kleen via Gcc-patches
Sent: Tuesday, May 30, 2023 4:08 AM
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen 
Subject: [EXTERNAL] [PATCH] Update perf auto profile script

- Fix gen_autofdo_event: The download URL for the Intel Perfmon Event
  list has changed, as well as the JSON format.
  Also it now uses pattern matching to match CPUs. Update the script to support 
all of this.
- Regenerate gcc-auto-profile with the latest published Intel model
  numbers, so it works with recent systems.
- So far it's still broken on hybrid systems
---
 contrib/gen_autofdo_event.py | 7 ---
 gcc/config/i386/gcc-auto-profile | 9 -
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 
ac23b83888db..533c706c090b 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -32,8 +32,9 @@ import json
 import argparse
 import collections
 import os
+import fnmatch

-baseurl = "https://download.01.org/perfmon";
+baseurl = "https://raw.githubusercontent.com/intel/perfmon/main";

 target_events = ('BR_INST_RETIRED.NEAR_TAKEN',
  'BR_INST_EXEC.TAKEN',
@@ -74,7 +75,7 @@ def get_cpustr():
 def find_event(eventurl, model):
 print("Downloading", eventurl, file = sys.stderr)
 u = urllib.request.urlopen(eventurl)
-events = json.loads(u.read())
+events = json.loads(u.read())["Events"]
 u.close()

 found = 0
@@ -102,7 +103,7 @@ found = 0
 cpufound = 0
 for j in u:
 n = j.rstrip().decode().split(',')
-if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
+if len(n) >= 4 and (args.all or fnmatch.fnmatch(cpu, n[0])) and n[3] == 
"core":
 components = n[0].split("-")
 model = components[2]
 model = int(model, 16)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5ab224b041b9..04f7d35dcc51 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -43,8 +43,10 @@ model*:\ 47|\
 model*:\ 37|\
 model*:\ 44) E="cpu/event=0x88,umask=0x40/$FLAGS" ;;  model*:\ 55|\
+model*:\ 74|\
 model*:\ 77|\
 model*:\ 76|\
+model*:\ 90|\
 model*:\ 92|\
 model*:\ 95|\
 model*:\ 87|\
@@ -75,14 +77,19 @@ model*:\ 165|\
 model*:\ 166|\
 model*:\ 85|\
 model*:\ 85) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
+model*:\ 125|\
 model*:\ 126|\
+model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
 model*:\ 143|\
+model*:\ 207|\
 model*:\ 106|\
 model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;  model*:\ 134|\ 
-model*:\ 150) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 150|\
+model*:\ 156|\
+model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
 *)
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
exit 1 ;;
--
2.40.1



Re: [PATCH v3] configure: Implement --enable-host-pie

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:

Ping.  Anyone have any further comments?
Given this was approved before, but got reverted due to issues (which
have since been addressed), I think you might as well go forward, sooner
rather than later, so that we can catch fallout earlier.


jeff


Re: [PATCH] libiberty: writeargv: Simplify function error mode.

2023-06-05 Thread Jeff Law via Gcc-patches




On 6/5/23 08:37, Costas Argyris via Gcc-patches wrote:

writeargv can be simplified by getting rid of the error exit mode
that was only relevant many years ago when the function used
to open the file descriptor internally.

[ ... ]
Thanks.  I've pushed this to the trunk.

You could (as a follow-up) simplify it even further.  There's no need 
for the status variable as far as I can tell.  You could just have the 
final return be "return 0;" instead of "return status;".
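As a sketch of what that further simplification might look like (illustrative names and a simplified escaping rule, not the actual libiberty source):

```c
/* Hypothetical sketch of the further-simplified writeargv: with the
   error-exit mode gone, every failure path returns 1 directly, so no
   status variable is needed and success is just "return 0".  The
   escaping here (space, tab, backslash) is simplified relative to the
   real function.  */
#include <stdio.h>

int
writeargv_sketch (char **argv, FILE *f)
{
  if (f == NULL)
    return 1;

  while (*argv != NULL)
    {
      const char *arg = *argv;

      for (; *arg != '\0'; arg++)
        {
          /* Quote characters that would break re-reading the file as a
             response file.  */
          if (*arg == ' ' || *arg == '\t' || *arg == '\\')
            if (fputc ('\\', f) == EOF)
              return 1;
          if (fputc (*arg, f) == EOF)
            return 1;
        }

      if (fputc ('\n', f) == EOF)
        return 1;
      argv++;
    }

  return 0;  /* No status variable needed.  */
}
```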


Jeff


[PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-05 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
{
  a[i * 8 + 0] = b[i * 8 + 7] + 1;
  a[i * 8 + 1] = b[i * 8 + 7] + 2;
  a[i * 8 + 2] = b[i * 8 + 7] + 8;
  a[i * 8 + 3] = b[i * 8 + 7] + 4;
  a[i * 8 + 4] = b[i * 8 + 7] + 5;
  a[i * 8 + 5] = b[i * 8 + 7] + 6;
  a[i * 8 + 6] = b[i * 8 + 7] + 7;
  a[i * 8 + 7] = b[i * 8 + 7] + 3;
}
}

To enable VLA SLP auto-vectorization, we should be able to handle the
following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 
16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

These vectors can be generated in the prologue.
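As a rough model (plain C, not GCC code) of the NPATTERNS / NELTS_PER_PATTERN encoding used above: each of the NPATTERNS interleaved patterns lists its first NELTS_PER_PATTERN elements explicitly, and past those it keeps stepping by the difference of its last two elements.

```c
/* Expand a VLA constant encoded as NPATTERNS interleaved patterns of
   NELTS_PER_PATTERN leading elements each.  Beyond its explicit
   elements, a pattern continues with the step between its last two
   elements (0 if it has only one).  Illustrative sketch only.  */
void
expand_vla_const (const int *encoded, int npatterns, int nelts_per_pattern,
                  int *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      int p = i % npatterns;   /* which pattern this element belongs to */
      int k = i / npatterns;   /* position within that pattern */
      const int *pat = encoded + p * nelts_per_pattern;
      if (k < nelts_per_pattern)
        out[i] = pat[k];
      else
        {
          int step = nelts_per_pattern > 1
                     ? pat[nelts_per_pattern - 1] - pat[nelts_per_pattern - 2]
                     : 0;
          out[i] = pat[nelts_per_pattern - 1]
                   + step * (k - (nelts_per_pattern - 1));
        }
    }
}
```

With vector 1 above, each of the 8 patterns is { 0, 8, 16 } and the expansion keeps stepping by 8; with vector 2, each pattern has a single element and the sequence simply repeats.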

After this patch, we end up with this following codegen:

Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v   v4
vsrl.vi v4,v4,3
li  a3,8
vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 
8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li  t1,67633152
addit1,t1,513
li  a3,50790400
addia3,a3,1541
sllia3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v  v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v  v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv  a3,a5
add a5,a5,t1
bgtua3,a4,.L3
...

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8,
since "vrgatherei16.vv" covers a larger index range than "vrgather.vv"
(whose maximum element index is 255).
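The index-range limit in the note can be checked with a little arithmetic (a sketch, not GCC code):

```c
/* Largest element index a gather can address with a given index element
   width in bits.  vrgather.vv with SEW = 8 uses 8-bit indices (maximum
   index 255); vrgatherei16.vv uses 16-bit indices (maximum 65535).  */
unsigned
max_gather_index (unsigned index_bits)
{
  return (1u << index_bits) - 1;
}
```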
Epilogue:
lbu a5,799(a1)
addiw   a4,a5,1
sb  a4,792(a0)
addiw   a4,a5,2
sb  a4,793(a0)
addiw   a4,a5,8
sb  a4,794(a0)
addiw   a4,a5,4
sb  a4,795(a0)
addiw   a4,a5,5
sb  a4,796(a0)
addiw   a4,a5,6
sb  a4,797(a0)
addiw   a4,a5,7
sb  a4,798(a0)
addiw   a5,a5,3
sb  a5,799(a0)
ret

One last thing remains: "epilogue auto-vectorization", which needs VLS modes
support. I will add VLS modes support for epilogue auto-vectorization in the
future.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc 
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA 
vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_r

[PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-05 Thread liuhongt via Gcc-patches
r14-1145 folded the intrinsics into gimple ABS_EXPR, which has UB for
TYPE_MIN, but PABSB stores an unsigned result into dst. The patch
uses ABSU_EXPR + VCE instead of ABS_EXPR.

Also, don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT, since 64-bit
vector absm2 is guarded by TARGET_MMX_WITH_SSE.
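The signed/unsigned distinction can be seen with a standalone scalar model (illustrative C, not the folded gimple itself): a signed abs of TYPE_MIN overflows, while the absolute value computed in the unsigned type is well defined and matches what PABSB stores.

```c
/* |x| computed entirely in the unsigned type: for x == -128 the result
   is 128 with no signed overflow anywhere, mirroring ABSU_EXPR's
   semantics, whereas a signed ABS_EXPR of -128 would be undefined
   behavior.  */
unsigned char
absu8 (signed char x)
{
  unsigned char u = (unsigned char) x;
  return x < 0 ? (unsigned char) -u : u;
}
```

The VIEW_CONVERT_EXPR in the fold roughly plays the role of the final cast here: the value is produced in the unsigned vector type and then reinterpreted in the destination's type.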

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?


gcc/ChangeLog:

PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
TARGET_64BIT.
* config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
real codename for __builtin_ia32_pabs{b,w,d}.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110108.c: New test.
---
 gcc/config/i386/i386-builtin.def |  6 ++--
 gcc/config/i386/i386.cc  | 44 
 gcc/testsuite/gcc.target/i386/pr110108.c | 16 +
 3 files changed, 56 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110108.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 383b68a9bb8..7ba5b6a9d11 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -900,11 +900,11 @@ BDESC (OPTION_MASK_ISA_SSE3, 0, CODE_FOR_sse3_hsubv2df3, 
"__builtin_ia32_hsubpd"
 
 /* SSSE3 */
 BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsb128", 
IX86_BUILTIN_PABSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI)
-BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
"__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, (int) V8QI_FTYPE_V8QI)
+BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
CODE_FOR_ssse3_absv8qi2, "__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, 
(int) V8QI_FTYPE_V8QI)
 BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsw128", 
IX86_BUILTIN_PABSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI)
-BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
"__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, (int) V4HI_FTYPE_V4HI)
+BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
CODE_FOR_ssse3_absv4hi2, "__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, 
(int) V4HI_FTYPE_V4HI)
 BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsd128", 
IX86_BUILTIN_PABSD128, UNKNOWN, (int) V4SI_FTYPE_V4SI)
-BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
"__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, (int) V2SI_FTYPE_V2SI)
+BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
CODE_FOR_ssse3_absv2si2, "__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, 
(int) V2SI_FTYPE_V2SI)
 
 BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_phaddwv8hi3, 
"__builtin_ia32_phaddw128", IX86_BUILTIN_PHADDW128, UNKNOWN, (int) 
V8HI_FTYPE_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
CODE_FOR_ssse3_phaddwv4hi3, "__builtin_ia32_phaddw", IX86_BUILTIN_PHADDW, 
UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d4ff56ee8dd..b09b3c79e99 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18433,6 +18433,7 @@ bool
 ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi), *g;
+  gimple_seq stmts = NULL;
   tree fndecl = gimple_call_fndecl (stmt);
   gcc_checking_assert (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD));
   int n_args = gimple_call_num_args (stmt);
@@ -18555,7 +18556,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
{
  loc = gimple_location (stmt);
  tree type = TREE_TYPE (arg2);
- gimple_seq stmts = NULL;
  if (VECTOR_FLOAT_TYPE_P (type))
{
  tree itype = GET_MODE_INNER (TYPE_MODE (type)) == E_SFmode
@@ -18610,7 +18610,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
  tree zero_vec = build_zero_cst (type);
  tree minus_one_vec = build_minus_one_cst (type);
  tree cmp_type = truth_type_for (type);
- gimple_seq stmts = NULL;
  tree cmp = gimple_build (&stmts, tcode, cmp_type, arg0, arg1);
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
  g = gimple_build_assign (gimple_call_lhs (stmt),
@@ -18904,14 +18903,18 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   break;
 
 case IX86_BUILTIN_PABSB:
+case IX86_BUILTIN_PABSW:
+case IX86_BUILTIN_PABSD:
+  /* 64-bit vector abs2 is only supported under TARGET_MMX_WITH_SSE. 
 */
+  if (!TARGET_64BIT)
+   break;
+  /* FALLTHRU.  */
 case IX86_BUILTIN_PABSB128:
 case IX86_BUILTIN_PABSB256:
 case IX86_BUILTIN_PABSB512:
-case IX86_BUILTIN_PABSW:
 case IX86_BUILTIN_PABSW128:
 case IX86_BUILTIN_PABSW256:
 case IX86_BUILTIN_PABSW512:
-case IX86_BUILTIN_PABSD:
 case IX86_BUILTIN_PABSD128:
 case 

[PATCH] Don't fold _mm{, 256}_blendv_epi8 into (mask < 0 ? src1 : src2) when -funsigned-char.

2023-06-05 Thread liuhongt via Gcc-patches
mask < 0 will always be false with -funsigned-char, but
vpblendvb needs to check the most significant bit.
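The problem can be reproduced with a scalar model (illustrative C, not the builtin fold itself): when the mask element type is unsigned, as char is under -funsigned-char, the "< 0" test is constant-false even when the most significant bit, which vpblendvb actually tests, is set.

```c
/* vpblendvb selects from the second source when an element's MSB is
   set.  With a signed element type, "m < 0" models that correctly;
   with an unsigned element type it is always false, so the gimple fold
   would drop the selection entirely.  */
static int
selects_src2_signed (signed char m)
{
  return m < 0;                 /* matches vpblendvb's MSB test */
}

static int
selects_src2_unsigned (unsigned char m)
{
  return m < 0;                 /* always false: the fold goes wrong */
}
```

Skipping the fold when the element type is unsigned keeps the vpblendvb semantics intact.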

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk and backport to GCC12/GCC13 release branch?

gcc/ChangeLog:

PR target/110108
* config/i386/i386-builtin.def (BDESC): Replace
CODE_FOR_nothing with real code name for blendvb builtins.
* config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
_mm{,256}_blendv_epi8 into (mask < 0 ? src1 : src2) when
-funsigned-char.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110108-2.c: New test.
---
 gcc/config/i386/i386-builtin.def   |  4 ++--
 gcc/config/i386/i386.cc|  7 +++
 gcc/testsuite/gcc.target/i386/pr110108-2.c | 14 ++
 3 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110108-2.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 7ba5b6a9d11..b4c99ff62a2 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -944,7 +944,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, 
"__builtin_ia32_dppd", I
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", 
IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps_v4sf, 
"__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) 
V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_mpsadbw, 
"__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) 
V16QI_FTYPE_V16QI_V16QI_INT)
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, 
"__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) 
V16QI_FTYPE_V16QI_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendvb, 
"__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) 
V16QI_FTYPE_V16QI_V16QI_V16QI)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendw, 
"__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) 
V8HI_FTYPE_V8HI_V8HI_INT)
 
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_sign_extendv8qiv8hi2, 
"__builtin_ia32_pmovsxbw128", IX86_BUILTIN_PMOVSXBW128, UNKNOWN, (int) 
V8HI_FTYPE_V16QI)
@@ -1198,7 +1198,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_andv4di3, 
"__builtin_ia32_andsi256", IX
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_andnotv4di3, 
"__builtin_ia32_andnotsi256", IX86_BUILTIN_ANDNOT256I, UNKNOWN, (int) 
V4DI_FTYPE_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv32qi3, 
"__builtin_ia32_pavgb256",  IX86_BUILTIN_PAVGB256, UNKNOWN, (int) 
V32QI_FTYPE_V32QI_V32QI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv16hi3, 
"__builtin_ia32_pavgw256",  IX86_BUILTIN_PAVGW256, UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI)
-BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, 
"__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) 
V32QI_FTYPE_V32QI_V32QI_V32QI)
+BDESC (OPTION_MASK_ISA_AVX2, 0,  CODE_FOR_avx2_pblendvb, 
"__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) 
V32QI_FTYPE_V32QI_V32QI_V32QI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pblendw, 
"__builtin_ia32_pblendw256", IX86_BUILTIN_PBLENDVW256, UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI_INT)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqb256", 
IX86_BUILTIN_PCMPEQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqw256", 
IX86_BUILTIN_PCMPEQW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b09b3c79e99..f8f6c26c8eb 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18548,6 +18548,13 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   /* FALLTHRU.  */
 case IX86_BUILTIN_PBLENDVB128:
 case IX86_BUILTIN_BLENDVPS:
+  /* Don't fold PBLENDVB when funsigned-char since mask < 0
+will always be false in the gimple level.  */
+  if ((fn_code == IX86_BUILTIN_PBLENDVB128
+  || fn_code == IX86_BUILTIN_PBLENDVB256)
+ && !flag_signed_char)
+   break;
+
   gcc_assert (n_args == 3);
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
diff --git a/gcc/testsuite/gcc.target/i386/pr110108-2.c 
b/gcc/testsuite/gcc.target/i386/pr110108-2.c
new file mode 100644
index 000..2d1d2fd4991
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110108-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2 -funsigned-char" } */
+/* { dg-final { scan-assembler-times "vpblendvb" 2 } } */
+
+#include 
+__m128i do_stuff_128(__m128i X0, __m128i X1, __m128i X2) {
+  __m128i Result = _mm_blendv_epi8(X0, X1, X2);
+  return Result;
+}
+
+__m256i do_stuff_256(__m256i X0, __m256i X1, __m256i X2) {
+  __m256i Result = _mm256_blendv_epi8(X0, X1, X

Re: [PATCH] Don't fold _mm{, 256}_blendv_epi8 into (mask < 0 ? src1 : src2) when -funsigned-char.

2023-06-05 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 5, 2023 at 9:34 PM liuhongt via Gcc-patches
 wrote:
>
> mask < 0 will always be false with -funsigned-char, but
> vpblendvb needs to check the most significant bit.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport to GCC12/GCC13 release branch?

I think this is a better patch: it will always be correct and still
gets folded (correctly) at the gimple level:
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d4ff56ee8dd..02bf5ba93a5 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18561,8 +18561,10 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
  tree itype = GET_MODE_INNER (TYPE_MODE (type)) == E_SFmode
? intSI_type_node : intDI_type_node;
  type = get_same_sized_vectype (itype, type);
- arg2 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, arg2);
}
+ else
+   type = signed_type_for (type);
+ arg2 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, arg2);
  tree zero_vec = build_zero_cst (type);
  tree cmp_type = truth_type_for (type);
  tree cmp = gimple_build (&stmts, LT_EXPR, cmp_type, arg2, zero_vec);


Thanks,
Andrew Pinski


>
> gcc/ChangeLog:
>
> PR target/110108
> * config/i386/i386-builtin.def (BDESC): Replace
> CODE_FOR_nothing with real code name for blendvb builtins.
> * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
> _mm{,256}_blendv_epi8 into (mask < 0 ? src1 : src2) when
> -funsigned-char.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr110108-2.c: New test.
> ---
>  gcc/config/i386/i386-builtin.def   |  4 ++--
>  gcc/config/i386/i386.cc|  7 +++
>  gcc/testsuite/gcc.target/i386/pr110108-2.c | 14 ++
>  3 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110108-2.c
>
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index 7ba5b6a9d11..b4c99ff62a2 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -944,7 +944,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, 
> "__builtin_ia32_dppd", I
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, 
> "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) 
> V4SF_FTYPE_V4SF_V4SF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps_v4sf, 
> "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) 
> V4SF_FTYPE_V4SF_V4SF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_mpsadbw, 
> "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) 
> V16QI_FTYPE_V16QI_V16QI_INT)
> -BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) 
> V16QI_FTYPE_V16QI_V16QI_V16QI)
> +BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendvb, 
> "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) 
> V16QI_FTYPE_V16QI_V16QI_V16QI)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendw, 
> "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) 
> V8HI_FTYPE_V8HI_V8HI_INT)
>
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_sign_extendv8qiv8hi2, 
> "__builtin_ia32_pmovsxbw128", IX86_BUILTIN_PMOVSXBW128, UNKNOWN, (int) 
> V8HI_FTYPE_V16QI)
> @@ -1198,7 +1198,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_andv4di3, 
> "__builtin_ia32_andsi256", IX
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_andnotv4di3, 
> "__builtin_ia32_andnotsi256", IX86_BUILTIN_ANDNOT256I, UNKNOWN, (int) 
> V4DI_FTYPE_V4DI_V4DI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv32qi3, 
> "__builtin_ia32_pavgb256",  IX86_BUILTIN_PAVGB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv16hi3, 
> "__builtin_ia32_pavgw256",  IX86_BUILTIN_PAVGW256, UNKNOWN, (int) 
> V16HI_FTYPE_V16HI_V16HI)
> -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI_V32QI)
> +BDESC (OPTION_MASK_ISA_AVX2, 0,  CODE_FOR_avx2_pblendvb, 
> "__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pblendw, 
> "__builtin_ia32_pblendw256", IX86_BUILTIN_PBLENDVW256, UNKNOWN, (int) 
> V16HI_FTYPE_V16HI_V16HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pcmpeqb256", IX86_BUILTIN_PCMPEQB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pcmpeqw256", IX86_BUILTIN_PCMPEQW256, UNKNOWN, (int) 
> V16HI_FTYPE_V16HI_V16HI)
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b09b3c79e99..f8f6c26c8eb 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-05 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 5, 2023 at 9:34 PM liuhongt via Gcc-patches
 wrote:
>
> r14-1145 folded the intrinsics into gimple ABS_EXPR, which has UB for
> TYPE_MIN, but PABSB stores an unsigned result into dst. The patch
> uses ABSU_EXPR + VCE instead of ABS_EXPR.
>
> Also, don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT, since 64-bit
> vector absm2 is guarded by TARGET_MMX_WITH_SSE.
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
>
> gcc/ChangeLog:
>
> PR target/110108
> * config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
> _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
> ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
> TARGET_64BIT.
> * config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
> real codename for __builtin_ia32_pabs{b,w,d}.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr110108.c: New test.
> ---
>  gcc/config/i386/i386-builtin.def |  6 ++--
>  gcc/config/i386/i386.cc  | 44 
>  gcc/testsuite/gcc.target/i386/pr110108.c | 16 +
>  3 files changed, 56 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110108.c
>
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index 383b68a9bb8..7ba5b6a9d11 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -900,11 +900,11 @@ BDESC (OPTION_MASK_ISA_SSE3, 0, 
> CODE_FOR_sse3_hsubv2df3, "__builtin_ia32_hsubpd"
>
>  /* SSSE3 */
>  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsb128", IX86_BUILTIN_PABSB128, UNKNOWN, (int) 
> V16QI_FTYPE_V16QI)
> -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, (int) V8QI_FTYPE_V8QI)
> +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
> CODE_FOR_ssse3_absv8qi2, "__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, 
> (int) V8QI_FTYPE_V8QI)
>  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsw128", IX86_BUILTIN_PABSW128, UNKNOWN, (int) 
> V8HI_FTYPE_V8HI)
> -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, (int) V4HI_FTYPE_V4HI)
> +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
> CODE_FOR_ssse3_absv4hi2, "__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, 
> (int) V4HI_FTYPE_V4HI)
>  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsd128", IX86_BUILTIN_PABSD128, UNKNOWN, (int) 
> V4SI_FTYPE_V4SI)
> -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, 
> "__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, (int) V2SI_FTYPE_V2SI)
> +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
> CODE_FOR_ssse3_absv2si2, "__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, 
> (int) V2SI_FTYPE_V2SI)
>
>  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_phaddwv8hi3, 
> "__builtin_ia32_phaddw128", IX86_BUILTIN_PHADDW128, UNKNOWN, (int) 
> V8HI_FTYPE_V8HI_V8HI)
>  BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
> CODE_FOR_ssse3_phaddwv4hi3, "__builtin_ia32_phaddw", IX86_BUILTIN_PHADDW, 
> UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d4ff56ee8dd..b09b3c79e99 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -18433,6 +18433,7 @@ bool
>  ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  {
>gimple *stmt = gsi_stmt (*gsi), *g;
> +  gimple_seq stmts = NULL;
>tree fndecl = gimple_call_fndecl (stmt);
>gcc_checking_assert (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD));
>int n_args = gimple_call_num_args (stmt);
> @@ -18555,7 +18556,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> {
>   loc = gimple_location (stmt);
>   tree type = TREE_TYPE (arg2);
> - gimple_seq stmts = NULL;
>   if (VECTOR_FLOAT_TYPE_P (type))
> {
>   tree itype = GET_MODE_INNER (TYPE_MODE (type)) == E_SFmode
> @@ -18610,7 +18610,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>   tree zero_vec = build_zero_cst (type);
>   tree minus_one_vec = build_minus_one_cst (type);
>   tree cmp_type = truth_type_for (type);
> - gimple_seq stmts = NULL;
>   tree cmp = gimple_build (&stmts, tcode, cmp_type, arg0, arg1);
>   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>   g = gimple_build_assign (gimple_call_lhs (stmt),
> @@ -18904,14 +18903,18 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>break;
>
>  case IX86_BUILTIN_PABSB:
> +case IX86_BUILTIN_PABSW:
> +case IX86_BUILTIN_PABSD:
> +  /* 64-bit vector abs<mode>2 is only supported under 
> TARGET_MMX_WITH_SSE.  */
> +  if (!TARGET_64BIT)
> +   break;
> +  /* FALLTHRU.  */
>  case IX86_BUILTIN_

[PATCH] riscv: Fix insn cost calculation

2023-06-05 Thread Dimitar Dimitrov
When building riscv32-none-elf with "--enable-checking=yes,rtl", the
following ICE is observed:

  cc1: internal compiler error: RTL check: expected code 'const_int', have 
'const_double' in riscv_const_insns, at config/riscv/riscv.cc:1313
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8eab61 riscv_const_insns(rtx_def*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:1313
  0x15443bb riscv_legitimate_constant_p
  /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:826
  0xdd3c71 emit_move_insn(rtx_def*, rtx_def*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/expr.cc:4310
  0x15f28e5 run_const_vector_selftests
  
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:285
  0x15f37bd selftest::riscv_run_selftests()
  
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:364
  0x1f6fba9 selftest::run_tests()
  /mnt/nvme/dinux/local-workspace/gcc/gcc/selftest-run-tests.cc:111
  0x11d1f39 toplev::run_self_tests()
  /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:2185

Fix by following the spirit of the adjacent comment, and using the
dedicated riscv_const_insns() function to calculate cost for loading a
constant element.  Infinite recursion is not possible because the first
invocation is on a CONST_VECTOR, whereas the second is on a single
element of the vector (e.g. CONST_INT or CONST_DOUBLE).
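The two-level recursion can be seen in a tiny model (the types and cost numbers below are invented stand-ins, not the real GCC rtx API): the outer call dispatches on a vector, the inner call always lands on a scalar element, so the depth is bounded at two:

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical stand-ins for CONST_INT/CONST_DOUBLE and CONST_VECTOR.
struct ScalarConst { long value; };
struct VectorConst { std::vector<ScalarConst> elts; };
using Const = std::variant<ScalarConst, VectorConst>;

// Mock of riscv_integer_cost: insns needed to materialize a scalar in a GPR.
static int integer_cost (long v)
{
  return (v >= -2048 && v < 2048) ? 1 : 2;   // e.g. addi vs. lui+addi
}

// Sketch of the patched riscv_const_insns: for a vector, the cost is one
// vmv.v.x plus the recursively computed cost of loading the element.  The
// recursion terminates because an element is never itself a vector.
static int const_insns (const Const &x)
{
  if (const VectorConst *vec = std::get_if<VectorConst> (&x))
    return 1 + const_insns (Const { vec->elts[0] });
  return integer_cost (std::get<ScalarConst> (x).value);
}
```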

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.  I don't have a setup to test riscv64.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Recursively call
for constant element of a vector.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3954c89a039..c15da1d0e30 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1310,7 +1310,7 @@ riscv_const_insns (rtx x)
   a general-purpose register.  This means we need as many
   insns as it takes to load the constant into the GPR
   and one vmv.v.x.  */
-   return 1 + riscv_integer_cost (INTVAL (elt));
+   return 1 + riscv_const_insns (elt);
  }
  }
 
-- 
2.40.1



[PATCH] riscv: Fix scope for memory model calculation

2023-06-05 Thread Dimitar Dimitrov
During libgcc configure stage for riscv32-none-elf, when
"--enable-checking=yes,rtl" has been activated, the following error
is observed:

  configure:3814: 
/home/dinux/projects/pru/local-workspace/riscv32-gcc-build/./gcc/xgcc 
-B/home/dinux/projects/pru/local-workspace/riscv32-gcc-build/./gcc/ 
-B/mnt/nvme/dinux/local-workspace/riscv32-opt/riscv32-none-elf/bin/ 
-B/mnt/nvme/dinux/local-workspace/riscv32-opt/riscv32-none-elf/lib/ -isystem 
/mnt/nvme/dinux/local-workspace/riscv32-opt/riscv32-none-elf/include -isystem 
/mnt/nvme/dinux/local-workspace/riscv32-opt/riscv32-none-elf/sys-include-c 
-g -O2  conftest.c >&5
  during RTL pass: final
  conftest.c: In function 'main':
  conftest.c:16:1: internal compiler error: RTL check: expected code 
'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
 16 | }
| ^
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8ea823 riscv_print_operand
  /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
  0xde84b5 output_operand(rtx_def*, int)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
  0xde8ef8 output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
  0xded33b output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
  0xded33b final_scan_insn_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
  0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
  0xded8b7 final_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
  0xdee518 rest_of_handle_final
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
  0xdee518 execute
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318

Fix by moving the calculation of memmodel to the cases where it is used.
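The shape of the fix can be shown with a hypothetical miniature (the names below are invented, not the GCC functions): reading the operand as an integer is only legal when the operand really is a constant, so under RTL checking the read has to live inside the cases that use it:

```cpp
#include <cassert>
#include <cstring>

// Invented stand-ins for rtx codes and a checked INTVAL-style accessor.
enum Code { CODE_CONST_INT, CODE_REG };
struct Op { Code code; long val; };

static bool check_failed = false;

// Models INTVAL under --enable-checking=yes,rtl: legal only on a
// CONST_INT; anything else trips the RTL check.
static long intval (const Op &op)
{
  if (op.code != CODE_CONST_INT)
    { check_failed = true; return 0; }
  return op.val;
}

// After the fix: the memory model is read only inside the case that uses
// it, so printing a REG operand with another letter never calls intval.
static const char *print_operand (char letter, const Op &op)
{
  switch (letter)
    {
    case 'A':
      return intval (op) != 0 ? ".aqrl" : "";  // op is a CONST_INT here
    default:
      return "";                               // intval never reached
    }
}
```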

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.  I don't have a setup to test riscv64.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/riscv/riscv.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c15da1d0e30..fa4bc3e1f7e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4459,7 +4459,6 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 }
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
-  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
 {
@@ -4596,7 +4595,8 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   fputs (GET_RTX_NAME (code), file);
   break;
 
-case 'A':
+case 'A': {
+  const enum memmodel model = memmodel_base (INTVAL (op));
   if (riscv_memmodel_needs_amo_acquire (model)
  && riscv_memmodel_needs_amo_release (model))
fputs (".aqrl", file);
@@ -4605,18 +4605,23 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   else if (riscv_memmodel_needs_amo_release (model))
fputs (".rl", file);
   break;
+}
 
-case 'I':
+case 'I': {
+  const enum memmodel model = memmodel_base (INTVAL (op));
   if (model == MEMMODEL_SEQ_CST)
fputs (".aqrl", file);
   else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
   break;
+}
 
-case 'J':
+case 'J': {
+  const enum memmodel model = memmodel_base (INTVAL (op));
   if (riscv_memmodel_needs_amo_release (model))
fputs (".rl", file);
   break;
+}
 
 case 'i':
   if (code != REG)
-- 
2.40.1



Re: [PATCH v2] MIPS16: Implement `code_readable` function attribute.

2023-06-05 Thread YunQiang Su via Gcc-patches
Jie Mei  于2023年5月19日周五 16:07写道:
>
> From: Simon Dardis 
>
> Support for __attribute__ ((code_readable)).  Takes up to one argument of
> "yes", "no", "pcrel".  This will change the code readability setting for just
> that function.  If no argument is supplied, then the setting is 'yes'.
>
> gcc/ChangeLog:
>
> * config/mips/mips.cc (enum mips_code_readable_setting):New enum.
> (mips_handle_code_readable_attr):New static function.
> (mips_get_code_readable_attr):New static enum function.
> (mips_set_current_function):Set the code_readable mode.
> (mips_option_override):Same as above.
> * doc/extend.texi:Document code_readable.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/mips/code-readable-attr-1.c: New test.
> * gcc.target/mips/code-readable-attr-2.c: New test.
> * gcc.target/mips/code-readable-attr-3.c: New test.
> * gcc.target/mips/code-readable-attr-4.c: New test.
> * gcc.target/mips/code-readable-attr-5.c: New test.
> ---
>  gcc/config/mips/mips.cc   | 97 ++-
>  gcc/doc/extend.texi   | 17 
>  .../gcc.target/mips/code-readable-attr-1.c| 51 ++
>  .../gcc.target/mips/code-readable-attr-2.c| 49 ++
>  .../gcc.target/mips/code-readable-attr-3.c| 50 ++
>  .../gcc.target/mips/code-readable-attr-4.c| 51 ++
>  .../gcc.target/mips/code-readable-attr-5.c|  5 +
>  7 files changed, 319 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-2.c
>  create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-3.c
>  create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-4.c
>  create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-5.c
>
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index ca822758b41..97f45e67529 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -498,6 +498,9 @@ static int mips_base_target_flags;
>  /* The default compression mode.  */
>  unsigned int mips_base_compression_flags;
>
> +/* The default code readable setting.  */
> +enum mips_code_readable_setting mips_base_code_readable;
> +
>  /* The ambient values of other global variables.  */
>  static int mips_base_schedule_insns; /* flag_schedule_insns */
>  static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
> @@ -602,6 +605,7 @@ const enum reg_class 
> mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
>ALL_REGS,ALL_REGS,   ALL_REGS,   ALL_REGS
>  };
>
> +static tree mips_handle_code_readable_attr (tree *, tree, tree, int, bool *);
>  static tree mips_handle_interrupt_attr (tree *, tree, tree, int, bool *);
>  static tree mips_handle_use_shadow_register_set_attr (tree *, tree, tree, 
> int,
>   bool *);
> @@ -623,6 +627,8 @@ static const struct attribute_spec mips_attribute_table[] 
> = {
>{ "micromips",   0, 0, true,  false, false, false, NULL, NULL },
>{ "nomicromips", 0, 0, true,  false, false, false, NULL, NULL },
>{ "nocompression", 0, 0, true,  false, false, false, NULL, NULL },
> +  { "code_readable", 0, 1, true,  false, false, false,
> +mips_handle_code_readable_attr, NULL },
>/* Allow functions to be specified as interrupt handlers */
>{ "interrupt",   0, 1, false, true,  true, false, 
> mips_handle_interrupt_attr,
>  NULL },
> @@ -1310,6 +1316,81 @@ mips_use_debug_exception_return_p (tree type)
>TYPE_ATTRIBUTES (type)) != NULL;
>  }
>
> +
> +/* Verify the arguments to a code_readable attribute.  */
> +
> +static tree
> +mips_handle_code_readable_attr (tree *node ATTRIBUTE_UNUSED, tree name,
> +   tree args, int flags ATTRIBUTE_UNUSED,
> +   bool *no_add_attrs)
> +{
> +  if (!is_attribute_p ("code_readable", name) || args == NULL)
> +return NULL_TREE;
> +
> +  if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
> +{
> +  warning (OPT_Wattributes,
> +  "%qE attribute requires a string argument", name);
> +  *no_add_attrs = true;
> +}
> +  else if (strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "no") != 0
> +  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "pcrel") != 0
> +  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "yes") != 0)
> +{
> +  warning (OPT_Wattributes,
> +  "argument to %qE attribute is neither no, pcrel nor yes", 
> name);
> +  *no_add_attrs = true;
> +}
> +
> +  return NULL_TREE;
> +}
> +
> +/* Determine the code_readable setting for a function if it has one.  Set
> +   *valid to true if we have a properly formed argument and
> +   return the result. If there's no argument, return GCC's default.

contrib/check_GNU_style.sh complains that one more spac

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Richard Biener via Gcc-patches
On Mon, Jun 5, 2023 at 3:49 PM Andrew Stubbs  wrote:
>
> On 30/05/2023 07:26, Richard Biener wrote:
> > On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs  wrote:
> >>
> >> Hi all,
> >>
> >> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
> >> do it because the GCC middle-end models DIVMOD's return value as
> >> "complex int" type, and there are no vector equivalents of that type.
> >>
> >> Therefore, this patch adds minimal support for "complex vector int"
> >> modes.  I have not attempted to provide any means to use these modes
> >> from C, so they're really only useful for DIVMOD.  The actual libfunc
> >> implementation will pack the data into wider vector modes manually.
> >>
> >> A knock-on effect of this is that I needed to increase the range of
> >> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
> >> the previous 255-byte limit).
> >>
> >> Since this change would add a large number of new, unused modes to many
> >> architectures, I have elected to *not* enable them, by default, in
> >> machmode.def (where the other complex modes are created).  The new modes
> >> are therefore inactive on all architectures but amdgcn, for now.
> >>
> >> OK for mainline?  (I've not done a full test yet, but I will.)
> >
> > I think it makes more sense to map vector CSImode to vector SImode with
> > the double number of lanes.  In fact since divmod is a libgcc function
> > I wonder where your vector variant would reside and how GCC decides to
> > emit calls to it?  That is, there's no way to OMP simd declare this 
> > function?
>
> The divmod implementation lives in libgcc. It's not too difficult to
> write using vector extensions and some asm tricks. I did try an OMP simd
> declare implementation, but it didn't vectorize well, and that's a yak 
> I don't wish to shave right now.
>
> In any case, the OMP simd declare will not help us here, directly,
> because the DIVMOD transformation happens too late in the pass pipeline,
> long after ifcvt and vect. My implementation (not yet posted), uses a
> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way.
> It just needs the complex vector modes to exist.
>
> Using vectors twice the length is problematic also. If I create a new
> V128SImode that spans across two 64-lane vector registers then that will
> probably have the desired effect ("real" quotient in v8, "imaginary"
> remainder in v9), but if I use V64SImode to represent two V32SImode
> vectors then that's a one-register mode, and I'll have to use a
> permutation (a memory operation) to extract lanes 32-63 into lanes 0-31,
> and if we ever want to implement instructions that operate on these
> modes (as opposed to the odd/even add/sub complex patterns we have now)
> then the masking will be all broken and we'd need to constantly
> disassemble the double length vectors to operate on them.

I'm a bit confused as I don't see the difference between V64SCImode and
V128SImode since both contain 128 SImode values.  And I would expect
the imag/real parts to be _always_ interleaved, irrespective of whether
the result fits one or two vector registers.

> The implementation I proposed is essentially a struct containing two
> vectors placed in consecutive registers. This is the natural
> representation for the architecture.

I don't think you did that?  Or at least I don't see how vectors of
complex modes would match that.  It would be a complex of a vector
mode instead, no?

I do see that internal functions with more than one output would be
desirable and I think I proposed ASMs with a "coded text" aka
something like a pattern ID or an optab identifier would be the best
fit on GIMPLE but TARGET_EXPAND_DIVMOD_LIBFUNC for this
particular case should be a good fit as well, no?

Can you share what you needed to change to get your complex vector int
code actually working?  What does the divmod pattern matching create
for the return type?  The pass has

  /* Disable the transform if either is a constant, since division-by-constant
 may have specialized expansion.  */
  if (CONSTANT_CLASS_P (op1))
return false;

  if (CONSTANT_CLASS_P (op2))
{
  if (integer_pow2p (op2))
return false;

  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
  && TYPE_PRECISION (type) <= BITS_PER_WORD)
return false;

at least the TYPE_PRECISION query is bogus when type is a vector type
and the IFN building does

 /* Part 3: Create libcall to internal fn DIVMOD:
 divmod_tmp = DIVMOD (op1, op2).  */

  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
  tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
 call_stmt, "divmod_tmp");

so that builds a complex type with a vector component, not a vector
with complex components.
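The distinction can be made concrete with a small layout sketch (assumed layouts, not GCC's internal representation): a complex whose components are vectors keeps all quotients together and all remainders together, which would naturally map to two consecutive vector registers, while a vector of complex values interleaves quotient and remainder lane by lane:

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;   // lane count, kept small for illustration

// complex-of-vector: what build_complex_type (vector_type) implies --
// one block of quotients followed by one block of remainders.
struct ComplexOfVector { std::array<int, N> quot, rem; };

// vector-of-complex: real/imag (quotient/remainder) interleaved per lane.
struct VectorOfComplex { std::array<int, 2 * N> lanes; };

static ComplexOfVector divmod_cov (const std::array<int, N> &a, int d)
{
  ComplexOfVector r {};
  for (int i = 0; i < N; i++)
    { r.quot[i] = a[i] / d; r.rem[i] = a[i] % d; }
  return r;
}

static VectorOfComplex divmod_voc (const std::array<int, N> &a, int d)
{
  VectorOfComplex r {};
  for (int i = 0; i < N; i++)
    { r.lanes[2 * i] = a[i] / d; r.lanes[2 * i + 1] = a[i] % d; }
  return r;
}
```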

Richard.


> Anyway, you don't like this patch and I see that AArch64 is picking
> apart BLKmode to see if there's complex inside, so maybe I can make
> somethi

Re: [RFA] Improve strcmp expansion when one input is a constant string.

2023-06-05 Thread Richard Biener via Gcc-patches
On Mon, Jun 5, 2023 at 8:41 PM Jeff Law  wrote:
>
>
>
> On 6/5/23 00:29, Richard Biener wrote:
>
> >
> > But then for example x86 has smaller encoding for byte ops and while
> > widening is easily done later, truncation is not.
> Sadly, the x86 costing looks totally bogus here.  We actually emit the
> exact same code for a QI mode loads vs a zero-extending load from QI to
> SI.  But the costing is different and would tend to prefer QImode.  That
> in turn is going to force an extension at the end of the sequence which
> would be a regression relative to the current code.  Additionally we may
> get partial register stalls for the byte ops to implement the comparison
> steps.
>
> The net result is that querying the backend's costs would do the exact
> opposite of what I think we want on x86.  One could argue the x86
> maintainers should improve this situation...
>
> >
> > Note I would have expected to use the mode of the load so we truly
> > elide some extensions, using word_mode looks like just another
> > mode here?  The key to note is probably
> >
> >op0 = convert_modes (mode, unit_mode, op0, 1);
> >op1 = convert_modes (mode, unit_mode, op1, 1);
> >rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
> >result, 1, OPTAB_WIDEN);
> >
> > which uses OPTAB_WIDEN - wouldn't it be better to pass in the
> > unconverted modes and leave the decision which mode to use
> > to OPTAB_WIDEN?  Should we somehow query the target for
> > the smallest mode from unit_mode it can do both the MINUS
> > and the compare?
> And avoiding OPTAB_WIDEN isn't going to help rv64 at all.  The core
> issue being that we do define 32bit ops.  With Jivan's patch those 32bit
> ops expose the sign extending nature.  So a 32bit add would look
> something like
>
> (set (temp:DI) (sign_extend:DI (plus:SI (op:SI) (op:SI))))
> (set (res:SI) (subreg:SI (temp:DI) 0))
>
> Where we mark the subreg with SUBREG_PROMOTED_VAR_P.
>
>
> I'm not sure the best way to proceed now.  I could just put this on the
> back-burner as it's RISC-V specific and the gains elsewhere dwarf this
> issue.

I wonder if there's some more generic target macro we can key the
behavior off - SLOW_BYTE_ACCESS isn't a good fit, WORD_REGISTER_OPERATIONS
is maybe closer but its exact implications are unknown to me.  Maybe
there's something else as well ...

The point about OPTAB_WIDEN above was that I wonder why we
extend 'op0' and 'op1' before emitting the binop when we allow WIDEN
anyway.  Yes, we want the result in 'mode' (but why?  As you say we
can extend at the end) and there's likely no way to tell expand_simple_binop
to "expand as needed and not narrow the result".  So I wonder if we should
emulate that somehow (also taking into consideration the compare).
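One concrete reason the result mode matters (an illustrative aside, not a claim about any particular target's codegen): if the byte difference is narrowed before its sign is taken, modular wrap-around can flip it, which is why truncating late is not a free choice:

```cpp
#include <cassert>

// Widen first, then subtract: the sign of the result matches the
// unsigned byte comparison strcmp needs.
static int diff_widened (unsigned char a, unsigned char b)
{
  return (int) a - (int) b;
}

// Subtract in the narrow mode and sign-extend afterwards: the modular
// wrap-around of the byte subtraction can flip the sign.
static int diff_narrowed (unsigned char a, unsigned char b)
{
  unsigned char d = (unsigned char) (a - b);
  return (signed char) d;
}
```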

Richard.

>
> jeff


[PATCH v3] MIPS16: Implement `code_readable` function attribute.

2023-06-05 Thread Jie Mei
From: Simon Dardis 

Support for __attribute__ ((code_readable)).  Takes up to one argument of
"yes", "no", "pcrel".  This will change the code readability setting for just
that function.  If no argument is supplied, then the setting is 'yes'.

gcc/ChangeLog:

* config/mips/mips.cc (enum mips_code_readable_setting):New enum.
(mips_handle_code_readable_attr):New static function.
(mips_get_code_readable_attr):New static enum function.
(mips_set_current_function):Set the code_readable mode.
(mips_option_override):Same as above.
* doc/extend.texi:Document code_readable.

gcc/testsuite/ChangeLog:

* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: New test.
* gcc.target/mips/code-readable-attr-3.c: New test.
* gcc.target/mips/code-readable-attr-4.c: New test.
* gcc.target/mips/code-readable-attr-5.c: New test.
---
 gcc/config/mips/mips.cc   | 97 ++-
 gcc/doc/extend.texi   | 17 
 .../gcc.target/mips/code-readable-attr-1.c| 51 ++
 .../gcc.target/mips/code-readable-attr-2.c| 49 ++
 .../gcc.target/mips/code-readable-attr-3.c| 50 ++
 .../gcc.target/mips/code-readable-attr-4.c| 51 ++
 .../gcc.target/mips/code-readable-attr-5.c|  5 +
 7 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-5.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..97f45e67529 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -498,6 +498,9 @@ static int mips_base_target_flags;
 /* The default compression mode.  */
 unsigned int mips_base_compression_flags;
 
+/* The default code readable setting.  */
+enum mips_code_readable_setting mips_base_code_readable;
+
 /* The ambient values of other global variables.  */
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
@@ -602,6 +605,7 @@ const enum reg_class 
mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   ALL_REGS,ALL_REGS,   ALL_REGS,   ALL_REGS
 };
 
+static tree mips_handle_code_readable_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_interrupt_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_use_shadow_register_set_attr (tree *, tree, tree, int,
  bool *);
@@ -623,6 +627,8 @@ static const struct attribute_spec mips_attribute_table[] = 
{
   { "micromips",   0, 0, true,  false, false, false, NULL, NULL },
   { "nomicromips", 0, 0, true,  false, false, false, NULL, NULL },
   { "nocompression", 0, 0, true,  false, false, false, NULL, NULL },
+  { "code_readable", 0, 1, true,  false, false, false,
+mips_handle_code_readable_attr, NULL },
   /* Allow functions to be specified as interrupt handlers */
   { "interrupt",   0, 1, false, true,  true, false, mips_handle_interrupt_attr,
 NULL },
@@ -1310,6 +1316,81 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+
+/* Verify the arguments to a code_readable attribute.  */
+
+static tree
+mips_handle_code_readable_attr (tree *node ATTRIBUTE_UNUSED, tree name,
+   tree args, int flags ATTRIBUTE_UNUSED,
+   bool *no_add_attrs)
+{
+  if (!is_attribute_p ("code_readable", name) || args == NULL)
+return NULL_TREE;
+
+  if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute requires a string argument", name);
+  *no_add_attrs = true;
+}
+  else if (strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "no") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "pcrel") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "yes") != 0)
+{
+  warning (OPT_Wattributes,
+  "argument to %qE attribute is neither no, pcrel nor yes", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Determine the code_readable setting for a function if it has one.  Set
+   *valid to true if we have a properly formed argument and
+   return the result.  If there's no argument, return GCC's default.
+   Otherwise, leave valid false and return mips_base_code_readable.  In
+   that case the result should be unused anyway.  */
+
+static enum mips_code_readable_setting
+mips_get_code_readable_attr (tree decl)
+{
+  tree attr;
+
+  if (decl == NULL)
+return mips_base_code_readable;
+
+  attr =

Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-05 Thread Richard Biener via Gcc-patches
On Tue, Jun 6, 2023 at 6:17 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch enables basic VLA SLP auto-vectorization.
> Consider this following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
> {
>   a[i * 8 + 0] = b[i * 8 + 7] + 1;
>   a[i * 8 + 1] = b[i * 8 + 7] + 2;
>   a[i * 8 + 2] = b[i * 8 + 7] + 8;
>   a[i * 8 + 3] = b[i * 8 + 7] + 4;
>   a[i * 8 + 4] = b[i * 8 + 7] + 5;
>   a[i * 8 + 5] = b[i * 8 + 7] + 6;
>   a[i * 8 + 6] = b[i * 8 + 7] + 7;
>   a[i * 8 + 7] = b[i * 8 + 7] + 3;
> }
> }
>
> To enable VLA SLP auto-vectorization, we should be able to handle the
> following const vectors:
>
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 
> 16, ... }
>
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>
> And these vectors can be generated in the prologue.
>
> After this patch, we end up with this following codegen:
>
> Prologue:
> ...
> vsetvli a7,zero,e16,m2,ta,ma
> vid.v   v4
> vsrl.vi v4,v4,3
> li  a3,8
> vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 
> 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> ...
> li  t1,67633152
> addit1,t1,513
> li  a3,50790400
> addia3,a3,1541
> sllia3,a3,32
> add a3,a3,t1
> vsetvli t1,zero,e64,m1,ta,ma
> vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> ...
> LoopBody:
> ...
> min a3,...
> vsetvli zero,a3,e8,m1,ta,ma
> vle8.v  v2,0(a6)
> vsetvli a7,zero,e8,m1,ta,ma
> vrgatherei16.vv v1,v2,v4
> vadd.vv v1,v1,v3
> vsetvli zero,a3,e8,m1,ta,ma
> vse8.v  v1,0(a2)
> add a6,a6,a4
> add a2,a2,a4
> mv  a3,a5
> add a5,a5,t1
> bgtua3,a4,.L3
> ...
>
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8 
> since "vrgatherei16.vv" can cover a larger
>   range than "vrgather.vv" (whose maximum element index is only 255).
> Epilogue:
> lbu a5,799(a1)
> addiw   a4,a5,1
> sb  a4,792(a0)
> addiw   a4,a5,2
> sb  a4,793(a0)
> addiw   a4,a5,8
> sb  a4,794(a0)
> addiw   a4,a5,4
> sb  a4,795(a0)
> addiw   a4,a5,5
> sb  a4,796(a0)
> addiw   a4,a5,6
> sb  a4,797(a0)
> addiw   a4,a5,7
> sb  a4,798(a0)
> addiw   a5,a5,3
> sb  a5,799(a0)
> ret
>
> One last thing we still need to do is the "Epilogue 
> auto-vectorization", which needs VLS modes support.
> I will support VLS modes for "Epilogue auto-vectorization" in the future.

What's the epilogue generated for?  With a VLA main loop body you
shouldn't have one apart from
when that body isn't entered because of cost or alias reasons?
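As an aside, the two encoded constants quoted above follow the usual (NPATTERNS, NELTS_PER_PATTERN) scheme; a small sketch (a hypothetical helper, not the GCC vector_builder API) reproduces how the encoded prefix extends to an arbitrary number of lanes:

```cpp
#include <cassert>
#include <vector>

// Expand a VLA-encoded constant: 'encoded' holds npatterns *
// nelts_per_pattern leading elements.  Beyond that prefix, each pattern
// continues as an arithmetic series with the step implied by its last
// two encoded elements (so nelts_per_pattern == 1 simply repeats).
static std::vector<int>
expand_encoded (const std::vector<int> &encoded, int npatterns,
                int nelts_per_pattern, int total)
{
  std::vector<int> out (total);
  for (int i = 0; i < total; i++)
    {
      int pat = i % npatterns, rep = i / npatterns;
      if (rep < nelts_per_pattern)
        out[i] = encoded[rep * npatterns + pat];
      else
        {
          int last = encoded[(nelts_per_pattern - 1) * npatterns + pat];
          int prev = nelts_per_pattern >= 2
                     ? encoded[(nelts_per_pattern - 2) * npatterns + pat]
                     : last;
          out[i] = last + (rep - (nelts_per_pattern - 1)) * (last - prev);
        }
    }
  return out;
}
```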

>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
> * config/riscv/riscv-v.cc 
> (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
> (rvv_builder::single_step_npatterns_p): New function.
> (rvv_builder::npatterns_all_equal_p): Ditto.
> (const_vec_all_in_range_p): Support POLY handling.
> (gen_const_vector_dup): Ditto.
> (emit_vlmax_gather_insn): Add vrgatherei16.
> (emit_vlmax_masked_gather_mu_insn): Ditto.
> (expand_const_vector): Add VLA SLP const vector support.
> (expand_vec_perm): Support POLY.
> (struct expand_vec_perm_d): New struct.
> (shuffle_generic_patterns): New function.
> (expand_vec_perm_const_1): Ditto.
> (expand_vec_perm_const): Ditto.
> * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
> (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA 
> vectorizer.
> * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
>