Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
> Kito (or somebody else), would you mind doing a RISC-V bootstrap? It would > take forever on my machine. Thank you. I did a bootstrap myself now and it finally finished. Going to commit the attached tomorrow. Regards Robin Subject: [PATCH] Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL. RISC-V lowers the TYPE_PRECISION for MODE_VECTOR_BOOL vectors in order to distinguish between VNx1BI, VNx2BI, VNx4BI and VNx8BI. This patch adjusts uses of MODE_VECTOR_BOOL to use GET_MODE_PRECISION instead of GET_MODE_BITSIZE. The RISC-V tests are provided by Juzhe. Co-Authored-By: Juzhe-Zhong gcc/c-family/ChangeLog: * c-common.cc (c_common_type_for_mode): Use GET_MODE_PRECISION. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Ditto. * simplify-rtx.cc (native_encode_rtx): Ditto. (native_decode_vector_rtx): Ditto. (simplify_const_vector_byte_offset): Ditto. (simplify_const_vector_subreg): Ditto. * tree.cc (build_truth_vector_type_for_mode): Ditto. * varasm.cc (output_constant_pool_2): Ditto. gcc/fortran/ChangeLog: * trans-types.cc (gfc_type_for_mode): Ditto. gcc/go/ChangeLog: * go-lang.cc (go_langhook_type_for_mode): Ditto. gcc/lto/ChangeLog: * lto-lang.cc (lto_type_for_mode): Ditto. gcc/rust/ChangeLog: * backend/rust-tree.cc (c_common_type_for_mode): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test. --- gcc/c-family/c-common.cc | 2 +- gcc/config/riscv/riscv-v.cc | 12 gcc/fortran/trans-types.cc| 2 +- gcc/go/go-lang.cc | 2 +- gcc/lto/lto-lang.cc | 2 +- gcc/rust/backend/rust-tree.cc | 2 +- gcc/simplify-rtx.cc | 10 +++ .../riscv/rvv/autovec/vls-vlmax/bitmask-1.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-10.c | 22 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-11.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-12.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-13.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-14.c | 24 +++ .../riscv/rvv/autovec/vls-vlmax/bitmask-2.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-3.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-4.c | 23 ++ .../riscv/rvv/autovec/vls-vlmax/bitmask-5.c | 25 .../riscv/rvv/autovec/vls-vlmax/bitmask-6.c | 27 + .../riscv/rvv/autovec/vls-vlmax/bitmask-7.c | 30 +++ .../riscv/rvv/autovec/vls-vlmax/bitmask-8.c | 30 +++ .../riscv/rvv/autovec/vls-vlmax/bitmask-9.c | 30 +++ gcc/tree.cc | 2 +- gcc/varasm.cc | 8 - 23 files changed, 374 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c create mode 100644
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
>> Hi Robin: >> >>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc >>> index 52d7626e92e..14d419c2013 100644 >>> --- a/gcc/lto/lto-lang.cc >>> +++ b/gcc/lto/lto-lang.cc >>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p) >>>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL >>>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode))) >>> { >>> - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE >>> (mode), >>> + unsigned int elem_bits = vectwhereor_element_size >>> (GET_MODE_PRECISION (mode), >> >> This seems weird? Indeed :D Must be an accidental middle-click in Thunderbird. I just re-checked and the diff itself is fine. > FWIW, I bootstrapped & regression-tested the patch with that fixed > on aarch64-linux-gnu (all languages). > > So OK with the above fixed from my POV. Oh, thanks! Mine is still running, not even with all languages. I picked the M1 from the compile farm which only has eight cores. Kito (or somebody else), would you mind doing a RISC-V bootstrap? It would take forever on my machine. Thank you. Regards Robin
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Kito Cheng writes: > Hi Robin: > >> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc >> index 52d7626e92e..14d419c2013 100644 >> --- a/gcc/lto/lto-lang.cc >> +++ b/gcc/lto/lto-lang.cc >> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p) >>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL >>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode))) >> { >> - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), >> + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION >> (mode), > > This seems weird? FWIW, I bootstrapped & regression-tested the patch with that fixed on aarch64-linux-gnu (all languages). So OK with the above fixed from my POV. Thanks, Richard > >> GET_MODE_NUNITS (mode)); >>tree bool_type = build_nonstandard_boolean_type (elem_bits); >>return build_vector_type_for_mode (bool_type, mode);
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Hi Robin: > diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc > index 52d7626e92e..14d419c2013 100644 > --- a/gcc/lto/lto-lang.cc > +++ b/gcc/lto/lto-lang.cc > @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p) >else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL >&& valid_vector_subparts_p (GET_MODE_NUNITS (mode))) > { > - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), > + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION > (mode), This seems weird? > GET_MODE_NUNITS (mode)); >tree bool_type = build_nonstandard_boolean_type (elem_bits); >return build_vector_type_for_mode (bool_type, mode);
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
> This should probably use GET_MODE_PRECISION as well. > > OK if it bootstraps/tests on both aarch64 and riscv. > > Richard. I found a several other instances, also in the frontends that I'm not exactly sure about. I'm currently testing this but aarch64 bootstrap is still going to take a while, various aarch compile farm machines are down? Regards Robin >From ef919a27f4a156afeca6b4825e6029d9f44be556 Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Wed, 28 Jun 2023 20:59:29 +0200 Subject: [PATCH] mode_bitsize -> precision. bitsize -> precision. --- gcc/c-family/c-common.cc | 2 +- gcc/fortran/trans-types.cc| 2 +- gcc/go/go-lang.cc | 2 +- gcc/lto/lto-lang.cc | 2 +- gcc/rust/backend/rust-tree.cc | 2 +- gcc/simplify-rtx.cc | 10 +- gcc/tree.cc | 2 +- gcc/varasm.cc | 2 +- 8 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 34566a342bd..6ab63dae997 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -2458,7 +2458,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp) else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) { - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), + unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode)); tree bool_type = build_nonstandard_boolean_type (elem_bits); return build_vector_type_for_mode (bool_type, mode); diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc index d718f28cc86..987e3d26c46 100644 --- a/gcc/fortran/trans-types.cc +++ b/gcc/fortran/trans-types.cc @@ -3403,7 +3403,7 @@ gfc_type_for_mode (machine_mode mode, int unsignedp) else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) { - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), + unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode)); tree bool_type = build_nonstandard_boolean_type (elem_bits); return build_vector_type_for_mode (bool_type, mode); diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc index e85a4bfe949..d5c871a533c 100644 --- a/gcc/go/go-lang.cc +++ b/gcc/go/go-lang.cc @@ -414,7 +414,7 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp) if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) { - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), + unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode)); tree bool_type = build_nonstandard_boolean_type (elem_bits); return build_vector_type_for_mode (bool_type, mode); diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc index 52d7626e92e..14d419c2013 100644 --- a/gcc/lto/lto-lang.cc +++ b/gcc/lto/lto-lang.cc @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p) else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) { - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode)); tree bool_type = build_nonstandard_boolean_type (elem_bits); return build_vector_type_for_mode (bool_type, mode); diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc index 8243d4cf5c6..66e859cd70c 100644 --- a/gcc/rust/backend/rust-tree.cc +++ b/gcc/rust/backend/rust-tree.cc @@ -5320,7 +5320,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp) && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) { unsigned int elem_bits - = vector_element_size (GET_MODE_BITSIZE (mode), GET_MODE_NUNITS (mode)); + = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode)); tree bool_type = build_nonstandard_boolean_type (elem_bits); return build_vector_type_for_mode (bool_type, mode); } diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 99cbdd47d93..d7315d82aa3 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7076,7 +7076,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec , /* CONST_VECTOR_ELT follows target memory order, so no shuffling is necessary. The only complication is that MODE_VECTOR_BOOL vectors can have several elements per byte. */ - unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode), + unsigned int
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
On Thu, Jun 29, 2023 at 11:10 AM Robin Dapp via Gcc-patches wrote: > > > Yeah, that part is OK, and was the case I was thinking about when > > I said OK yesterday. But now that we allow BITSIZE != PRECISION, > > it's possible for BITSIZE - PRECISION to be more than a full byte, > > in which case the new loop would not initialise every byte of > > the mode. > > Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1. Luckily > this cannot happen with RVV as all we do is adjust the precision > of the modes that have BITSIZE == 8. I'm going to add an assert. > Juzhe would rather work around that in the backend, though. > > The other thing I just noticed is > > tree > build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode) > { > gcc_assert (mask_mode != BLKmode); > > unsigned HOST_WIDE_INT esize; > if (VECTOR_MODE_P (mask_mode)) > { > poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode); > esize = vector_element_size (vsize, nunits); > } > else > esize = 1; > > tree bool_type = build_nonstandard_boolean_type (esize); > > return make_vector_type (bool_type, nunits, mask_mode); > } > > which gives us wrong precision as we rely on the BITSIZE here as well. > This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for > VNx2BI. This should probably use GET_MODE_PRECISION as well. OK if it bootstraps/tests on both aarch64 and riscv. Richard. > > Maybe this isn't a problem per se but to me it appears > just wrong. > > Regards > Robin >
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
No, I am not saying I want to fix it in RISC-V backend. Actually, if you can quickly land the fix in generic codes and not block of the RISC-V following patches. I am glad to see. Otherwise, I prefer to fix it RISC-V backend for now if it is not a big issue for performance and defer it to GCC-15 to make it perfect. The reason why I plan that is global reviewers bandwidth is very limit. We should make the highest priority auto-vectorizaiton middle-end support first and then let's come back to see the corner case issues. Thanks. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-06-29 17:09 To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; palmer; richard.sandiford CC: rdapp.gcc Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI > Yeah, that part is OK, and was the case I was thinking about when > I said OK yesterday. But now that we allow BITSIZE != PRECISION, > it's possible for BITSIZE - PRECISION to be more than a full byte, > in which case the new loop would not initialise every byte of > the mode. Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1. Luckily this cannot happen with RVV as all we do is adjust the precision of the modes that have BITSIZE == 8. I'm going to add an assert. Juzhe would rather work around that in the backend, though. The other thing I just noticed is tree build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode) { gcc_assert (mask_mode != BLKmode); unsigned HOST_WIDE_INT esize; if (VECTOR_MODE_P (mask_mode)) { poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode); esize = vector_element_size (vsize, nunits); } else esize = 1; tree bool_type = build_nonstandard_boolean_type (esize); return make_vector_type (bool_type, nunits, mask_mode); } which gives us wrong precision as we rely on the BITSIZE here as well. This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for VNx2BI. Maybe this isn't a problem per se but to me it appears just wrong. Regards Robin
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
> Yeah, that part is OK, and was the case I was thinking about when > I said OK yesterday. But now that we allow BITSIZE != PRECISION, > it's possible for BITSIZE - PRECISION to be more than a full byte, > in which case the new loop would not initialise every byte of > the mode. Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1. Luckily this cannot happen with RVV as all we do is adjust the precision of the modes that have BITSIZE == 8. I'm going to add an assert. Juzhe would rather work around that in the backend, though. The other thing I just noticed is tree build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode) { gcc_assert (mask_mode != BLKmode); unsigned HOST_WIDE_INT esize; if (VECTOR_MODE_P (mask_mode)) { poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode); esize = vector_element_size (vsize, nunits); } else esize = 1; tree bool_type = build_nonstandard_boolean_type (esize); return make_vector_type (bool_type, nunits, mask_mode); } which gives us wrong precision as we rely on the BITSIZE here as well. This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for VNx2BI. Maybe this isn't a problem per se but to me it appears just wrong. Regards Robin
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Yes, we have no choice since DSE is base on BYTESIZE. So I walk around in RISC-V backend making VNx1BI, VNx2BI, VNx4BI precision different with VNx8BI to prevent incorrect DSE. I think such issue can be addressed when we adjust everything using BITSIZE instead of BYTESIZE but it may change to much. I prefer it to be GCC-15 (such issue can be walk around in RISC-V backend) since we have to much things need to be landed in GCC-14. Thanks. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-06-29 16:53 To: juzhe.zh...@rivai.ai; gcc-patches; jeffreyalaw; kito.cheng; Kito.cheng; palmer; palmer; richard.sandiford CC: rdapp.gcc Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI >>> are we absolutely sure this is the only problem we will have >>> with precision != bitsize and it is confined to the backend? > Yes. With vinfo.vector_mode == VNx4SI mask_type = get_mask_type_for_scalar_type (vinfo, int) mask_type is: vector(4) I.e. the precision is 2. This is definitely fishy and related to the same problem. I would almost bet that something in the middle-end relies on the precision for some optimization but we just haven't hit it yet. Then we have vector(2) (precision 4) as a mask type for vector(2) long int. Likewise we would likely have a precision of 8 for a vector(1)? Those might be less severe but still... And that's just what I'm seeing spontaneously after like five minutes. Regards Robin
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Robin Dapp writes: >> Sorry, only realised later, but: if the precision can cover fewer >> bytes than the bitsize, I suppose there ought to be some zero-byte >> padding at the end as well. > It looks like this problem, and also the padding, has been discussed > before when the precision of VNx1BI etc. was first adjusted in the > RISC-V backend? Very probably. Can't remember now. > I didn't immediately get the padding, though. So if we e.g. have a > VNx2BI constant {0, 1} what would we pad the resulting value "2" to? > A full byte? Yeah, that part is OK, and was the case I was thinking about when I said OK yesterday. But now that we allow BITSIZE != PRECISION, it's possible for BITSIZE - PRECISION to be more than a full byte, in which case the new loop would not initialise every byte of the mode. I vaguely remembered that that could happen for RVV_FIXED_VLMAX, but perhaps I misremember. If it can't happen then an assert would be OK instead. Thanks, Richard
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
>>> are we absolutely sure this is the only problem we will have >>> with precision != bitsize and it is confined to the backend? > Yes. With vinfo.vector_mode == VNx4SI mask_type = get_mask_type_for_scalar_type (vinfo, int) mask_type is: vector(4) I.e. the precision is 2. This is definitely fishy and related to the same problem. I would almost bet that something in the middle-end relies on the precision for some optimization but we just haven't hit it yet. Then we have vector(2) (precision 4) as a mask type for vector(2) long int. Likewise we would likely have a precision of 8 for a vector(1)? Those might be less severe but still... And that's just what I'm seeing spontaneously after like five minutes. Regards Robin
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
>> are we absolutely sure this is the only problem we will have >> with precision != bitsize and it is confined to the backend? Yes. >>I would >>not dare to make that call. How does DSE come in here at all as you >>keep mentioning it? I mentioned DSE is because: We have DSE issue before so we use ADJUST_PRECISION to make PRECISON != BITSIZE but we still to walk around this DSE issue: https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f However, this fix patch fixed DSE issue which makes PRECISON != BITSIZE, then GCC will generate padding bits for it which we don't want it. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-06-29 16:14 To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; palmer; richard.sandiford CC: rdapp.gcc Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI I grep'ed a bit and found several more instances of the same pattern which would probably all have to be adjusted (frontend-related mostly but also in native_encode_rtx). Most likely they would all have to be adjusted? > Sorry, only realised later, but: if the precision can cover fewer > bytes than the bitsize, I suppose there ought to be some zero-byte > padding at the end as well. It looks like this problem, and also the padding, has been discussed before when the precision of VNx1BI etc. was first adjusted in the RISC-V backend? I didn't immediately get the padding, though. So if we e.g. have a VNx2BI constant {0, 1} what would we pad the resulting value "2" to? A full byte? Juzhe, are we absolutely sure this is the only problem we will have with precision != bitsize and it is confined to the backend? I would not dare to make that call. How does DSE come in here at all as you keep mentioning it? Regards Robin
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
I grep'ed a bit and found several more instances of the same pattern which would probably all have to be adjusted (frontend-related mostly but also in native_encode_rtx). Most likely they would all have to be adjusted? > Sorry, only realised later, but: if the precision can cover fewer > bytes than the bitsize, I suppose there ought to be some zero-byte > padding at the end as well. It looks like this problem, and also the padding, has been discussed before when the precision of VNx1BI etc. was first adjusted in the RISC-V backend? I didn't immediately get the padding, though. So if we e.g. have a VNx2BI constant {0, 1} what would we pad the resulting value "2" to? A full byte? Juzhe, are we absolutely sure this is the only problem we will have with precision != bitsize and it is confined to the backend? I would not dare to make that call. How does DSE come in here at all as you keep mentioning it? Regards Robin
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Yes. There is a trick fix in RVV. Ideally, each mode should have PRECISION == BITSIZE. However, for RVV, there is a bug which cause incorrect DSE. We have VNx1BI (occupy 1bit), VNx2BI (occupy 2bit), VNx4BI (occupy 4bit), VNx8BI (occupy 8bit), since they are having same BYTESIZE, it cause incorrect DSE. So we add a trick (ADJUST_PRECISION) to fix it: https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f which will prevent the incorrect DSE. But the maskbit layout in memory comes wrong since the inconsistency between PRECISION and BITSIZE. So, I force GCC handle this in the RISC-V backend for VNx1BI/VNx2BI/VNx4BI. I think this is RISC-V backend issue and can be well addressed in RISC-V port (as this patch I post). No need to bother generic codes since other target could not have the same issues. Thanks. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2023-06-29 15:53 To: Robin Dapp via Gcc-patches CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI Richard Sandiford writes: > Robin Dapp via Gcc-patches writes: >> Hi Juzhe, >> >> I find the bug description rather confusing. What I can see is that >> the constant in the literal pool is indeed wrong but how would DSE or >> so play a role there? Particularly only for the smaller modes? >> >> My suspicion would be that the constant in the literal/constant pool >> is wrong from start to finish. >> >> I just played around with the following hunk: >> >> diff --git a/gcc/varasm.cc b/gcc/varasm.cc >> index 542315f88cd..5223c08924f 100644 >> --- a/gcc/varasm.cc >> +++ b/gcc/varasm.cc >> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, >> unsigned int align) >>whole element. Often this is byte_mode and contains more >>than one element. */ >> unsigned int nelts = GET_MODE_NUNITS (mode); >> - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts; >> + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; >> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); >> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require >> (); >> >> With this all your examples pass for me. We then pack e.g. 16 VNx2BI >> elements >> into an int and not just 8. It would also explain why it works for modes >> where PRECISION == BITSIZE. Now it will certainly require a more thorough >> analysis but maybe it's a start? > > Yeah. Preapproved for trunk & any necessary branches. Sorry, only realised later, but: if the precision can cover fewer bytes than the bitsize, I suppose there ought to be some zero-byte padding at the end as well. Thanks, Richard
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Richard Sandiford writes: > Robin Dapp via Gcc-patches writes: >> Hi Juzhe, >> >> I find the bug description rather confusing. What I can see is that >> the constant in the literal pool is indeed wrong but how would DSE or >> so play a role there? Particularly only for the smaller modes? >> >> My suspicion would be that the constant in the literal/constant pool >> is wrong from start to finish. >> >> I just played around with the following hunk: >> >> diff --git a/gcc/varasm.cc b/gcc/varasm.cc >> index 542315f88cd..5223c08924f 100644 >> --- a/gcc/varasm.cc >> +++ b/gcc/varasm.cc >> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, >> unsigned int align) >>whole element. Often this is byte_mode and contains more >>than one element. */ >> unsigned int nelts = GET_MODE_NUNITS (mode); >> - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts; >> + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; >> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); >> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require >> (); >> >> With this all your examples pass for me. We then pack e.g. 16 VNx2BI >> elements >> into an int and not just 8. It would also explain why it works for modes >> where PRECISION == BITSIZE. Now it will certainly require a more thorough >> analysis but maybe it's a start? > > Yeah. Preapproved for trunk & any necessary branches. Sorry, only realised later, but: if the precision can cover fewer bytes than the bitsize, I suppose there ought to be some zero-byte padding at the end as well. Thanks, Richard
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Ok. Plz go ahead commit this change with the testcases. Then it won't block the following patches. Thanks. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2023-06-29 04:42 To: Robin Dapp via Gcc-patches CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI Robin Dapp via Gcc-patches writes: > Hi Juzhe, > > I find the bug description rather confusing. What I can see is that > the constant in the literal pool is indeed wrong but how would DSE or > so play a role there? Particularly only for the smaller modes? > > My suspicion would be that the constant in the literal/constant pool > is wrong from start to finish. > > I just played around with the following hunk: > > diff --git a/gcc/varasm.cc b/gcc/varasm.cc > index 542315f88cd..5223c08924f 100644 > --- a/gcc/varasm.cc > +++ b/gcc/varasm.cc > @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, > unsigned int align) >whole element. Often this is byte_mode and contains more >than one element. */ > unsigned int nelts = GET_MODE_NUNITS (mode); > - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts; > + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; > unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); > scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); > > With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements > into an int and not just 8. It would also explain why it works for modes > where PRECISION == BITSIZE. Now it will certainly require a more thorough > analysis but maybe it's a start? Yeah. Preapproved for trunk & any necessary branches. Thanks, Richard
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Robin Dapp via Gcc-patches writes: > Hi Juzhe, > > I find the bug description rather confusing. What I can see is that > the constant in the literal pool is indeed wrong but how would DSE or > so play a role there? Particularly only for the smaller modes? > > My suspicion would be that the constant in the literal/constant pool > is wrong from start to finish. > > I just played around with the following hunk: > > diff --git a/gcc/varasm.cc b/gcc/varasm.cc > index 542315f88cd..5223c08924f 100644 > --- a/gcc/varasm.cc > +++ b/gcc/varasm.cc > @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, > unsigned int align) >whole element. Often this is byte_mode and contains more >than one element. */ > unsigned int nelts = GET_MODE_NUNITS (mode); > - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts; > + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; > unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); > scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); > > With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements > into an int and not just 8. It would also explain why it works for modes > where PRECISION == BITSIZE. Now it will certainly require a more thorough > analysis but maybe it's a start? Yeah. Preapproved for trunk & any necessary branches. Thanks, Richard
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Hi Juzhe, I find the bug description rather confusing. What I can see is that the constant in the literal pool is indeed wrong but how would DSE or so play a role there? Particularly only for the smaller modes? My suspicion would be that the constant in the literal/constant pool is wrong from start to finish. I just played around with the following hunk: diff --git a/gcc/varasm.cc b/gcc/varasm.cc index 542315f88cd..5223c08924f 100644 --- a/gcc/varasm.cc +++ b/gcc/varasm.cc @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align) whole element. Often this is byte_mode and contains more than one element. */ unsigned int nelts = GET_MODE_NUNITS (mode); - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts; + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts; unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT); scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements into an int and not just 8. It would also explain why it works for modes where PRECISION == BITSIZE. Now it will certainly require a more thorough analysis but maybe it's a start? Regards Robin
Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Try this: https://godbolt.org/z/x7bM5Pr84 juzhe.zh...@rivai.ai From: Jeff Law Date: 2023-06-29 02:11 To: Juzhe-Zhong; gcc-patches CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI On 6/28/23 03:47, Juzhe-Zhong wrote: > This bug blocks the following patches. > > GCC doesn't know RVV is using compact mask model. > Consider this following case: > > #define N 16 > > int > main () > { >int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1}; >int8_t out[N] = {0}; >for (int8_t i = 0; i < N; ++i) > if (mask[i]) >out[i] = i; >for (int8_t i = 0; i < N; ++i) > { >if (mask[i]) > assert (out[i] == i); >else > assert (out[i] == 0); > } > } > > Before this patch, the pre-calculated mask in constant memory pool: > .LC1: > .byte 68 > 0b01000100 > > This is incorrect, such case failed in execution. > > After this patch: > .LC1: > .byte 10 > 0b1010 So I don't get anything like this in my testing. What are the precise arguments you're using to build the testcase? I'm compiling the test use a trunk compiler with -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv I get the attached code both before and after your patch. Clearly I'm doing something different/wrong.So my request is for the precise command line you're using and the before/after resulting assembly code. Jeff
Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
On 6/28/23 03:47, Juzhe-Zhong wrote: This bug blocks the following patches. GCC doesn't know RVV is using compact mask model. Consider this following case: #define N 16 int main () { int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1}; int8_t out[N] = {0}; for (int8_t i = 0; i < N; ++i) if (mask[i]) out[i] = i; for (int8_t i = 0; i < N; ++i) { if (mask[i]) assert (out[i] == i); else assert (out[i] == 0); } } Before this patch, the pre-calculated mask in constant memory pool: .LC1: .byte 68 > 0b01000100 This is incorrect, such case failed in execution. After this patch: .LC1: .byte 10 > 0b1010 So I don't get anything like this in my testing. What are the precise arguments you're using to build the testcase? I'm compiling the test use a trunk compiler with -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv I get the attached code both before and after your patch. Clearly I'm doing something different/wrong.So my request is for the precise command line you're using and the before/after resulting assembly code. Jeff.file "j.c" .option nopic .attribute arch, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0" .attribute unaligned_access, 0 .attribute stack_align, 16 .text .section.rodata.str1.8,"aMS",@progbits,1 .align 3 .LC1: .string "j.c" .align 3 .LC2: .string "out[i] == i" .align 3 .LC3: .string "out[i] == 0" .section.text.startup,"ax",@progbits .align 1 .globl main .type main, @function main: .LFB0: .cfi_startproc lui a5,%hi(.LANCHOR0) addia5,a5,%lo(.LANCHOR0) ld a4,0(a5) ld a5,8(a5) addisp,sp,-48 .cfi_def_cfa_offset 48 vsetivlizero,16,e8,m1,ta,ma sd zero,16(sp) sd a4,0(sp) sd a5,8(sp) sd ra,40(sp) .cfi_offset 1, -8 addia5,sp,16 sd zero,24(sp) vid.v v1 vl1re8.vv0,0(sp) vmsne.viv0,v0,0 vsetvli a4,zero,e8,m1,ta,ma vse8.v v1,0(a5),v0.t lbu a5,16(sp) bne a5,zero,.L2 lbu a4,17(sp) li a5,1 bne a4,a5,.L3 lbu a5,18(sp) bne a5,zero,.L2 lbu a4,19(sp) li a5,3 bne a4,a5,.L3 lbu a5,20(sp) bne a5,zero,.L2 lbu a4,21(sp) li a5,5 bne a4,a5,.L3 lbu a5,22(sp) bne a5,zero,.L2 lbu a4,23(sp) li a5,7 bne a4,a5,.L3 lbu a5,24(sp) bne a5,zero,.L2 lbu a4,25(sp) li a5,9 bne a4,a5,.L3 lbu a5,26(sp) bne a5,zero,.L2 lbu a4,27(sp) li a5,11 bne a4,a5,.L3 lbu a5,28(sp) bne a5,zero,.L2 lbu a4,29(sp) li a5,13 bne a4,a5,.L3 lbu a5,30(sp) bne a5,zero,.L2 lbu a4,31(sp) li a5,15 bne a4,a5,.L3 ld ra,40(sp) .cfi_remember_state .cfi_restore 1 li a0,0 addisp,sp,48 .cfi_def_cfa_offset 0 jr ra .L2: .cfi_restore_state lui a3,%hi(__PRETTY_FUNCTION__.0) lui a1,%hi(.LC1) lui a0,%hi(.LC3) addia3,a3,%lo(__PRETTY_FUNCTION__.0) li a2,18 addia1,a1,%lo(.LC1) addia0,a0,%lo(.LC3) call__assert_fail .L3: lui a3,%hi(__PRETTY_FUNCTION__.0) lui a1,%hi(.LC1) lui a0,%hi(.LC2) addia3,a3,%lo(__PRETTY_FUNCTION__.0) li a2,16 addia1,a1,%lo(.LC1) addia0,a0,%lo(.LC2) call__assert_fail .cfi_endproc .LFE0: .size main, .-main .section.rodata .align 3 .set.LANCHOR0,. + 0 .LC0: .string "" .string "\001" .string "\001" .string "\001" .string "\001" .string "\001" .string "\001" .string "\001" .ascii "\001" .section.srodata,"a" .align 3 .type __PRETTY_FUNCTION__.0, @object .size __PRETTY_FUNCTION__.0, 5 __PRETTY_FUNCTION__.0: .string "main" .ident "GCC: (GNU) 14.0.0 20230628 (experimental)" .section.note.GNU-stack,"",@progbits