Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-07-04 Thread Robin Dapp via Gcc-patches
> Kito (or somebody else), would you mind doing a RISC-V bootstrap?  It would
> take forever on my machine.  Thank you.
I did a bootstrap myself now and it finally finished.  Going to commit the
attached tomorrow.

Regards
 Robin

Subject: [PATCH] Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL.

RISC-V lowers the TYPE_PRECISION for MODE_VECTOR_BOOL vectors in order
to distinguish between VNx1BI, VNx2BI, VNx4BI and VNx8BI.

This patch adjusts uses of MODE_VECTOR_BOOL to use GET_MODE_PRECISION
instead of GET_MODE_BITSIZE.

The RISC-V tests are provided by Juzhe.

Co-Authored-By: Juzhe-Zhong 

gcc/c-family/ChangeLog:

* c-common.cc (c_common_type_for_mode): Use GET_MODE_PRECISION.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Ditto.
* simplify-rtx.cc (native_encode_rtx): Ditto.
(native_decode_vector_rtx): Ditto.
(simplify_const_vector_byte_offset): Ditto.
(simplify_const_vector_subreg): Ditto.
* tree.cc (build_truth_vector_type_for_mode): Ditto.
* varasm.cc (output_constant_pool_2): Ditto.

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_type_for_mode): Ditto.

gcc/go/ChangeLog:

* go-lang.cc (go_langhook_type_for_mode): Ditto.

gcc/lto/ChangeLog:

* lto-lang.cc (lto_type_for_mode): Ditto.

gcc/rust/ChangeLog:

* backend/rust-tree.cc (c_common_type_for_mode): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
---
 gcc/c-family/c-common.cc  |  2 +-
 gcc/config/riscv/riscv-v.cc   | 12 
 gcc/fortran/trans-types.cc|  2 +-
 gcc/go/go-lang.cc |  2 +-
 gcc/lto/lto-lang.cc   |  2 +-
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/simplify-rtx.cc   | 10 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-1.c   | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-10.c  | 22 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-11.c  | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-12.c  | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-13.c  | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-14.c  | 24 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-2.c   | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-3.c   | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-4.c   | 23 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-5.c   | 25 
 .../riscv/rvv/autovec/vls-vlmax/bitmask-6.c   | 27 +
 .../riscv/rvv/autovec/vls-vlmax/bitmask-7.c   | 30 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-8.c   | 30 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-9.c   | 30 +++
 gcc/tree.cc   |  2 +-
 gcc/varasm.cc |  8 -
 23 files changed, 374 insertions(+), 17 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
 create mode 100644 

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches
>> Hi Robin:
>>
>>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>>> index 52d7626e92e..14d419c2013 100644
>>> --- a/gcc/lto/lto-lang.cc
>>> +++ b/gcc/lto/lto-lang.cc
>>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>>>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>>>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>>>  {
>>> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE 
>>> (mode),
>>> +  unsigned int elem_bits = vectwhereor_element_size 
>>> (GET_MODE_PRECISION (mode),
>>
>> This seems weird?

Indeed :D Must be an accidental middle-click in Thunderbird.  I just
re-checked and the diff itself is fine.

> FWIW, I bootstrapped & regression-tested the patch with that fixed
> on aarch64-linux-gnu (all languages).
> 
> So OK with the above fixed from my POV.
Oh, thanks!  Mine is still running, not even with all languages.  I picked
the M1 from the compile farm which only has eight cores.

Kito (or somebody else), would you mind doing a RISC-V bootstrap?  It would
take forever on my machine.  Thank you.

Regards
 Robin


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Sandiford via Gcc-patches
Kito Cheng  writes:
> Hi Robin:
>
>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>> index 52d7626e92e..14d419c2013 100644
>> --- a/gcc/lto/lto-lang.cc
>> +++ b/gcc/lto/lto-lang.cc
>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>>  {
>> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
>> +  unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION 
>> (mode),
>
> This seems weird?

FWIW, I bootstrapped & regression-tested the patch with that fixed
on aarch64-linux-gnu (all languages).

So OK with the above fixed from my POV.

Thanks,
Richard

>
>> GET_MODE_NUNITS (mode));
>>tree bool_type = build_nonstandard_boolean_type (elem_bits);
>>return build_vector_type_for_mode (bool_type, mode);


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Kito Cheng via Gcc-patches
Hi Robin:

> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
> index 52d7626e92e..14d419c2013 100644
> --- a/gcc/lto/lto-lang.cc
> +++ b/gcc/lto/lto-lang.cc
> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>  {
> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
> +  unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION 
> (mode),

This seems weird?

> GET_MODE_NUNITS (mode));
>tree bool_type = build_nonstandard_boolean_type (elem_bits);
>return build_vector_type_for_mode (bool_type, mode);


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches
> This should probably use GET_MODE_PRECISION as well.
> 
> OK if it bootstraps/tests on both aarch64 and riscv.
> 
> Richard.

I found a several other instances, also in the frontends that
I'm not exactly sure about.  I'm currently testing this but aarch64
bootstrap is still going to take a while, various aarch compile
farm machines are down?

Regards
 Robin

>From ef919a27f4a156afeca6b4825e6029d9f44be556 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 28 Jun 2023 20:59:29 +0200
Subject: [PATCH] mode_bitsize -> precision.

bitsize -> precision.
---
 gcc/c-family/c-common.cc  |  2 +-
 gcc/fortran/trans-types.cc|  2 +-
 gcc/go/go-lang.cc |  2 +-
 gcc/lto/lto-lang.cc   |  2 +-
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/simplify-rtx.cc   | 10 +-
 gcc/tree.cc   |  2 +-
 gcc/varasm.cc |  2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 34566a342bd..6ab63dae997 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2458,7 +2458,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index d718f28cc86..987e3d26c46 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3403,7 +3403,7 @@ gfc_type_for_mode (machine_mode mode, int unsignedp)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index e85a4bfe949..d5c871a533c 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -414,7 +414,7 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp)
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
index 52d7626e92e..14d419c2013 100644
--- a/gcc/lto/lto-lang.cc
+++ b/gcc/lto/lto-lang.cc
@@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION 
(mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8243d4cf5c6..66e859cd70c 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -5320,7 +5320,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
   unsigned int elem_bits
-   = vector_element_size (GET_MODE_BITSIZE (mode), GET_MODE_NUNITS (mode));
+   = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS 
(mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
 }
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 99cbdd47d93..d7315d82aa3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7076,7 +7076,7 @@ native_encode_rtx (machine_mode mode, rtx x, 
vec ,
   /* CONST_VECTOR_ELT follows target memory order, so no shuffling
 is necessary.  The only complication is that MODE_VECTOR_BOOL
 vectors can have several elements per byte.  */
-  unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int 

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Biener via Gcc-patches
On Thu, Jun 29, 2023 at 11:10 AM Robin Dapp via Gcc-patches
 wrote:
>
> > Yeah, that part is OK, and was the case I was thinking about when
> > I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> > it's possible for BITSIZE - PRECISION to be more than a full byte,
> > in which case the new loop would not initialise every byte of
> > the mode.
>
> Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
> this cannot happen with RVV as all we do is adjust the precision
> of the modes that have BITSIZE == 8.  I'm going to add an assert.
> Juzhe would rather work around that in the backend, though.
>
> The other thing I just noticed is
>
> tree
> build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
> {
>   gcc_assert (mask_mode != BLKmode);
>
>   unsigned HOST_WIDE_INT esize;
>   if (VECTOR_MODE_P (mask_mode))
> {
>   poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
>   esize = vector_element_size (vsize, nunits);
> }
>   else
> esize = 1;
>
>   tree bool_type = build_nonstandard_boolean_type (esize);
>
>   return make_vector_type (bool_type, nunits, mask_mode);
> }
>
> which gives us wrong precision as we rely on the BITSIZE here as well.
> This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
> VNx2BI.

This should probably use GET_MODE_PRECISION as well.

OK if it bootstraps/tests on both aarch64 and riscv.

Richard.

>
> Maybe this isn't a problem per se but to me it appears
> just wrong.
>
> Regards
>  Robin
>


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
No, I am not saying I want to fix it in RISC-V backend.
Actually, if you can quickly land the fix in generic codes and not block of the 
RISC-V following patches.
I am glad to see. Otherwise, I prefer to fix it RISC-V backend for now if it is 
not a big issue for performance and defer it to GCC-15 to make it perfect.

The reason why I plan that is global reviewers bandwidth is very limit.
We should make the highest priority auto-vectorizaiton middle-end support first 
and then let's come back to see the corner case issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 17:09
To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; 
palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.
 
Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8.  I'm going to add an assert.
Juzhe would rather work around that in the backend, though.
 
The other thing I just noticed is
 
tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
  gcc_assert (mask_mode != BLKmode);
 
  unsigned HOST_WIDE_INT esize;
  if (VECTOR_MODE_P (mask_mode))
{
  poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
  esize = vector_element_size (vsize, nunits);
}
  else
esize = 1;
 
  tree bool_type = build_nonstandard_boolean_type (esize);
 
  return make_vector_type (bool_type, nunits, mask_mode);
}
 
which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.
 
Maybe this isn't a problem per se but to me it appears
just wrong.
 
Regards
Robin
 
 


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches
> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.

Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8.  I'm going to add an assert.
Juzhe would rather work around that in the backend, though.

The other thing I just noticed is

tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
  gcc_assert (mask_mode != BLKmode);

  unsigned HOST_WIDE_INT esize;
  if (VECTOR_MODE_P (mask_mode))
{
  poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
  esize = vector_element_size (vsize, nunits);
}
  else
esize = 1;

  tree bool_type = build_nonstandard_boolean_type (esize);

  return make_vector_type (bool_type, nunits, mask_mode);
}

which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.

Maybe this isn't a problem per se but to me it appears
just wrong.

Regards
 Robin



Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
Yes, we have no choice since DSE is base on BYTESIZE.

So I walk around in RISC-V backend making VNx1BI, VNx2BI, VNx4BI precision 
different with VNx8BI to prevent incorrect DSE.
I think such issue can be addressed when we adjust everything using BITSIZE 
instead of BYTESIZE but it may change to much.
I prefer it to be GCC-15 (such issue can be walk around in RISC-V backend) 
since we have to much things need to be landed in GCC-14.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 16:53
To: juzhe.zh...@rivai.ai; gcc-patches; jeffreyalaw; kito.cheng; Kito.cheng; 
palmer; palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
>>> are we absolutely sure this is the only problem we will have
>>> with precision != bitsize and it is confined to the backend?
> Yes.
 
With vinfo.vector_mode == VNx4SI
mask_type = get_mask_type_for_scalar_type (vinfo, int)
mask_type is:
vector(4) 
 
I.e. the precision is 2.  This is definitely fishy and related
to the same problem.  I would almost bet that something in the
middle-end relies on the precision for some optimization but
we just haven't hit it yet.
 
Then we have
vector(2)  (precision 4)
as a mask type for vector(2) long int.
 
Likewise we would likely have a precision of 8 for a vector(1)?
Those might be less severe but still...
 
And that's just what I'm seeing spontaneously after like five
minutes.
 
Regards
Robin
 


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
>> Sorry, only realised later, but: if the precision can cover fewer
>> bytes than the bitsize, I suppose there ought to be some zero-byte
>> padding at the end as well.
> It looks like this problem, and also the padding, has been discussed
> before when the precision of VNx1BI etc. was first adjusted in the
> RISC-V backend?

Very probably.  Can't remember now.

> I didn't immediately get the padding, though.  So if we e.g. have a
> VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
> A full byte?

Yeah, that part is OK, and was the case I was thinking about when
I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
it's possible for BITSIZE - PRECISION to be more than a full byte,
in which case the new loop would not initialise every byte of
the mode.

I vaguely remembered that that could happen for RVV_FIXED_VLMAX,
but perhaps I misremember.  If it can't happen then an assert
would be OK instead.

Thanks,
Richard


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches
>>> are we absolutely sure this is the only problem we will have
>>> with precision != bitsize and it is confined to the backend?
> Yes.

With vinfo.vector_mode == VNx4SI
 mask_type = get_mask_type_for_scalar_type (vinfo, int)
mask_type is:
 vector(4) 

I.e. the precision is 2.  This is definitely fishy and related
to the same problem.  I would almost bet that something in the
middle-end relies on the precision for some optimization but
we just haven't hit it yet.

Then we have
 vector(2)  (precision 4)
as a mask type for vector(2) long int.

Likewise we would likely have a precision of 8 for a vector(1)?
Those might be less severe but still...

And that's just what I'm seeing spontaneously after like five
minutes.

Regards
 Robin


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
>> are we absolutely sure this is the only problem we will have
>> with precision != bitsize and it is confined to the backend?
Yes.


 >>I would
>>not dare to make that call.  How does DSE come in here at all as you
>>keep mentioning it?
I mentioned DSE is because:
We have DSE issue before so we use ADJUST_PRECISION to make PRECISON  != 
BITSIZE but we still to walk around this DSE issue:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
 

However, this fix patch fixed DSE issue which makes PRECISON != BITSIZE, then 
GCC will generate padding bits for it which we
don't want it.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 16:14
To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; 
palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
I grep'ed a bit and found several more instances of the same pattern
which would probably all have to be adjusted (frontend-related mostly
but also in native_encode_rtx).  Most likely they would all have to
be adjusted? 
 
> Sorry, only realised later, but: if the precision can cover fewer
> bytes than the bitsize, I suppose there ought to be some zero-byte
> padding at the end as well.
It looks like this problem, and also the padding, has been discussed
before when the precision of VNx1BI etc. was first adjusted in the
RISC-V backend?
 
I didn't immediately get the padding, though.  So if we e.g. have a
VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
A full byte?
 
Juzhe, are we absolutely sure this is the only problem we will have
with precision != bitsize and it is confined to the backend?  I would
not dare to make that call.  How does DSE come in here at all as you
keep mentioning it?
 
Regards
Robin
 
 


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches
I grep'ed a bit and found several more instances of the same pattern
which would probably all have to be adjusted (frontend-related mostly
but also in native_encode_rtx).  Most likely they would all have to
be adjusted? 

> Sorry, only realised later, but: if the precision can cover fewer
> bytes than the bitsize, I suppose there ought to be some zero-byte
> padding at the end as well.
It looks like this problem, and also the padding, has been discussed
before when the precision of VNx1BI etc. was first adjusted in the
RISC-V backend?

I didn't immediately get the padding, though.  So if we e.g. have a
VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
A full byte?

Juzhe, are we absolutely sure this is the only problem we will have
with precision != bitsize and it is confined to the backend?  I would
not dare to make that call.  How does DSE come in here at all as you
keep mentioning it?

Regards
 Robin



Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
Yes. There is a trick fix in RVV.

Ideally, each mode should have PRECISION == BITSIZE. However, for RVV, there is 
a bug which cause incorrect DSE.
We have VNx1BI (occupy 1bit), VNx2BI (occupy 2bit), VNx4BI (occupy 4bit), 
VNx8BI (occupy 8bit),  since they are having same BYTESIZE,
it cause incorrect DSE.

So we add a trick (ADJUST_PRECISION) to fix it:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
 
which will prevent the incorrect DSE.

But the maskbit layout in memory comes wrong since the inconsistency between 
PRECISION and BITSIZE. 
So, I force GCC handle this in the RISC-V backend for VNx1BI/VNx2BI/VNx4BI.

I think this is RISC-V backend issue and can be well addressed in RISC-V port 
(as this patch I post). 
No need to bother generic codes since other target could not have the same 
issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 15:53
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
Richard Sandiford  writes:
> Robin Dapp via Gcc-patches  writes:
>> Hi Juzhe,
>>
>> I find the bug description rather confusing.  What I can see is that
>> the constant in the literal pool is indeed wrong but how would DSE or
>> so play a role there?  Particularly only for the smaller modes?
>>
>> My suspicion would be that the constant in the literal/constant pool
>> is wrong from start to finish.
>>
>> I just played around with the following hunk:
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 542315f88cd..5223c08924f 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
>> unsigned int align)
>>whole element.  Often this is byte_mode and contains more
>>than one element.  */
>> unsigned int nelts = GET_MODE_NUNITS (mode);
>> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
>> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require 
>> ();
>>
>> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI 
>> elements
>> into an int and not just 8.  It would also explain why it works for modes
>> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
>> analysis but maybe it's a start?
>
> Yeah.  Preapproved for trunk & any necessary branches.
 
Sorry, only realised later, but: if the precision can cover fewer
bytes than the bitsize, I suppose there ought to be some zero-byte
padding at the end as well.
 
Thanks,
Richard
 


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> Robin Dapp via Gcc-patches  writes:
>> Hi Juzhe,
>>
>> I find the bug description rather confusing.  What I can see is that
>> the constant in the literal pool is indeed wrong but how would DSE or
>> so play a role there?  Particularly only for the smaller modes?
>>
>> My suspicion would be that the constant in the literal/constant pool
>> is wrong from start to finish.
>>
>> I just played around with the following hunk:
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 542315f88cd..5223c08924f 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
>> unsigned int align)
>>whole element.  Often this is byte_mode and contains more
>>than one element.  */
>> unsigned int nelts = GET_MODE_NUNITS (mode);
>> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
>> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require 
>> ();
>>
>> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI 
>> elements
>> into an int and not just 8.  It would also explain why it works for modes
>> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
>> analysis but maybe it's a start?
>
> Yeah.  Preapproved for trunk & any necessary branches.

Sorry, only realised later, but: if the precision can cover fewer
bytes than the bitsize, I suppose there ought to be some zero-byte
padding at the end as well.

Thanks,
Richard


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Ok. Plz go ahead commit this change with the testcases.
Then it won't block the following patches.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 04:42
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
Robin Dapp via Gcc-patches  writes:
> Hi Juzhe,
>
> I find the bug description rather confusing.  What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there?  Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
> unsigned int align)
>whole element.  Often this is byte_mode and contains more
>than one element.  */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
> into an int and not just 8.  It would also explain why it works for modes
> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
> analysis but maybe it's a start?
 
Yeah.  Preapproved for trunk & any necessary branches.
 
Thanks,
Richard
 


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Richard Sandiford via Gcc-patches
Robin Dapp via Gcc-patches  writes:
> Hi Juzhe,
>
> I find the bug description rather confusing.  What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there?  Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
> unsigned int align)
>whole element.  Often this is byte_mode and contains more
>than one element.  */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
> into an int and not just 8.  It would also explain why it works for modes
> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
> analysis but maybe it's a start?

Yeah.  Preapproved for trunk & any necessary branches.

Thanks,
Richard


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I find the bug description rather confusing.  What I can see is that
the constant in the literal pool is indeed wrong but how would DSE or
so play a role there?  Particularly only for the smaller modes?

My suspicion would be that the constant in the literal/constant pool
is wrong from start to finish.

I just played around with the following hunk:

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 542315f88cd..5223c08924f 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
unsigned int align)
   whole element.  Often this is byte_mode and contains more
   than one element.  */
unsigned int nelts = GET_MODE_NUNITS (mode);
-   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
+   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();

With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
into an int and not just 8.  It would also explain why it works for modes
where PRECISION == BITSIZE.  Now it will certainly require a more thorough
analysis but maybe it's a start?

Regards
 Robin



Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Try this:
https://godbolt.org/z/x7bM5Pr84 




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 02:11
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
 
 
On 6/28/23 03:47, Juzhe-Zhong wrote:
> This bug blocks the following patches.
> 
> GCC doesn't know RVV is using compact mask model.
> Consider this following case:
> 
> #define N 16
> 
> int
> main ()
> {
>int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
>int8_t out[N] = {0};
>for (int8_t i = 0; i < N; ++i)
>  if (mask[i])
>out[i] = i;
>for (int8_t i = 0; i < N; ++i)
>  {
>if (mask[i])
> assert (out[i] == i);
>else
> assert (out[i] == 0);
>  }
> }
> 
> Before this patch, the pre-calculated mask in constant memory pool:
> .LC1:
>  .byte   68 > 0b01000100
> 
> This is incorrect, such case failed in execution.
> 
> After this patch:
> .LC1:
> .byte 10 > 0b1010
So I don't get anything like this in my testing.  What are the precise 
arguments you're using to build the testcase?
 
I'm compiling the test use a trunk compiler with
 
  -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv
 
I get the attached code both before and after your patch.  Clearly I'm 
doing something different/wrong.So my request is for the precise 
command line you're using and the before/after resulting assembly code.
 
Jeff


Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Jeff Law via Gcc-patches



On 6/28/23 03:47, Juzhe-Zhong wrote:

This bug blocks the following patches.

GCC doesn't know RVV is using compact mask model.
Consider this following case:

#define N 16

int
main ()
{
   int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
   int8_t out[N] = {0};
   for (int8_t i = 0; i < N; ++i)
 if (mask[i])
   out[i] = i;
   for (int8_t i = 0; i < N; ++i)
 {
   if (mask[i])
assert (out[i] == i);
   else
assert (out[i] == 0);
 }
}

Before this patch, the pre-calculated mask in constant memory pool:
.LC1:
 .byte   68 > 0b01000100

This is incorrect, such case failed in execution.

After this patch:
.LC1:
.byte   10 > 0b1010
So I don't get anything like this in my testing.  What are the precise 
arguments you're using to build the testcase?


I'm compiling the test use a trunk compiler with

 -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv

I get the attached code both before and after your patch.  Clearly I'm 
doing something different/wrong.So my request is for the precise 
command line you're using and the before/after resulting assembly code.


Jeff.file   "j.c"
.option nopic
.attribute arch, 
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.section.rodata.str1.8,"aMS",@progbits,1
.align  3
.LC1:
.string "j.c"
.align  3
.LC2:
.string "out[i] == i"
.align  3
.LC3:
.string "out[i] == 0"
.section.text.startup,"ax",@progbits
.align  1
.globl  main
.type   main, @function
main:
.LFB0:
.cfi_startproc
lui a5,%hi(.LANCHOR0)
addia5,a5,%lo(.LANCHOR0)
ld  a4,0(a5)
ld  a5,8(a5)
addisp,sp,-48
.cfi_def_cfa_offset 48
vsetivlizero,16,e8,m1,ta,ma
sd  zero,16(sp)
sd  a4,0(sp)
sd  a5,8(sp)
sd  ra,40(sp)
.cfi_offset 1, -8
addia5,sp,16
sd  zero,24(sp)
vid.v   v1
vl1re8.vv0,0(sp)
vmsne.viv0,v0,0
vsetvli a4,zero,e8,m1,ta,ma
vse8.v  v1,0(a5),v0.t
lbu a5,16(sp)
bne a5,zero,.L2
lbu a4,17(sp)
li  a5,1
bne a4,a5,.L3
lbu a5,18(sp)
bne a5,zero,.L2
lbu a4,19(sp)
li  a5,3
bne a4,a5,.L3
lbu a5,20(sp)
bne a5,zero,.L2
lbu a4,21(sp)
li  a5,5
bne a4,a5,.L3
lbu a5,22(sp)
bne a5,zero,.L2
lbu a4,23(sp)
li  a5,7
bne a4,a5,.L3
lbu a5,24(sp)
bne a5,zero,.L2
lbu a4,25(sp)
li  a5,9
bne a4,a5,.L3
lbu a5,26(sp)
bne a5,zero,.L2
lbu a4,27(sp)
li  a5,11
bne a4,a5,.L3
lbu a5,28(sp)
bne a5,zero,.L2
lbu a4,29(sp)
li  a5,13
bne a4,a5,.L3
lbu a5,30(sp)
bne a5,zero,.L2
lbu a4,31(sp)
li  a5,15
bne a4,a5,.L3
ld  ra,40(sp)
.cfi_remember_state
.cfi_restore 1
li  a0,0
addisp,sp,48
.cfi_def_cfa_offset 0
jr  ra
.L2:
.cfi_restore_state
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC3)
addia3,a3,%lo(__PRETTY_FUNCTION__.0)
li  a2,18
addia1,a1,%lo(.LC1)
addia0,a0,%lo(.LC3)
call__assert_fail
.L3:
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC2)
addia3,a3,%lo(__PRETTY_FUNCTION__.0)
li  a2,16
addia1,a1,%lo(.LC1)
addia0,a0,%lo(.LC2)
call__assert_fail
.cfi_endproc
.LFE0:
.size   main, .-main
.section.rodata
.align  3
.set.LANCHOR0,. + 0
.LC0:
.string ""
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.ascii  "\001"
.section.srodata,"a"
.align  3
.type   __PRETTY_FUNCTION__.0, @object
.size   __PRETTY_FUNCTION__.0, 5
__PRETTY_FUNCTION__.0:
.string "main"
.ident  "GCC: (GNU) 14.0.0 20230628 (experimental)"
.section.note.GNU-stack,"",@progbits