Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
No, I am not saying I want to fix it in RISC-V backend.
Actually, if you can quickly land the fix in generic codes and not block of the 
RISC-V following patches.
I am glad to see. Otherwise, I prefer to fix it RISC-V backend for now if it is 
not a big issue for performance and defer it to GCC-15 to make it perfect.

The reason why I plan that is global reviewers bandwidth is very limit.
We should make the highest priority auto-vectorizaiton middle-end support first 
and then let's come back to see the corner case issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 17:09
To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; 
palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.
 
Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8.  I'm going to add an assert.
Juzhe would rather work around that in the backend, though.
 
The other thing I just noticed is
 
tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
  gcc_assert (mask_mode != BLKmode);
 
  unsigned HOST_WIDE_INT esize;
  if (VECTOR_MODE_P (mask_mode))
{
  poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
  esize = vector_element_size (vsize, nunits);
}
  else
esize = 1;
 
  tree bool_type = build_nonstandard_boolean_type (esize);
 
  return make_vector_type (bool_type, nunits, mask_mode);
}
 
which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.
 
Maybe this isn't a problem per se but to me it appears
just wrong.
 
Regards
Robin
 
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
Yes, we have no choice since DSE is base on BYTESIZE.

So I walk around in RISC-V backend making VNx1BI, VNx2BI, VNx4BI precision 
different with VNx8BI to prevent incorrect DSE.
I think such issue can be addressed when we adjust everything using BITSIZE 
instead of BYTESIZE but it may change to much.
I prefer it to be GCC-15 (such issue can be walk around in RISC-V backend) 
since we have to much things need to be landed in GCC-14.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 16:53
To: juzhe.zh...@rivai.ai; gcc-patches; jeffreyalaw; kito.cheng; Kito.cheng; 
palmer; palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
>>> are we absolutely sure this is the only problem we will have
>>> with precision != bitsize and it is confined to the backend?
> Yes.
 
With vinfo.vector_mode == VNx4SI
mask_type = get_mask_type_for_scalar_type (vinfo, int)
mask_type is:
vector(4) 
 
I.e. the precision is 2.  This is definitely fishy and related
to the same problem.  I would almost bet that something in the
middle-end relies on the precision for some optimization but
we just haven't hit it yet.
 
Then we have
vector(2)  (precision 4)
as a mask type for vector(2) long int.
 
Likewise we would likely have a precision of 8 for a vector(1)?
Those might be less severe but still...
 
And that's just what I'm seeing spontaneously after like five
minutes.
 
Regards
Robin
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
>> are we absolutely sure this is the only problem we will have
>> with precision != bitsize and it is confined to the backend?
Yes.


 >>I would
>>not dare to make that call.  How does DSE come in here at all as you
>>keep mentioning it?
I mentioned DSE is because:
We have DSE issue before so we use ADJUST_PRECISION to make PRECISON  != 
BITSIZE but we still to walk around this DSE issue:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
 

However, this fix patch fixed DSE issue which makes PRECISON != BITSIZE, then 
GCC will generate padding bits for it which we
don't want it.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 16:14
To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; 
palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
I grep'ed a bit and found several more instances of the same pattern
which would probably all have to be adjusted (frontend-related mostly
but also in native_encode_rtx).  Most likely they would all have to
be adjusted? 
 
> Sorry, only realised later, but: if the precision can cover fewer
> bytes than the bitsize, I suppose there ought to be some zero-byte
> padding at the end as well.
It looks like this problem, and also the padding, has been discussed
before when the precision of VNx1BI etc. was first adjusted in the
RISC-V backend?
 
I didn't immediately get the padding, though.  So if we e.g. have a
VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
A full byte?
 
Juzhe, are we absolutely sure this is the only problem we will have
with precision != bitsize and it is confined to the backend?  I would
not dare to make that call.  How does DSE come in here at all as you
keep mentioning it?
 
Regards
Robin
 
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai
Yes. There is a trick fix in RVV.

Ideally, each mode should have PRECISION == BITSIZE. However, for RVV, there is 
a bug which cause incorrect DSE.
We have VNx1BI (occupy 1bit), VNx2BI (occupy 2bit), VNx4BI (occupy 4bit), 
VNx8BI (occupy 8bit),  since they are having same BYTESIZE,
it cause incorrect DSE.

So we add a trick (ADJUST_PRECISION) to fix it:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
 
which will prevent the incorrect DSE.

But the maskbit layout in memory comes wrong since the inconsistency between 
PRECISION and BITSIZE. 
So, I force GCC handle this in the RISC-V backend for VNx1BI/VNx2BI/VNx4BI.

I think this is RISC-V backend issue and can be well addressed in RISC-V port 
(as this patch I post). 
No need to bother generic codes since other target could not have the same 
issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 15:53
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
Richard Sandiford  writes:
> Robin Dapp via Gcc-patches  writes:
>> Hi Juzhe,
>>
>> I find the bug description rather confusing.  What I can see is that
>> the constant in the literal pool is indeed wrong but how would DSE or
>> so play a role there?  Particularly only for the smaller modes?
>>
>> My suspicion would be that the constant in the literal/constant pool
>> is wrong from start to finish.
>>
>> I just played around with the following hunk:
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 542315f88cd..5223c08924f 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
>> unsigned int align)
>>whole element.  Often this is byte_mode and contains more
>>than one element.  */
>> unsigned int nelts = GET_MODE_NUNITS (mode);
>> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
>> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require 
>> ();
>>
>> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI 
>> elements
>> into an int and not just 8.  It would also explain why it works for modes
>> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
>> analysis but maybe it's a start?
>
> Yeah.  Preapproved for trunk & any necessary branches.
 
Sorry, only realised later, but: if the precision can cover fewer
bytes than the bitsize, I suppose there ought to be some zero-byte
padding at the end as well.
 
Thanks,
Richard
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Ok. Plz go ahead commit this change with the testcases.
Then it won't block the following patches.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 04:42
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
Robin Dapp via Gcc-patches  writes:
> Hi Juzhe,
>
> I find the bug description rather confusing.  What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there?  Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, 
> unsigned int align)
>whole element.  Often this is byte_mode and contains more
>than one element.  */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
> into an int and not just 8.  It would also explain why it works for modes
> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
> analysis but maybe it's a start?
 
Yeah.  Preapproved for trunk & any necessary branches.
 
Thanks,
Richard
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Try this:
https://godbolt.org/z/x7bM5Pr84 




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 02:11
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
 
 
On 6/28/23 03:47, Juzhe-Zhong wrote:
> This bug blocks the following patches.
> 
> GCC doesn't know RVV is using compact mask model.
> Consider this following case:
> 
> #define N 16
> 
> int
> main ()
> {
>int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
>int8_t out[N] = {0};
>for (int8_t i = 0; i < N; ++i)
>  if (mask[i])
>out[i] = i;
>for (int8_t i = 0; i < N; ++i)
>  {
>if (mask[i])
> assert (out[i] == i);
>else
> assert (out[i] == 0);
>  }
> }
> 
> Before this patch, the pre-calculated mask in constant memory pool:
> .LC1:
>  .byte   68 > 0b01000100
> 
> This is incorrect, such case failed in execution.
> 
> After this patch:
> .LC1:
> .byte 10 > 0b1010
So I don't get anything like this in my testing.  What are the precise 
arguments you're using to build the testcase?
 
I'm compiling the test use a trunk compiler with
 
  -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv
 
I get the attached code both before and after your patch.  Clearly I'm 
doing something different/wrong.So my request is for the precise 
command line you're using and the before/after resulting assembly code.
 
Jeff