Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod  wrote:
> >>
> >> On 7/17/24 4:36 PM, Richard Biener wrote:
> >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  
> >> > wrote:
> >> >>
> >> >> On 7/15/24 6:05 PM, Richard Biener wrote:
> >> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  
> >> >>> wrote:
> >> 
> >>  On 7/15/24 12:16 PM, Tejas Belagod wrote:
> >> > On 7/12/24 6:40 PM, Richard Biener wrote:
> >> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  
> >> >> wrote:
> >> >>>
> >> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> >>  Padding is only an issue for very small vectors - the obvious 
> >>  choice is
> >>  to disallow vector types that would require any padding.  I can 
> >>  hardly
> >>  see where those are faster than using a vector of up to 4 char
> >>  elements.
> >>  Problematic are 1-bit elements with 4, 2 or one element vectors,
> >>  2-bit elements
> >>  with 2 or one element vectors and 4-bit elements with 1 element
> >>  vectors.
> >> >>>
> >> >>> I'd really like to avoid having to support something like
> >> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) 
> >> >>> *
> >> >>> 16)))
> >> >>> _BitInt(2) to say size of long long could be acceptable.
> >> >>
> >> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
> >> >> way to say
> >> >> the element should have n (< 8) bits.
> >> >>
> >>  I have no idea what the stance on supporting _BitInt in C++ is,
> >>  but most certainly diverging support (or even semantics) of the
> >>  vector extension in C vs. C++ is undesirable.
> >> >>>
> >> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
> >> >>> didn't
> >> >>> look favorably to _BitInt support in C++, so at least until 
> >> >>> something
> >> >>> like
> >> >>> that is standardized in C++ the answer is probably no.
> >> >>
> >> >> OK, I think that rules out _BitInt use here so while bool is then 
> >> >> natural
> >> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to 
> >> >> specify the
> >> >> number of bits explicitly.  There is signed_bool_precision but like
> >> >> vector_mask its use is restricted to the GIMPLE frontend because
> >> >> interaction with the rest of the language isn't defined.
> >> >>
> >> >
> >> > Thanks for all the suggestions - really insightful (to me) 
> >> > discussions.
> >> >
> >> > Yeah, BitInt seemed like it was best placed for this, but not having 
> >> > C++
> >> > support is definitely a blocker. But as you say, in the absence of
> >> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> >> > way to specify non-1-bit widths could be overloading vector_size.
> >> >
> >> > Also, I think overloading GIMPLE's vector_mask takes us into the
> >> > earlier-discussed territory of what it should actually mean - it 
> >> > meaning
> >> > the target truth type in GIMPLE and a generic vector extension in 
> >> > the FE
> >> > will probably confuse gcc developers more than users.
> >> >
> >> >> That said - we're mixing two things here.  The desire to have 
> >> >> "proper"
> >> >> svbool (fix: declare in the backend) and the desire to have "packed"
> >> >> bit-precision vectors (for whatever actual reason) as part of the
> >> >> GCC vector extension.
> >> >>
> >> >
> >> > If we leave lane-disambiguation of svbool to the backend, the values 
> >> > I
> >> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> >> > supporting BitInt(N) vectors possibly in the future 2) having a way 
> >> > for
> >> > targets to define their intrinsics' bool vector types using GNU
> >> > extensions 3) feature parity with Clang's ext_vector_type?
> >> >
> >> > I believe the primary motivation for Clang to support ext_vector_type
> >> > was to have a way to define target intrinsics' vector bool type using
> >> > vector extensions.
> >> >
> >> 
> >> 
> >>  Interestingly, Clang seems to support
> >> 
> >>  typedef struct {
> >> _Bool i:1;
> >>  } STR;
> >> 
> >>  typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof 
> >>  (STR)
> >>  * 4))) vec;
> >> 
> >> 
> >>  int foo (vec b) {
> >>    return sizeof b;
> >>  }
> >> 
> >>  I can't find documentation about how it is implemented, but I suspect
> >>  the vector is constructed as an array STR[] i.e. possibly each
> >>  bit-element padded to byte boundary etc. Also, I can't seem to apply
> >>  many operations other than sizeof.
> >>

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod  wrote:
>>
>> On 7/17/24 4:36 PM, Richard Biener wrote:
>> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  
>> > wrote:
>> >>
>> >> On 7/15/24 6:05 PM, Richard Biener wrote:
>> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  
>> >>> wrote:
>> 
>>  On 7/15/24 12:16 PM, Tejas Belagod wrote:
>> > On 7/12/24 6:40 PM, Richard Biener wrote:
>> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  
>> >> wrote:
>> >>>
>> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>>  Padding is only an issue for very small vectors - the obvious 
>>  choice is
>>  to disallow vector types that would require any padding.  I can 
>>  hardly
>>  see where those are faster than using a vector of up to 4 char
>>  elements.
>>  Problematic are 1-bit elements with 4, 2 or one element vectors,
>>  2-bit elements
>>  with 2 or one element vectors and 4-bit elements with 1 element
>>  vectors.
>> >>>
>> >>> I'd really like to avoid having to support something like
>> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
>> >>> 16)))
>> >>> _BitInt(2) to say size of long long could be acceptable.
>> >>
>> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
>> >> way to say
>> >> the element should have n (< 8) bits.
>> >>
>>  I have no idea what the stance on supporting _BitInt in C++ is,
>>  but most certainly diverging support (or even semantics) of the
>>  vector extension in C vs. C++ is undesirable.
>> >>>
>> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
>> >>> didn't
>> >>> look favorably to _BitInt support in C++, so at least until something
>> >>> like
>> >>> that is standardized in C++ the answer is probably no.
>> >>
>> >> OK, I think that rules out _BitInt use here so while bool is then 
>> >> natural
>> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify 
>> >> the
>> >> number of bits explicitly.  There is signed_bool_precision but like
>> >> vector_mask its use is restricted to the GIMPLE frontend because
>> >> interaction with the rest of the language isn't defined.
>> >>
>> >
>> > Thanks for all the suggestions - really insightful (to me) discussions.
>> >
>> > Yeah, BitInt seemed like it was best placed for this, but not having 
>> > C++
>> > support is definitely a blocker. But as you say, in the absence of
>> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
>> > way to specify non-1-bit widths could be overloading vector_size.
>> >
>> > Also, I think overloading GIMPLE's vector_mask takes us into the
>> > earlier-discussed territory of what it should actually mean - it 
>> > meaning
>> > the target truth type in GIMPLE and a generic vector extension in the 
>> > FE
>> > will probably confuse gcc developers more than users.
>> >
>> >> That said - we're mixing two things here.  The desire to have "proper"
>> >> svbool (fix: declare in the backend) and the desire to have "packed"
>> >> bit-precision vectors (for whatever actual reason) as part of the
>> >> GCC vector extension.
>> >>
>> >
>> > If we leave lane-disambiguation of svbool to the backend, the values I
>> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
>> > supporting BitInt(N) vectors possibly in the future 2) having a way for
>> > targets to define their intrinsics' bool vector types using GNU
>> > extensions 3) feature parity with Clang's ext_vector_type?
>> >
>> > I believe the primary motivation for Clang to support ext_vector_type
>> > was to have a way to define target intrinsics' vector bool type using
>> > vector extensions.
>> >
>> 
>> 
>>  Interestingly, Clang seems to support
>> 
>>  typedef struct {
>> _Bool i:1;
>>  } STR;
>> 
>>  typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
>>  * 4))) vec;
>> 
>> 
>>  int foo (vec b) {
>>    return sizeof b;
>>  }
>> 
>>  I can't find documentation about how it is implemented, but I suspect
>>  the vector is constructed as an array STR[] i.e. possibly each
>>  bit-element padded to byte boundary etc. Also, I can't seem to apply
>>  many operations other than sizeof.
>> 
>>  I don't know if we've tried to support such cases in GNU in the past?
>> >>>
>> >>> Why should we do that?  It doesn't make much sense.
>> >>>
>> >>> single-bit vectors is what _BitInt was invented for.
>> >>
>> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
>> >> _BitInts can be made to have single-bit generic vector semantic

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod  wrote:
>
> On 7/17/24 4:36 PM, Richard Biener wrote:
> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  
> > wrote:
> >>
> >> On 7/15/24 6:05 PM, Richard Biener wrote:
> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  
> >>> wrote:
> 
>  On 7/15/24 12:16 PM, Tejas Belagod wrote:
> > On 7/12/24 6:40 PM, Richard Biener wrote:
> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
> >>>
> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>  Padding is only an issue for very small vectors - the obvious choice 
>  is
>  to disallow vector types that would require any padding.  I can 
>  hardly
>  see where those are faster than using a vector of up to 4 char
>  elements.
>  Problematic are 1-bit elements with 4, 2 or one element vectors,
>  2-bit elements
>  with 2 or one element vectors and 4-bit elements with 1 element
>  vectors.
> >>>
> >>> I'd really like to avoid having to support something like
> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
> >>> 16)))
> >>> _BitInt(2) to say size of long long could be acceptable.
> >>
> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
> >> way to say
> >> the element should have n (< 8) bits.
> >>
>  I have no idea what the stance on supporting _BitInt in C++ is,
>  but most certainly diverging support (or even semantics) of the
>  vector extension in C vs. C++ is undesirable.
> >>>
> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
> >>> didn't
> >>> look favorably to _BitInt support in C++, so at least until something
> >>> like
> >>> that is standardized in C++ the answer is probably no.
> >>
> >> OK, I think that rules out _BitInt use here so while bool is then 
> >> natural
> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify 
> >> the
> >> number of bits explicitly.  There is signed_bool_precision but like
> >> vector_mask its use is restricted to the GIMPLE frontend because
> >> interaction with the rest of the language isn't defined.
> >>
> >
> > Thanks for all the suggestions - really insightful (to me) discussions.
> >
> > Yeah, BitInt seemed like it was best placed for this, but not having C++
> > support is definitely a blocker. But as you say, in the absence of
> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> > way to specify non-1-bit widths could be overloading vector_size.
> >
> > Also, I think overloading GIMPLE's vector_mask takes us into the
> > earlier-discussed territory of what it should actually mean - it meaning
> > the target truth type in GIMPLE and a generic vector extension in the FE
> > will probably confuse gcc developers more than users.
> >
> >> That said - we're mixing two things here.  The desire to have "proper"
> >> svbool (fix: declare in the backend) and the desire to have "packed"
> >> bit-precision vectors (for whatever actual reason) as part of the
> >> GCC vector extension.
> >>
> >
> > If we leave lane-disambiguation of svbool to the backend, the values I
> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> > supporting BitInt(N) vectors possibly in the future 2) having a way for
> > targets to define their intrinsics' bool vector types using GNU
> > extensions 3) feature parity with Clang's ext_vector_type?
> >
> > I believe the primary motivation for Clang to support ext_vector_type
> > was to have a way to define target intrinsics' vector bool type using
> > vector extensions.
> >
> 
> 
>  Interestingly, Clang seems to support
> 
>  typedef struct {
> _Bool i:1;
>  } STR;
> 
>  typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
>  * 4))) vec;
> 
> 
>  int foo (vec b) {
>    return sizeof b;
>  }
> 
>  I can't find documentation about how it is implemented, but I suspect
>  the vector is constructed as an array STR[] i.e. possibly each
>  bit-element padded to byte boundary etc. Also, I can't seem to apply
>  many operations other than sizeof.
> 
>  I don't know if we've tried to support such cases in GNU in the past?
> >>>
> >>> Why should we do that?  It doesn't make much sense.
> >>>
> >>> single-bit vectors is what _BitInt was invented for.
> >>
> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
> >> _BitInts can be made to have single-bit generic vector semantics. For
> >> eg. If I want to initialize a _BitInt as vector, I can't do:
> >>
> >>   _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
> >>
> >> as 'a' expects a scalar initialization

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Tejas Belagod

On 7/17/24 4:36 PM, Richard Biener wrote:

On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  wrote:


On 7/15/24 6:05 PM, Richard Biener wrote:

On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:


On 7/15/24 12:16 PM, Tejas Belagod wrote:

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char
elements.
Problematic are 1-bit elements with 4, 2 or one element vectors,
2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element
vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason
didn't
look favorably to _BitInt support in C++, so at least until something
like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here so while bool is then natural
for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++
support is definitely a blocker. But as you say, in the absence of
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.

Also, I think overloading GIMPLE's vector_mask takes us into the
earlier-discussed territory of what it should actually mean - it meaning
the target truth type in GIMPLE and a generic vector extension in the FE
will probably confuse gcc developers more than users.


That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I
see in supporting 1, 2 and 4 bitsizes are 1) first step towards
supporting BitInt(N) vectors possibly in the future 2) having a way for
targets to define their intrinsics' bool vector types using GNU
extensions 3) feature parity with Clang's ext_vector_type?

I believe the primary motivation for Clang to support ext_vector_type
was to have a way to define target intrinsics' vector bool type using
vector extensions.




Interestingly, Clang seems to support

typedef struct {
   _Bool i:1;
} STR;

typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
* 4))) vec;


int foo (vec b) {
  return sizeof b;
}

I can't find documentation about how it is implemented, but I suspect
the vector is constructed as an array STR[] i.e. possibly each
bit-element padded to byte boundary etc. Also, I can't seem to apply
many operations other than sizeof.

I don't know if we've tried to support such cases in GNU in the past?


Why should we do that?  It doesn't make much sense.

single-bit vectors is what _BitInt was invented for.


Forgive me if I'm misunderstanding - I'm trying to figure out how
_BitInts can be made to have single-bit generic vector semantics. For
eg. If I want to initialize a _BitInt as vector, I can't do:

   _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};

as 'a' expects a scalar initialization.

Or if I want to convert an int vector to bit vector, I can't do

v4si_p = v4si_a > v4si_b;
_BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));

Also semantics of conditionals with _BitInt behave like scalars

_BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they behave as scalars.

Also, I can't do things like

typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt (2)) * 4)));

to force it to behave as a vector because _BitInt is disallowed here.



All I'm trying to say is that when people want to use vector as
a large packed bitfield they can now use _BitInt instead.  Of course
with a different (but portable) API.
I don't see single-bit element vectors as something especially
useful with a "vector API".  What's the use-case? (similar
for the two and four bit elements, with or without padding)



I'm trying to figure out if we had a portable (generi

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  wrote:
>
> On 7/15/24 6:05 PM, Richard Biener wrote:
> > On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:
> >>
> >> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> >>> On 7/12/24 6:40 PM, Richard Biener wrote:
>  On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
> >
> > On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> >> Padding is only an issue for very small vectors - the obvious choice is
> >> to disallow vector types that would require any padding.  I can hardly
> >> see where those are faster than using a vector of up to 4 char
> >> elements.
> >> Problematic are 1-bit elements with 4, 2 or one element vectors,
> >> 2-bit elements
> >> with 2 or one element vectors and 4-bit elements with 1 element
> >> vectors.
> >
> > I'd really like to avoid having to support something like
> > _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
> > 16)))
> > _BitInt(2) to say size of long long could be acceptable.
> 
>  I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
>  way to say
>  the element should have n (< 8) bits.
> 
> >> I have no idea what the stance on supporting _BitInt in C++ is,
> >> but most certainly diverging support (or even semantics) of the
> >> vector extension in C vs. C++ is undesirable.
> >
> > I believe Clang supports it in C++ next to C, GCC doesn't and Jason
> > didn't
> > look favorably to _BitInt support in C++, so at least until something
> > like
> > that is standardized in C++ the answer is probably no.
> 
>  OK, I think that rules out _BitInt use here so while bool is then natural
>  for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
>  number of bits explicitly.  There is signed_bool_precision but like
>  vector_mask its use is restricted to the GIMPLE frontend because
>  interaction with the rest of the language isn't defined.
> 
> >>>
> >>> Thanks for all the suggestions - really insightful (to me) discussions.
> >>>
> >>> Yeah, BitInt seemed like it was best placed for this, but not having C++
> >>> support is definitely a blocker. But as you say, in the absence of
> >>> BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> >>> way to specify non-1-bit widths could be overloading vector_size.
> >>>
> >>> Also, I think overloading GIMPLE's vector_mask takes us into the
> >>> earlier-discussed territory of what it should actually mean - it meaning
> >>> the target truth type in GIMPLE and a generic vector extension in the FE
> >>> will probably confuse gcc developers more than users.
> >>>
>  That said - we're mixing two things here.  The desire to have "proper"
>  svbool (fix: declare in the backend) and the desire to have "packed"
>  bit-precision vectors (for whatever actual reason) as part of the
>  GCC vector extension.
> 
> >>>
> >>> If we leave lane-disambiguation of svbool to the backend, the values I
> >>> see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> >>> supporting BitInt(N) vectors possibly in the future 2) having a way for
> >>> targets to define their intrinsics' bool vector types using GNU
> >>> extensions 3) feature parity with Clang's ext_vector_type?
> >>>
> >>> I believe the primary motivation for Clang to support ext_vector_type
> >>> was to have a way to define target intrinsics' vector bool type using
> >>> vector extensions.
> >>>
> >>
> >>
> >> Interestingly, Clang seems to support
> >>
> >> typedef struct {
> >>   _Bool i:1;
> >> } STR;
> >>
> >> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
> >> * 4))) vec;
> >>
> >>
> >> int foo (vec b) {
> >>  return sizeof b;
> >> }
> >>
> >> I can't find documentation about how it is implemented, but I suspect
> >> the vector is constructed as an array STR[] i.e. possibly each
> >> bit-element padded to byte boundary etc. Also, I can't seem to apply
> >> many operations other than sizeof.
> >>
> >> I don't know if we've tried to support such cases in GNU in the past?
> >
> > Why should we do that?  It doesn't make much sense.
> >
> > single-bit vectors is what _BitInt was invented for.
>
> Forgive me if I'm misunderstanding - I'm trying to figure out how
> _BitInts can be made to have single-bit generic vector semantics. For
> eg. If I want to initialize a _BitInt as vector, I can't do:
>
>   _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
>
> as 'a' expects a scalar initialization.
>
> Or if I want to convert an int vector to bit vector, I can't do
>
>   v4si_p = v4si_a > v4si_b;
>   _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));
>
> Also semantics of conditionals with _BitInt behave like scalars
>
>   _BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they behave as scalars.
>
> Also, I can't do things like
>
>typedef 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Tejas Belagod

On 7/15/24 6:05 PM, Richard Biener wrote:

On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:


On 7/15/24 12:16 PM, Tejas Belagod wrote:

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char
elements.
Problematic are 1-bit elements with 4, 2 or one element vectors,
2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element
vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason
didn't
look favorably to _BitInt support in C++, so at least until something
like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here so while bool is then natural
for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++
support is definitely a blocker. But as you say, in the absence of
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.

Also, I think overloading GIMPLE's vector_mask takes us into the
earlier-discussed territory of what it should actually mean - it meaning
the target truth type in GIMPLE and a generic vector extension in the FE
will probably confuse gcc developers more than users.


That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I
see in supporting 1, 2 and 4 bitsizes are 1) first step towards
supporting BitInt(N) vectors possibly in the future 2) having a way for
targets to define their intrinsics' bool vector types using GNU
extensions 3) feature parity with Clang's ext_vector_type?

I believe the primary motivation for Clang to support ext_vector_type
was to have a way to define target intrinsics' vector bool type using
vector extensions.




Interestingly, Clang seems to support

typedef struct {
  _Bool i:1;
} STR;

typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
* 4))) vec;


int foo (vec b) {
 return sizeof b;
}

I can't find documentation about how it is implemented, but I suspect
the vector is constructed as an array STR[] i.e. possibly each
bit-element padded to byte boundary etc. Also, I can't seem to apply
many operations other than sizeof.

I don't know if we've tried to support such cases in GNU in the past?


Why should we do that?  It doesn't make much sense.

single-bit vectors is what _BitInt was invented for. 


Forgive me if I'm misunderstanding - I'm trying to figure out how 
_BitInts can be made to have single-bit generic vector semantics. For 
eg. If I want to initialize a _BitInt as vector, I can't do:


 _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};

as 'a' expects a scalar initialization.

Or if I want to convert an int vector to bit vector, I can't do

  v4si_p = v4si_a > v4si_b;
  _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));

Also semantics of conditionals with _BitInt behave like scalars

  _BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they behave as scalars.


Also, I can't do things like

  typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt (2)) * 4)));


to force it to behave as a vector because _BitInt is disallowed here.
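
For contrast, here is a rough sketch of the lane-wise behaviour that the
existing GNU vector extension already gives for int elements, which is the
kind of semantics the snippets above are asking for on bit elements. The
packing into a 4-bit value is done by hand, since a packed boolean vector
type is exactly what is being discussed and does not exist here. The names
v4si, pack_mask and gt_mask are made up for illustration only.

```c
#include <assert.h>

/* Four 32-bit int lanes, using the existing GNU vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: pack a lane-wise comparison result into the
   low 4 bits of an unsigned value, lane 0 in bit 0. */
static unsigned pack_mask(v4si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; i++) {
        /* A vector comparison yields -1 for true lanes and 0 for false. */
        assert(m[i] == 0 || m[i] == -1);
        if (m[i])
            bits |= 1u << i;
    }
    return bits;
}

/* Hypothetical helper: lane-wise a > b, returned as a 4-bit mask. */
static unsigned gt_mask(v4si a, v4si b)
{
    v4si gt = a > b;   /* compares each lane independently */
    return pack_mask(gt);
}
```

With a = {1, 5, 3, 9} and b = {2, 4, 3, 7}, lanes 1 and 3 compare true, so
gt_mask returns 0xa. A scalar _BitInt (4) has no equivalent of the per-lane
comparison step.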


2-bit and 4-bit element vectors is what's missing, but the scope is narrow and
efficient lowering or native support is missing.  Vectors of bit
elements but with padding is just something stupid to look for
as a general feature.



Fair enough.


Thanks,
Tejas.



Richard.


Thanks,
Tejas.




Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-15 Thread Richard Biener
On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:
>
> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> > On 7/12/24 6:40 PM, Richard Biener wrote:
> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
> >>>
> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>  Padding is only an issue for very small vectors - the obvious choice is
>  to disallow vector types that would require any padding.  I can hardly
>  see where those are faster than using a vector of up to 4 char
>  elements.
>  Problematic are 1-bit elements with 4, 2 or one element vectors,
>  2-bit elements
>  with 2 or one element vectors and 4-bit elements with 1 element
>  vectors.
> >>>
> >>> I'd really like to avoid having to support something like
> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
> >>> 16)))
> >>> _BitInt(2) to say size of long long could be acceptable.
> >>
> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
> >> way to say
> >> the element should have n (< 8) bits.
> >>
>  I have no idea what the stance on supporting _BitInt in C++ is,
>  but most certainly diverging support (or even semantics) of the
>  vector extension in C vs. C++ is undesirable.
> >>>
> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
> >>> didn't
> >>> look favorably to _BitInt support in C++, so at least until something
> >>> like
> >>> that is standardized in C++ the answer is probably no.
> >>
> >> OK, I think that rules out _BitInt use here so while bool is then natural
> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
> >> number of bits explicitly.  There is signed_bool_precision but like
> >> vector_mask its use is restricted to the GIMPLE frontend because
> >> interaction with the rest of the language isn't defined.
> >>
> >
> > Thanks for all the suggestions - really insightful (to me) discussions.
> >
> > Yeah, BitInt seemed like it was best placed for this, but not having C++
> > support is definitely a blocker. But as you say, in the absence of
> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> > way to specify non-1-bit widths could be overloading vector_size.
> >
> > Also, I think overloading GIMPLE's vector_mask takes us into the
> > earlier-discussed territory of what it should actually mean - it meaning
> > the target truth type in GIMPLE and a generic vector extension in the FE
> > will probably confuse gcc developers more than users.
> >
> >> That said - we're mixing two things here.  The desire to have "proper"
> >> svbool (fix: declare in the backend) and the desire to have "packed"
> >> bit-precision vectors (for whatever actual reason) as part of the
> >> GCC vector extension.
> >>
> >
> > If we leave lane-disambiguation of svbool to the backend, the values I
> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> > supporting BitInt(N) vectors possibly in the future 2) having a way for
> > targets to define their intrinsics' bool vector types using GNU
> > extensions 3) feature parity with Clang's ext_vector_type?
> >
> > I believe the primary motivation for Clang to support ext_vector_type
> > was to have a way to define target intrinsics' vector bool type using
> > vector extensions.
> >
>
>
> Interestingly, Clang seems to support
>
> typedef struct {
>  _Bool i:1;
> } STR;
>
> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec;
>
>
> int foo (vec b) {
> return sizeof b;
> }
>
> I can't find documentation about how it is implemented, but I suspect
> the vector is constructed as an array STR[] i.e. possibly each
> bit-element padded to byte boundary etc. Also, I can't seem to apply
> many operations other than sizeof.
>
> I don't know if we've tried to support such cases in GNU in the past?

Why should we do that?  It doesn't make much sense.

Single-bit vectors are what _BitInt was invented for.  2-bit and 4-bit
element vectors are what's missing, but the scope is narrow and
efficient lowering or native support is missing.  Vectors of bit
elements but with padding are just something stupid to look for
as a general feature.

Richard.

> Thanks,
> Tejas.


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-15 Thread Tejas Belagod

On 7/15/24 12:16 PM, Tejas Belagod wrote:

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char elements.
Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
look favorably to _BitInt support in C++, so at least until something like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here, so while bool is then natural
for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision, but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++ 
support is definitely a blocker. But as you say, in the absence of 
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One 
way to specify non-1-bit widths could be overloading vector_size.


Also, I think overloading GIMPLE's vector_mask takes us into the 
earlier-discussed territory of what it should actually mean - it meaning 
the target truth type in GIMPLE and a generic vector extension in the FE 
will probably confuse gcc developers more than users.



That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I 
see in supporting 1, 2 and 4 bitsizes are 1) first step towards 
supporting BitInt(N) vectors possibly in the future 2) having a way for 
targets to define their intrinsics' bool vector types using GNU 
extensions 3) feature parity with Clang's ext_vector_type?


I believe the primary motivation for Clang to support ext_vector_type 
was to have a way to define target intrinsics' vector bool type using 
vector extensions.





Interestingly, Clang seems to support

typedef struct {
_Bool i:1;
} STR;

typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec;



int foo (vec b) {
   return sizeof b;
}

I can't find documentation about how it is implemented, but I suspect 
the vector is constructed as an array STR[] i.e. possibly each 
bit-element padded to byte boundary etc. Also, I can't seem to apply 
many operations other than sizeof.


I don't know if we've tried to support such cases in GNU in the past?

Thanks,
Tejas.


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-14 Thread Tejas Belagod

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char elements.
Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
look favorably to _BitInt support in C++, so at least until something like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here, so while bool is then natural
for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision, but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++ 
support is definitely a blocker. But as you say, in the absence of 
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One 
way to specify non-1-bit widths could be overloading vector_size.


Also, I think overloading GIMPLE's vector_mask takes us into the 
earlier-discussed territory of what it should actually mean - it meaning 
the target truth type in GIMPLE and a generic vector extension in the FE 
will probably confuse gcc developers more than users.



That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I 
see in supporting 1, 2 and 4 bitsizes are 1) first step towards 
supporting BitInt(N) vectors possibly in the future 2) having a way for 
targets to define their intrinsics' bool vector types using GNU 
extensions 3) feature parity with Clang's ext_vector_type?


I believe the primary motivation for Clang to support ext_vector_type 
was to have a way to define target intrinsics' vector bool type using 
vector extensions.


Thanks,
Tejas.


Richard.


 Jakub





Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
>
> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> > Padding is only an issue for very small vectors - the obvious choice is
> > to disallow vector types that would require any padding.  I can hardly
> > see where those are faster than using a vector of up to 4 char elements.
> > Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit 
> > elements
> > with 2 or one element vectors and 4-bit elements with 1 element vectors.
>
> I'd really like to avoid having to support something like
> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
> _BitInt(2) to say size of long long could be acceptable.

I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say
the element should have n (< 8) bits.
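
For reference, the problematic cases quoted above are exactly those where
the packed vector does not fill a whole number of bytes.  A minimal sketch
of that arithmetic in plain C follows; needs_padding is a hypothetical
helper name used for illustration only, not a GCC interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a GCC API: returns true when a packed vector
   of NELEMS elements of ELEM_BITS bits each does not fill a whole number
   of bytes and would therefore need padding bits.  */
bool
needs_padding (unsigned elem_bits, unsigned nelems)
{
  assert (elem_bits > 0 && nelems > 0);
  return (elem_bits * nelems) % 8 != 0;
}
```

With power-of-two element counts this flags 1-bit elements with 4, 2 or 1
lanes, 2-bit elements with 2 or 1 lanes and 4-bit elements with 1 lane,
while 1-bit x 8, 2-bit x 4 and 4-bit x 2 fill a byte exactly.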

> > I have no idea what the stance of supporting _BitInt in C++ are,
> > but most certainly diverging support (or even semantics) of the
> > vector extension in C vs. C++ is undesirable.
>
> I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
> look favorably to _BitInt support in C++, so at least until something like
> that is standardized in C++ the answer is probably no.

OK, I think that rules out _BitInt use here, so while bool is then natural
for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision, but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.

That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.

Richard.

> Jakub
>


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Jakub Jelinek
On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> Padding is only an issue for very small vectors - the obvious choice is
> to disallow vector types that would require any padding.  I can hardly
> see where those are faster than using a vector of up to 4 char elements.
> Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit 
> elements
> with 2 or one element vectors and 4-bit elements with 1 element vectors.

I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
Element types from _BitInt(2) up to, say, the size of long long could be
acceptable.

> I have no idea what the stance of supporting _BitInt in C++ are,
> but most certainly diverging support (or even semantics) of the
> vector extension in C vs. C++ is undesirable.

I believe Clang supports it in C++ as well as C; GCC doesn't, and Jason didn't
look favorably on _BitInt support in C++, so at least until something like
that is standardized in C++ the answer is probably no.

Jakub



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 12:44 PM Tejas Belagod  wrote:
>
> On 7/12/24 11:46 AM, Richard Biener wrote:
> > On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod  wrote:
> >>
> >> On 7/10/24 4:37 PM, Richard Biener wrote:
> >>> On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
> >>>  wrote:
> 
>  Tejas Belagod  writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
>  On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod 
>   wrote:
> >
> > On 7/8/24 4:45 PM, Richard Biener wrote:
> >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod 
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorry to have dropped the ball on
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, 
> >>> but
> >>> here I've tried to pick it up again and write up a strawman 
> >>> proposal for
> >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>
> >>>
> >>> Thanks,
> >>> Tejas.
> >>>
> >>> Motivation
> >>> --
> >>>
> >>> The idea of packed boolean vectors came about when we wanted to 
> >>> support
> >>> C/C++ operators on SVE ACLE types. The current vector boolean 
> >>> type that
> >>> ACLE specifies does not adequately disambiguate vector lane sizes 
> >>> which
> >>> they were derived off of. Consider this simple, albeit 
> >>> unrealistic, example:
> >>>
> >>> bool foo (svint32_t a, svint32_t b)
> >>> {
> >>>   svbool_t p = a > b;
> >>>
> >>>   // Here p[2] is not the same as a[2] > b[2].
> >>>   return p[2];
> >>> }
> >>>
> >>> In the above example, because svbool_t has a fixed 
> >>> 1-lane-per-byte, p[i]
> >>> does not return the bool value corresponding to a[i] > b[i]. This
> >>> necessitates a 'typed' vector boolean value that unambiguously
> >>> represents results of operations
> >>> of the same type.
> >>>
> >>> __attribute__((vector_mask))
> >>> -
> >>>
> >>> Note: If interested in historical discussions refer to:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>
> >>> We define this new attribute which when applied to a base data 
> >>> vector
> >>> produces a new boolean vector type that represents a boolean type 
> >>> that
> >>> is produced as a result of operations on the corresponding base 
> >>> vector
> >>> type. The following is the syntax.
> >>>
> >>> typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >>> typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> Here the 'base' data vector type is v8si or a vector of 8 
> >>> integers.
> >>>
> >>> Rules
> >>>
> >>> • The layout/size of the boolean vector type is 
> >>> implementation-defined
> >>> for its base data vector type.
> >>>
> >>> • Two boolean vector types who's base data vector types have same 
> >>> number
> >>> of elements and lane-width have the same layout and size.
> >>>
> >>> • Consequently, two boolean vectors who's base data vector types 
> >>> have
> >>> different number of elements or different lane-size have 
> >>> different layouts.
> >>>
> >>> This aligns with gnu vector extensions that generate integer 
> >>> vectors as
> >>> a result of comparisons - "The result of the comparison is a 
> >>> vector of
> >>> the same width and number of elements as the comparison operands 
> >>> with a
> >>> signed integral element type." according to
> >>>  
> >>> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>
> >> Without having the time to re-review this all in detail I think 
> >> the GNU
> >> vector extension does not expose the result of the comparison as 
> >> the
> >> machine would produce it but instead a comparison "decays" to
> >> a conditional:
> >>
> >> typedef int v4si __attribute__((vector_size(16)));
> >>
> >> v4si a;
> >> v4si b;
> >>
> >> void foo()
> >> {
> >>auto r = a < b;
> >> }
> >>
> >> produces, with C23:
> >>
> >>vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 
> >> } , { 0,
> 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Tejas Belagod

On 7/12/24 11:46 AM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod  wrote:


On 7/10/24 4:37 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:


Tejas Belagod  writes:

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

bool foo (svint32_t a, svint32_t b)
{
  svbool_t p = a > b;

  // Here p[2] is not the same as a[2] > b[2].
  return p[2];
}

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.
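
The mismatch described above can be modelled in plain C.  This is only a
sketch, not ACLE code: it assumes a 128-bit vector and a predicate that
carries one flag per vector byte, with a 32-bit compare writing the flag of
the first byte of each lane.  The names pred_model, cmpgt32 and lane32 are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { VEC_BYTES = 16, LANES32 = VEC_BYTES / 4 };

/* Model of a byte-granular predicate: one flag per vector byte.  */
typedef struct { bool byte_flag[VEC_BYTES]; } pred_model;

/* Compare 32-bit lanes; only the flag belonging to the first byte of
   each 4-byte lane is written, the other flags stay false.  */
pred_model
cmpgt32 (const int32_t a[LANES32], const int32_t b[LANES32])
{
  pred_model p = { { false } };
  for (int i = 0; i < LANES32; i++)
    p.byte_flag[4 * i] = a[i] > b[i];
  return p;
}

/* 'Typed' view: the result that belongs to 32-bit lane I.  */
bool
lane32 (pred_model p, int i)
{
  assert (i >= 0 && i < LANES32);
  return p.byte_flag[4 * i];
}
```

In this model, indexing the untyped predicate with 2 reads the flag of
byte 2, which sits inside 32-bit lane 0, whereas the typed accessor reads
the flag of byte 8, which is where the result of a[2] > b[2] lives.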

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same number
of elements and lane-width have the same layout and size.

• Consequently, two boolean vectors whose base data vector types have
different numbers of elements or different lane-sizes have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
 https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
   auto r = a < b;
}

produces, with C23:

   vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;
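
As a small runnable illustration of the behaviour described above, here is
a sketch using the existing GNU vector extension.  The function names
less_than and lane_min are made up for this example; the only language
feature relied on is that a vector comparison yields a signed integer
vector whose lanes are -1 (true) or 0 (false).

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

static_assert (sizeof (v4si) == 4 * sizeof (int), "v4si holds four ints");

/* Lane-wise a < b: each result lane is -1 where the comparison holds
   and 0 where it does not.  */
v4si
less_than (v4si a, v4si b)
{
  return a < b;
}

/* Use the comparison result as a bitwise select mask: picks a[i] where
   a[i] < b[i] and b[i] otherwise, i.e. a lane-wise minimum.  */
v4si
lane_min (v4si a, v4si b)
{
  v4si m = a < b;
  return (a & m) | (b & ~m);
}
```

Because the result is an ordinary integer vector, it composes with the
other generic vector operators without exposing any target mask layout.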

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Res

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-11 Thread Richard Biener
On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod  wrote:
>
> On 7/10/24 4:37 PM, Richard Biener wrote:
> > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
> >  wrote:
> >>
> >> Tejas Belagod  writes:
> >>> On 7/10/24 2:38 PM, Richard Biener wrote:
>  On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  
>  wrote:
> >
> > On 7/9/24 4:22 PM, Richard Biener wrote:
> >> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> On 7/8/24 4:45 PM, Richard Biener wrote:
>  On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod 
>   wrote:
> >
> > Hi,
> >
> > Sorry to have dropped the ball on
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> > here I've tried to pick it up again and write up a strawman 
> > proposal for
> > elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >
> >
> > Thanks,
> > Tejas.
> >
> > Motivation
> > --
> >
> > The idea of packed boolean vectors came about when we wanted to 
> > support
> > C/C++ operators on SVE ACLE types. The current vector boolean type 
> > that
> > ACLE specifies does not adequately disambiguate vector lane sizes 
> > which
> > they were derived off of. Consider this simple, albeit unrealistic, 
> > example:
> >
> >bool foo (svint32_t a, svint32_t b)
> >{
> >  svbool_t p = a > b;
> >
> >  // Here p[2] is not the same as a[2] > b[2].
> >  return p[2];
> >}
> >
> > In the above example, because svbool_t has a fixed 1-lane-per-byte, 
> > p[i]
> > does not return the bool value corresponding to a[i] > b[i]. This
> > necessitates a 'typed' vector boolean value that unambiguously
> > represents results of operations
> > of the same type.
> >
> > __attribute__((vector_mask))
> > -
> >
> > Note: If interested in historical discussions refer to:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >
> > We define this new attribute which when applied to a base data 
> > vector
> > produces a new boolean vector type that represents a boolean type 
> > that
> > is produced as a result of operations on the corresponding base 
> > vector
> > type. The following is the syntax.
> >
> >typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >typedef v8si v8sib __attribute__((vector_mask));
> >
> > Here the 'base' data vector type is v8si or a vector of 8 integers.
> >
> > Rules
> >
> > • The layout/size of the boolean vector type is 
> > implementation-defined
> > for its base data vector type.
> >
> > • Two boolean vector types who's base data vector types have same 
> > number
> > of elements and lane-width have the same layout and size.
> >
> > • Consequently, two boolean vectors who's base data vector types 
> > have
> > different number of elements or different lane-size have different 
> > layouts.
> >
> > This aligns with gnu vector extensions that generate integer 
> > vectors as
> > a result of comparisons - "The result of the comparison is a vector 
> > of
> > the same width and number of elements as the comparison operands 
> > with a
> > signed integral element type." according to
> > https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> 
>  Without having the time to re-review this all in detail I think the 
>  GNU
>  vector extension does not expose the result of the comparison as the
>  machine would produce it but instead a comparison "decays" to
>  a conditional:
> 
>  typedef int v4si __attribute__((vector_size(16)));
> 
>  v4si a;
>  v4si b;
> 
>  void foo()
>  {
>    auto r = a < b;
>  }
> 
>  produces, with C23:
> 
>    vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } 
>  , { 0,
>  0, 0, 0 } > ;
> 
>  In fact on x86_64 with AVX and AVX512 you have two different "machine
>  produced" mask types and the above could either produce a AVX mask 
>  with
>  32bit elements or a AVX512 mask with 1bit elements.
> 
>  Not exposing "native" mask types requires the compiler optimizing 
>  subsequent
>  uses and makes generic vector

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-11 Thread Tejas Belagod

On 7/10/24 4:37 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:


Tejas Belagod  writes:

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

   bool foo (svint32_t a, svint32_t b)
   {
 svbool_t p = a > b;

 // Here p[2] is not the same as a[2] > b[2].
 return p[2];
   }

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
   typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same number
of elements and lane-width have the same layout and size.

• Consequently, two boolean vectors whose base data vector types have
different numbers of elements or different lane-sizes have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
  auto r = a < b;
}

produces, with C23:

  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce a AVX mask with
32bit elements or a AVX512 mask with 1bit elements.
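
To make the two representations concrete, the following portable sketch
converts a wide mask (one 32-bit lane per element, each 0 or -1, as a
generic vector comparison produces) into a packed mask with one bit per
element.  It uses no target intrinsics; pack_mask is a hypothetical name
and the choice of lane 0 as the least significant bit is an assumption of
this example.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Pack a wide mask (each lane 0 or -1) into one bit per lane, with
   lane 0 in the least significant bit.  */
unsigned
pack_mask (v4si wide)
{
  unsigned bits = 0;
  for (int i = 0; i < 4; i++)
    {
      assert (wide[i] == 0 || wide[i] == -1);
      if (wide[i])
        bits |= 1u << i;
    }
  return bits;
}
```

A transition of this kind is what the compiler has to insert or optimize
away whenever generic vector code meets an interface that expects the
packed form.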

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
type

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:
>
> Tejas Belagod  writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
>  On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  
>  wrote:
> >
> > On 7/8/24 4:45 PM, Richard Biener wrote:
> >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorry to have dropped the ball on
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >>> here I've tried to pick it up again and write up a strawman proposal 
> >>> for
> >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>
> >>>
> >>> Thanks,
> >>> Tejas.
> >>>
> >>> Motivation
> >>> --
> >>>
> >>> The idea of packed boolean vectors came about when we wanted to 
> >>> support
> >>> C/C++ operators on SVE ACLE types. The current vector boolean type 
> >>> that
> >>> ACLE specifies does not adequately disambiguate vector lane sizes 
> >>> which
> >>> they were derived off of. Consider this simple, albeit unrealistic, 
> >>> example:
> >>>
> >>>   bool foo (svint32_t a, svint32_t b)
> >>>   {
> >>> svbool_t p = a > b;
> >>>
> >>> // Here p[2] is not the same as a[2] > b[2].
> >>> return p[2];
> >>>   }
> >>>
> >>> In the above example, because svbool_t has a fixed 1-lane-per-byte, 
> >>> p[i]
> >>> does not return the bool value corresponding to a[i] > b[i]. This
> >>> necessitates a 'typed' vector boolean value that unambiguously
> >>> represents results of operations
> >>> of the same type.
> >>>
> >>> __attribute__((vector_mask))
> >>> -
> >>>
> >>> Note: If interested in historical discussions refer to:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>
> >>> We define this new attribute which when applied to a base data vector
> >>> produces a new boolean vector type that represents a boolean type that
> >>> is produced as a result of operations on the corresponding base vector
> >>> type. The following is the syntax.
> >>>
> >>>   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >>>   typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>>
> >>> Rules
> >>>
> >>> • The layout/size of the boolean vector type is implementation-defined
> >>> for its base data vector type.
> >>>
> >>> • Two boolean vector types who's base data vector types have same 
> >>> number
> >>> of elements and lane-width have the same layout and size.
> >>>
> >>> • Consequently, two boolean vectors who's base data vector types have
> >>> different number of elements or different lane-size have different 
> >>> layouts.
> >>>
> >>> This aligns with gnu vector extensions that generate integer vectors 
> >>> as
> >>> a result of comparisons - "The result of the comparison is a vector of
> >>> the same width and number of elements as the comparison operands with 
> >>> a
> >>> signed integral element type." according to
> >>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>
> >> Without having the time to re-review this all in detail I think the GNU
> >> vector extension does not expose the result of the comparison as the
> >> machine would produce it but instead a comparison "decays" to
> >> a conditional:
> >>
> >> typedef int v4si __attribute__((vector_size(16)));
> >>
> >> v4si a;
> >> v4si b;
> >>
> >> void foo()
> >> {
> >>  auto r = a < b;
> >> }
> >>
> >> produces, with C23:
> >>
> >>  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 
> >> 0,
> >> 0, 0, 0 } > ;
> >>
> >> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >> produced" mask types and the above could either produce an AVX mask with
> >> 32-bit elements or an AVX512 mask with 1-bit elements.
> >>
> >> Not exposing "native" mask types requires the compiler optimizing 
> >> subsequent
> >> uses and makes generic vectors difficult to combine with for example 
> >> AVX512
> >> intrinsics (where masks are just 'int').  Across an ABI boundary it's 
> >> also
> >> even more difficult to optimize mask transitions.
> >>
> >> But it at least allows portable code and it does not suffer from users 
> >> trying to
> >> expose machine representations of masks as input to generic vector code
> >> with all the problems of constant folding not only requiring 
>

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Sandiford
Tejas Belagod  writes:
> On 7/10/24 2:38 PM, Richard Biener wrote:
>> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:
>>>
>>> On 7/9/24 4:22 PM, Richard Biener wrote:
 On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  
 wrote:
>
> On 7/8/24 4:45 PM, Richard Biener wrote:
>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
>> wrote:
>>>
>>> Hi,
>>>
>>> Sorry to have dropped the ball on
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
>>> here I've tried to pick it up again and write up a strawman proposal for
>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
>>>
>>>
>>> Thanks,
>>> Tejas.
>>>
>>> Motivation
>>> --
>>>
>>> The idea of packed boolean vectors came about when we wanted to support
>>> C/C++ operators on SVE ACLE types. The current vector boolean type that
>>> ACLE specifies does not adequately disambiguate vector lane sizes which
>>> they were derived off of. Consider this simple, albeit unrealistic, 
>>> example:
>>>
>>>   bool foo (svint32_t a, svint32_t b)
>>>   {
>>> svbool_t p = a > b;
>>>
>>> // Here p[2] is not the same as a[2] > b[2].
>>> return p[2];
>>>   }
>>>
>>> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
>>> does not return the bool value corresponding to a[i] > b[i]. This
>>> necessitates a 'typed' vector boolean value that unambiguously
>>> represents results of operations
>>> of the same type.
>>>
>>> __attribute__((vector_mask))
>>> -
>>>
>>> Note: If interested in historical discussions refer to:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
>>>
>>> We define this new attribute which when applied to a base data vector
>>> produces a new boolean vector type that represents a boolean type that
>>> is produced as a result of operations on the corresponding base vector
>>> type. The following is the syntax.
>>>
>>>   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
>>>   typedef v8si v8sib __attribute__((vector_mask));
>>>
>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
>>>
>>> Rules
>>>
>>> • The layout/size of the boolean vector type is implementation-defined
>>> for its base data vector type.
>>>
>>> • Two boolean vector types whose base data vector types have the same
>>> number of elements and lane-width have the same layout and size.
>>>
>>> • Consequently, two boolean vector types whose base data vector types
>>> have a different number of elements or a different lane-size have
>>> different layouts.
>>>
>>> This aligns with gnu vector extensions that generate integer vectors as
>>> a result of comparisons - "The result of the comparison is a vector of
>>> the same width and number of elements as the comparison operands with a
>>> signed integral element type." according to
>>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
>>
>> Without having the time to re-review this all in detail I think the GNU
>> vector extension does not expose the result of the comparison as the
>> machine would produce it but instead a comparison "decays" to
>> a conditional:
>>
>> typedef int v4si __attribute__((vector_size(16)));
>>
>> v4si a;
>> v4si b;
>>
>> void foo()
>> {
>>  auto r = a < b;
>> }
>>
>> produces, with C23:
>>
>>  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
>> 0, 0, 0 } > ;
>>
>> In fact on x86_64 with AVX and AVX512 you have two different "machine
>> produced" mask types and the above could either produce an AVX mask with
>> 32-bit elements or an AVX512 mask with 1-bit elements.
>>
>> Not exposing "native" mask types requires the compiler optimizing 
>> subsequent
>> uses and makes generic vectors difficult to combine with for example 
>> AVX512
>> intrinsics (where masks are just 'int').  Across an ABI boundary it's 
>> also
>> even more difficult to optimize mask transitions.
>>
>> But it at least allows portable code and it does not suffer from users 
>> trying to
>> expose machine representations of masks as input to generic vector code
>> with all the problems of constant folding not only requiring 
>> self-consistent
>> code within the compiler but compatibility with user produced constant 
>> masks.
>>
>> That said, I somewhat question the need to expose the target mask layout
>> to users for GCC's generic vector extension.
>>
>
> Thanks for your feedback.
>
> IIUC, I can imagine how having a GNU vector extension exposing the
> target vector 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not record the lane size of the vectors its value
was derived from. Consider this simple, albeit unrealistic, example:

  bool foo (svint32_t a, svint32_t b)
  {
    svbool_t p = a > b;

    // Here p[2] is not the same as a[2] > b[2].
    return p[2];
  }

In the above example, because svbool_t has a fixed 1-lane-per-byte layout,
p[i] does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously represents
the results of operations on vectors of the same type.
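
The indexing mismatch described above can be modelled in portable C. This is only a sketch under stated assumptions: it pretends the vector is 128 bits wide and stores the predicate as 16 bits, one per vector byte, with only the lowest bit of each 4-bit group significant for 32-bit lanes. The names pred_model, cmpgt_s32, pred_bit and pred_lane are hypothetical and are not ACLE or GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the predicate for a 128-bit vector:
   one bit per vector byte, 16 bits in total. */
typedef uint16_t pred_model;

/* Lane-wise a > b on four 32-bit lanes.  For 32-bit elements only the
   lowest bit of each 4-bit group carries the result for that lane. */
static pred_model cmpgt_s32(const int32_t a[4], const int32_t b[4])
{
    pred_model p = 0;
    for (unsigned lane = 0; lane < 4; lane++)
        if (a[lane] > b[lane])
            p |= (pred_model)(1u << (lane * sizeof(int32_t)));
    return p;
}

/* Untyped indexing: bit i of the predicate, i.e. the bit for byte i. */
static bool pred_bit(pred_model p, unsigned i)
{
    assert(i < 16);
    return (p >> i) & 1u;
}

/* Typed indexing: the bit that governs lane i of lane_bytes-wide elements. */
static bool pred_lane(pred_model p, unsigned i, unsigned lane_bytes)
{
    assert(lane_bytes != 0 && i * lane_bytes < 16);
    return (p >> (i * lane_bytes)) & 1u;
}
```

In this model pred_bit(p, 2) reads the bit for byte 2, which lies inside lane 0, whereas pred_lane(p, 2, 4) reads the bit for lane 2. Knowing the lane size is exactly the information a 'typed' mask would carry.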

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The syntax is as follows.

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane-width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types
have a different number of elements or a different lane-size have
different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
 auto r = a < b;
}

produces, with C23:

 vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler to optimize subsequent
uses and makes generic vectors difficult to combine with, for example, AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way, i.e. possibly not
with a target mask but just a generic int vector, still maintain the
consistency of GNU predicate vectors within the compiler? I know it may
not seem very different from how boolean vectors are currently implemented
(as in your above example), but having __attribute__((vector_mask)) as a
'property' of the object makes it useful for optimizing its uses to target
predicates in subsequent stages of the compiler.

2. Restrict __attribute__((vector_mask)) to apply only to target
intrinsic types? E.g.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Richard Biener
On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:
>
> On 7/9/24 4:22 PM, Richard Biener wrote:
> > On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:
> >>
> >> On 7/8/24 4:45 PM, Richard Biener wrote:
> >>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  
> >>> wrote:
> 
>  Hi,
> 
>  Sorry to have dropped the ball on
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
>  here I've tried to pick it up again and write up a strawman proposal for
>  elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> 
> 
>  Thanks,
>  Tejas.
> 
>  Motivation
>  --
> 
>  The idea of packed boolean vectors came about when we wanted to support
>  C/C++ operators on SVE ACLE types. The current vector boolean type that
>  ACLE specifies does not adequately disambiguate vector lane sizes which
>  they were derived off of. Consider this simple, albeit unrealistic, 
>  example:
> 
>   bool foo (svint32_t a, svint32_t b)
>   {
> svbool_t p = a > b;
> 
> // Here p[2] is not the same as a[2] > b[2].
> return p[2];
>   }
> 
>  In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
>  does not return the bool value corresponding to a[i] > b[i]. This
>  necessitates a 'typed' vector boolean value that unambiguously
>  represents results of operations
>  of the same type.
> 
>  __attribute__((vector_mask))
>  -
> 
>  Note: If interested in historical discussions refer to:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> 
>  We define this new attribute which when applied to a base data vector
>  produces a new boolean vector type that represents a boolean type that
>  is produced as a result of operations on the corresponding base vector
>  type. The following is the syntax.
> 
>   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
>   typedef v8si v8sib __attribute__((vector_mask));
> 
>  Here the 'base' data vector type is v8si or a vector of 8 integers.
> 
>  Rules
> 
>  • The layout/size of the boolean vector type is implementation-defined
>  for its base data vector type.
> 
>  • Two boolean vector types whose base data vector types have the same
>  number of elements and lane-width have the same layout and size.
> 
>  • Consequently, two boolean vector types whose base data vector types
>  have a different number of elements or a different lane-size have
>  different layouts.
> 
>  This aligns with gnu vector extensions that generate integer vectors as
>  a result of comparisons - "The result of the comparison is a vector of
>  the same width and number of elements as the comparison operands with a
>  signed integral element type." according to
>    https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>>
> >>> Without having the time to re-review this all in detail I think the GNU
> >>> vector extension does not expose the result of the comparison as the
> >>> machine would produce it but instead a comparison "decays" to
> >>> a conditional:
> >>>
> >>> typedef int v4si __attribute__((vector_size(16)));
> >>>
> >>> v4si a;
> >>> v4si b;
> >>>
> >>> void foo()
> >>> {
> >>> auto r = a < b;
> >>> }
> >>>
> >>> produces, with C23:
> >>>
> >>> vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
> >>> 0, 0, 0 } > ;
> >>>
> >>> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >>> produced" mask types and the above could either produce an AVX mask with
> >>> 32-bit elements or an AVX512 mask with 1-bit elements.
> >>>
> >>> Not exposing "native" mask types requires the compiler optimizing 
> >>> subsequent
> >>> uses and makes generic vectors difficult to combine with for example 
> >>> AVX512
> >>> intrinsics (where masks are just 'int').  Across an ABI boundary it's also
> >>> even more difficult to optimize mask transitions.
> >>>
> >>> But it at least allows portable code and it does not suffer from users 
> >>> trying to
> >>> expose machine representations of masks as input to generic vector code
> >>> with all the problems of constant folding not only requiring 
> >>> self-consistent
> >>> code within the compiler but compatibility with user produced constant 
> >>> masks.
> >>>
> >>> That said, I somewhat question the need to expose the target mask layout
> >>> to users for GCC's generic vector extension.
> >>>
> >>
> >> Thanks for your feedback.
> >>
> >> IIUC, I can imagine how having a GNU vector extension exposing the
> >> target vector mask layout can pose a challenge - maybe making it a
> >> generic GNU vector extension was too ambitious. I wonder if there's
> >> value in pursuing these alternate paths?
> >>
> >> 1. Can implementing this extension in 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

 bool foo (svint32_t a, svint32_t b)
 {
   svbool_t p = a > b;

   // Here p[2] is not the same as a[2] > b[2].
   return p[2];
 }

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

 typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
 typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane-width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types
have a different number of elements or a different lane-size have
different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
  https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
auto r = a < b;
}

produces, with C23:

vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
this would require more fine-grained defn of lane-size to mask-bits mapping.
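
Option 1 above keeps the mask as a generic int vector and leaves the choice of a packed target predicate to later compiler stages. A minimal sketch of the two representations and a conversion between them, assuming GCC or Clang with the GNU vector extension and 32-bit int; pack_mask and unpack_mask are hypothetical helper names for illustration, not part of the proposal or of any library:

```c
#include <assert.h>

/* Eight ints in one vector, as in the proposal's v8si example. */
typedef int v8si __attribute__((vector_size(8 * sizeof(int))));

/* Pack a generic mask (each lane 0 or -1) into one bit per lane. */
static unsigned pack_mask(v8si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 8; i++)
        bits |= (unsigned)(m[i] & 1) << i;
    return bits;
}

/* Expand a packed mask back into the generic 0 / -1 lane form. */
static v8si unpack_mask(unsigned bits)
{
    assert(bits < 256u);  /* only eight lanes exist */
    v8si m = {0};
    for (int i = 0; i < 8; i++)
        m[i] = -(int)((bits >> i) & 1u);
    return m;
}
```

A compiler that knows a value is a mask could in principle pick either form per target; the sketch only shows that the two carry the same information for a fixed lane count.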


I think the ta

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-09 Thread Richard Biener
On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:
>
> On 7/8/24 4:45 PM, Richard Biener wrote:
> > On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:
> >>
> >> Hi,
> >>
> >> Sorry to have dropped the ball on
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >> here I've tried to pick it up again and write up a strawman proposal for
> >> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>
> >>
> >> Thanks,
> >> Tejas.
> >>
> >> Motivation
> >> --
> >>
> >> The idea of packed boolean vectors came about when we wanted to support
> >> C/C++ operators on SVE ACLE types. The current vector boolean type that
> >> ACLE specifies does not adequately disambiguate vector lane sizes which
> >> they were derived off of. Consider this simple, albeit unrealistic, 
> >> example:
> >>
> >> bool foo (svint32_t a, svint32_t b)
> >> {
> >>   svbool_t p = a > b;
> >>
> >>   // Here p[2] is not the same as a[2] > b[2].
> >>   return p[2];
> >> }
> >>
> >> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
> >> does not return the bool value corresponding to a[i] > b[i]. This
> >> necessitates a 'typed' vector boolean value that unambiguously
> >> represents results of operations
> >> of the same type.
> >>
> >> __attribute__((vector_mask))
> >> -
> >>
> >> Note: If interested in historical discussions refer to:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>
> >> We define this new attribute which when applied to a base data vector
> >> produces a new boolean vector type that represents a boolean type that
> >> is produced as a result of operations on the corresponding base vector
> >> type. The following is the syntax.
> >>
> >> typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >> typedef v8si v8sib __attribute__((vector_mask));
> >>
> >> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>
> >> Rules
> >>
> >> • The layout/size of the boolean vector type is implementation-defined
> >> for its base data vector type.
> >>
> >> • Two boolean vector types whose base data vector types have the same
> >> number of elements and lane-width have the same layout and size.
> >>
> >> • Consequently, two boolean vector types whose base data vector types
> >> have a different number of elements or a different lane-size have
> >> different layouts.
> >>
> >> This aligns with gnu vector extensions that generate integer vectors as
> >> a result of comparisons - "The result of the comparison is a vector of
> >> the same width and number of elements as the comparison operands with a
> >> signed integral element type." according to
> >>  https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >
> > Without having the time to re-review this all in detail I think the GNU
> > vector extension does not expose the result of the comparison as the
> > machine would produce it but instead a comparison "decays" to
> > a conditional:
> >
> > typedef int v4si __attribute__((vector_size(16)));
> >
> > v4si a;
> > v4si b;
> >
> > void foo()
> > {
> >auto r = a < b;
> > }
> >
> > produces, with C23:
> >
> >vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
> > 0, 0, 0 } > ;
> >
> > In fact on x86_64 with AVX and AVX512 you have two different "machine
> > produced" mask types and the above could either produce an AVX mask with
> > 32-bit elements or an AVX512 mask with 1-bit elements.
> >
> > Not exposing "native" mask types requires the compiler optimizing subsequent
> > uses and makes generic vectors difficult to combine with for example AVX512
> > intrinsics (where masks are just 'int').  Across an ABI boundary it's also
> > even more difficult to optimize mask transitions.
> >
> > But it at least allows portable code and it does not suffer from users 
> > trying to
> > expose machine representations of masks as input to generic vector code
> > with all the problems of constant folding not only requiring self-consistent
> > code within the compiler but compatibility with user produced constant 
> > masks.
> >
> > That said, I somewhat question the need to expose the target mask layout
> > to users for GCC's generic vector extension.
> >
>
> Thanks for your feedback.
>
> IIUC, I can imagine how having a GNU vector extension exposing the
> target vector mask layout can pose a challenge - maybe making it a
> generic GNU vector extension was too ambitious. I wonder if there's
> value in pursuing these alternate paths?
>
> 1. Can implementing this extension in a 'generic' way i.e. possibly not
> implement it with a target mask, but just a generic int vector, still
> maintain the consistency of GNU predicate vectors within the compiler? I
> know it may not seem very different from how boolean vectors are
> currently implemented (as in your above example), but, having the
> __attribute__((vector_mask)) as a 'property' of the object makes it
> useful to op

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-09 Thread Tejas Belagod

On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

bool foo (svint32_t a, svint32_t b)
{
  svbool_t p = a > b;

  // Here p[2] is not the same as a[2] > b[2].
  return p[2];
}

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane-width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types
have a different number of elements or a different lane-size have
different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
 https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
   auto r = a < b;
}

produces, with C23:

   vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the 
target vector mask layout can pose a challenge - maybe making it a 
generic GNU vector extension was too ambitious. I wonder if there's 
value in pursuing these alternate paths?


1. Can implementing this extension in a 'generic' way i.e. possibly not 
implement it with a target mask, but just a generic int vector, still 
maintain the consistency of GNU predicate vectors within the compiler? I 
know it may not seem very different from how boolean vectors are 
currently implemented (as in your above example), but, having the 
__attribute__((vector_mask)) as a 'property' of the object makes it 
useful to optimize its uses to target predicates in subsequent stages of 
the compiler.


2. Restricting __attribute__((vector_mask)) to apply only to target 
intrinsic types? Eg.


On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though 
this would require more fine-grained defn of lane-size to mask-bits mapping.


Would not be allowed on GNU Vector Extensions:
typedef v4si v4sib __attribute__((vector_mask)); // Error - v

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-08 Thread Richard Biener
On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:
>
> Hi,
>
> Sorry to have dropped the ball on
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> here I've tried to pick it up again and write up a strawman proposal for
> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
>
>
> Thanks,
> Tejas.
>
> Motivation
> --
>
> The idea of packed boolean vectors came about when we wanted to support
> C/C++ operators on SVE ACLE types. The current vector boolean type that
> ACLE specifies does not record the lane size of the vectors its value
> was derived from. Consider this simple, albeit unrealistic, example:
>
>   bool foo (svint32_t a, svint32_t b)
>   {
>     svbool_t p = a > b;
>
>     // Here p[2] is not the same as a[2] > b[2].
>     return p[2];
>   }
>
> In the above example, because svbool_t has a fixed 1-lane-per-byte layout,
> p[i] does not return the bool value corresponding to a[i] > b[i]. This
> necessitates a 'typed' vector boolean value that unambiguously represents
> the results of operations on vectors of the same type.
>
> __attribute__((vector_mask))
> -
>
> Note: If interested in historical discussions refer to:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
>
> We define this new attribute which, when applied to a base data vector
> type, produces a new boolean vector type representing the result of
> operations on the corresponding base vector type. The syntax is as follows.
>
>typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
>typedef v8si v8sib __attribute__((vector_mask));
>
> Here the 'base' data vector type is v8si or a vector of 8 integers.
>
> Rules
>
> • The layout/size of the boolean vector type is implementation-defined
> for its base data vector type.
>
> • Two boolean vector types whose base data vector types have the same
> number of elements and lane-width have the same layout and size.
>
> • Consequently, two boolean vectors whose base data vector types have
> different number of elements or different lane-size have different layouts.
>
> This aligns with gnu vector extensions that generate integer vectors as
> a result of comparisons - "The result of the comparison is a vector of
> the same width and number of elements as the comparison operands with a
> signed integral element type." according to
> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.

Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
  auto r = a < b;
}

produces, with C23:

  vector(4) int r = VEC_COND_EXPR < a < b, { -1, -1, -1, -1 }, { 0, 0, 0, 0 } >;
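
A minimal sketch of what that decay looks like from user code, assuming a GCC or Clang that supports vector_size and vector comparisons. The helper name less_than_lanes is made up for illustration. It only shows that each lane of the result is -1 or 0, independent of what the machine would produce natively.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: write the four lanes of (a < b) to out[]. */
void less_than_lanes(const int a[4], const int b[4], int out[4])
{
    v4si va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* The comparison yields a vector of signed ints:
       -1 in lanes where it holds, 0 elsewhere. */
    v4si r = va < vb;
    memcpy(out, &r, sizeof r);
}
```

The result has the same 16-byte size as the operands, whichever mask form the target uses internally.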

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types, and the above could produce either an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.
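
To make the two representations concrete, here is a portable model (not AVX512 intrinsics) that takes the wide -1/0 lane form and packs it into one bit per lane. The function name less_than_bitmask and the choice of lane 0 in bit 0 are assumptions made for this sketch.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: compare a < b and pack the result into
   one bit per lane, with lane 0 in bit 0. */
unsigned less_than_bitmask(const int a[4], const int b[4])
{
    v4si va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    v4si wide = va < vb;        /* wide form: one 32-bit -1/0 per lane */

    unsigned bits = 0;          /* packed form: one bit per lane */
    for (int i = 0; i < 4; i++)
        bits |= (unsigned)(wide[i] & 1) << i;
    return bits;
}
```

Both forms carry the same information; the wide one occupies 16 bytes while the packed one needs only 4 bits.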

Not exposing "native" mask types means the compiler has to optimize subsequent
uses itself, and it makes generic vectors difficult to combine with, for
example, AVX512 intrinsics (where masks are just 'int').  Optimizing mask
transitions across an ABI boundary is even more difficult.

But it at least allows portable code, and it does not suffer from users trying
to expose machine representations of masks as input to generic vector code,
with all the problems of constant folding requiring not only self-consistent
code within the compiler but also compatibility with user-produced constant
masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCC's generic vector extension.

> Producers and Consumers of PBV
> ------------------------------
>
> With GNU vector extensions, comparisons produce boolean vectors;
> conditional and bitwise operators consume them. Comparison producers
> generate signed integer vectors of the same lane-width as the operands
> of the comparison operator. This means conditionals and bitwise
> operators cannot be applied to mixed vectors that are a result of
> different width operands. Eg.
>
>v8hi foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
>{
>  return a > b || c > d; // error!
>  return a > b || e < f; // OK - no explicit conversion needed.
>  return a > b || __builtin_convertvector (c > d, v8si); // OK.
>  return a | b && c | d; // error!
>  return a | b && __builtin_convertvector (c | d, v8si); // OK.
>}
>
> __builtin_convertvector () needs to be applied to convert vectors to the
> type one wants to do the comparison in. IoW, the integer vectors that
> represent boolean vectors are 'strictly-typed'. If we extend these rules
> to vector_mask, this will look like:
>
>typedef v8si v8sib __attribute__((vector_mask));
>typedef v8hi v8hib __attribute__((vector_mask));
>typed

[RFC] Proposal to support Packed Boolean Vector masks.

2024-07-08 Thread Tejas Belagod

Hi,

Sorry to have dropped the ball on 
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but 
here I've tried to pick it up again and write up a strawman proposal for 
elevating __attribute__((vector_mask)) to the FE from GIMPLE.



Thanks,
Tejas.

Motivation
----------

The idea of packed boolean vectors came about when we wanted to support 
C/C++ operators on SVE ACLE types. The current vector boolean type that 
ACLE specifies does not adequately disambiguate vector lane sizes which 
they were derived off of. Consider this simple, albeit unrealistic, example:


  bool foo (svint32_t a, svint32_t b)
  {
    svbool_t p = a > b;

    // Here p[2] is not the same as a[2] > b[2].
    return p[2];
  }

In the above example, because svbool_t has a fixed layout of one predicate
lane per byte of vector data, p[i] does not return the bool value
corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean
value that unambiguously represents the results of operations of the same
type.
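
The mismatch can be modelled in plain C. This is only a sketch, assuming the SVE predicate layout of one bit per vector byte with the lowest bit of each element's group being the significant one. The function names are hypothetical and are not part of ACLE.

```c
#include <stddef.h>

/* Position in an svbool_t-style predicate (one bit per vector byte)
   of the bit that governs lane `lane` of elements `elt_bytes` wide. */
size_t pred_bit_for_lane(size_t lane, size_t elt_bytes)
{
    return lane * elt_bytes;
}

/* Lane governed by predicate bit `bit` for elements `elt_bytes` wide,
   or -1 if that bit is not the significant bit of any lane. */
long lane_for_pred_bit(size_t bit, size_t elt_bytes)
{
    return (bit % elt_bytes == 0) ? (long)(bit / elt_bytes) : -1;
}
```

Under this model, the result of a[2] > b[2] for 32-bit lanes sits at position 8, while position 2 does not correspond to any 32-bit lane. That is the ambiguity a typed mask would remove.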

__attribute__((vector_mask))
----------------------------

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define a new attribute which, when applied to a base data vector type,
produces a new boolean vector type representing the result of operations
on that base vector type. The syntax is as follows.


  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined 
for its base data vector type.


• Two boolean vector types whose base data vector types have the same
number of elements and lane-width have the same layout and size.


• Consequently, two boolean vectors whose base data vector types have
different number of elements or different lane-size have different layouts.


This aligns with the GNU vector extensions, which generate integer vectors
as a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
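
The existing behaviour that these rules mirror can be checked with today's generic vectors. The sketch below uses 4-lane types rather than the 8-lane ones above to keep the vectors small. The function names are made up for illustration, and vector_mask itself is not involved since it is only proposed here.

```c
#include <stddef.h>

typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* Size in bytes of the comparison result for each operand type.
   sizeof does not evaluate its operand; only the result type matters. */
size_t mask_size_int(void)   { v4si a = {0}, b = {0}; return sizeof(a < b); }
size_t mask_size_float(void) { v4sf a = {0}, b = {0}; return sizeof(a < b); }
size_t mask_size_short(void) { v4hi a = {0}, b = {0}; return sizeof(a < b); }
```

Comparisons on int and float vectors with the same lane count and lane width give results of the same size, while the 16-bit case gives a result half as large.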

Producers and Consumers of PBV
------------------------------

With GNU vector extensions, comparisons produce boolean vectors; 
conditional and bitwise operators consume them. Comparison producers 
generate signed integer vectors of the same lane-width as the operands 
of the comparison operator. This means conditionals and bitwise 
operators cannot be applied to mixed vectors that are a result of 
different width operands. Eg.


  v8hi foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    return a > b || c > d; // error!
    return a > b || e < f; // OK - no explicit conversion needed.
    return a > b || __builtin_convertvector (c > d, v8si); // OK.
    return a | b && c | d; // error!
    return a | b && __builtin_convertvector (c | d, v8si); // OK.
  }

__builtin_convertvector () needs to be applied to convert vectors to the 
type one wants to do the comparison in. IoW, the integer vectors that 
represent boolean vectors are 'strictly-typed'. If we extend these rules 
to vector_mask, this will look like:


  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));
  typedef v8sf v8sfb __attribute__((vector_mask));

  v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    v8sib psi = a > b;
    v8hib phi = c > d;
    v8sfb psf = e < f;

    return psi || phi; // error!
    return psi || psf; // OK - no explicit conversion needed.
    return psi || __builtin_convertvector (phi, v8sib); // OK.
    return psi | phi; // error!
    return psi | __builtin_convertvector (phi, v8sib); // OK.
    return psi | psf; // OK - no explicit conversion needed.
  }

Now according to the rules explained above, v8sib and v8hib will have
different layouts (which is why they can't be used together as operands
of an operation without conversion). OTOH, the same rules dictate that
v8sib and v8sfb, where v8sfb is the float base data vector equivalent of
v8sib, have the same layout and hence can be used as operands of
operators without explicit conversion. This aligns with the GNU vector
extension rules, where comparing two v8sf vectors yields a v8si with the
same lane-width and number of elements as comparing two v8si vectors
would.
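
As a sketch of the current GNU vector extension behaviour being mirrored, the following uses 4-lane vectors and the bitwise | operator (GCC accepts || on vectors only in C++). The function names are hypothetical. __builtin_convertvector is the existing builtin, used here to widen a 16-bit lane mask to 32-bit lanes.

```c
#include <string.h>

typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* int and float comparisons share lane width and count, so their
   results combine without any conversion. */
void or_int_float(const int a[4], const int b[4],
                  const float e[4], const float f[4], int out[4])
{
    v4si va, vb;
    v4sf ve, vf;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    memcpy(&ve, e, sizeof ve);
    memcpy(&vf, f, sizeof vf);

    v4si r = (va > vb) | (ve < vf);
    memcpy(out, &r, sizeof r);
}

/* A comparison on 16-bit lanes must be widened before it can be
   combined with one on 32-bit lanes. */
void or_int_short(const int a[4], const int b[4],
                  const short c[4], const short d[4], int out[4])
{
    v4si va, vb;
    v4hi vc, vd;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    memcpy(&vc, c, sizeof vc);
    memcpy(&vd, d, sizeof vd);

    v4hi m16 = vc > vd;                                /* -1/0 in short lanes */
    v4si r = (va > vb) | __builtin_convertvector(m16, v4si);
    memcpy(out, &r, sizeof r);
}
```

Converting a lane value of -1 from short to int preserves -1, so the widened mask keeps the all-ones/all-zeros form.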


Application of vector_mask to sizeless types


__attribute__((vector_mask)) has the advantage that it can be applied to 
sizeless types seamlessly.  When __attribute__((vector_mask)) is applied 
to a data vector that is a sizeless type, the resulting vector mask also 
becomes a sizeless type.

Eg.

  typedef svpre