Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote: > >> > >> On 7/17/24 4:36 PM, Richard Biener wrote: > >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod > >> > wrote: > >> >> > >> >> On 7/15/24 6:05 PM, Richard Biener wrote: > >> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod > >> >>> wrote: > >> > >> On 7/15/24 12:16 PM, Tejas Belagod wrote: > >> > On 7/12/24 6:40 PM, Richard Biener wrote: > >> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek > >> >> wrote: > >> >>> > >> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > >> Padding is only an issue for very small vectors - the obvious > >> choice is > >> to disallow vector types that would require any padding. I can > >> hardly > >> see where those are faster than using a vector of up to 4 char > >> elements. > >> Problematic are 1-bit elements with 4, 2 or one element vectors, > >> 2-bit elements > >> with 2 or one element vectors and 4-bit elements with 1 element > >> vectors. > >> >>> > >> >>> I'd really like to avoid having to support something like > >> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) > >> >>> * > >> >>> 16))) > >> >>> _BitInt(2) to say size of long long could be acceptable. > >> >> > >> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic > >> >> way to say > >> >> the element should have n (< 8) bits. > >> >> > >> I have no idea what the stance of supporting _BitInt in C++ are, > >> but most certainly diverging support (or even semantics) of the > >> vector extension in C vs. C++ is undesirable. > >> >>> > >> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason > >> >>> didn't > >> >>> look favorably to _BitInt support in C++, so at least until > >> >>> something > >> >>> like > >> >>> that is standardized in C++ the answer is probably no. > >> >> > >> >> OK, I think that rules out _BitInt use here so while bool is then > >> >> natural > >> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to > >> >> specify the > >> >> number of bits explicitly. There is signed_bool_precision but like > >> >> vector_mask it's use is restricted to the GIMPLE frontend because > >> >> interaction with the rest of the language isn't defined. > >> >> > >> > > >> > Thanks for all the suggestions - really insightful (to me) > >> > discussions. > >> > > >> > Yeah, BitInt seemed like it was best placed for this, but not having > >> > C++ > >> > support is definitely a blocker. But as you say, in the absence of > >> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One > >> > way to specify non-1-bit widths could be overloading vector_size. > >> > > >> > Also, I think overloading GIMPLE's vector_mask takes us into the > >> > earlier-discussed territory of what it should actually mean - it > >> > meaning > >> > the target truth type in GIMPLE and a generic vector extension in > >> > the FE > >> > will probably confuse gcc developers more than users. > >> > > >> >> That said - we're mixing two things here. The desire to have > >> >> "proper" > >> >> svbool (fix: declare in the backend) and the desire to have "packed" > >> >> bit-precision vectors (for whatever actual reason) as part of the > >> >> GCC vector extension. > >> >> > >> > > >> > If we leave lane-disambiguation of svbool to the backend, the values > >> > I > >> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards > >> > supporting BitInt(N) vectors possibly in the future 2) having a way > >> > for > >> > targets to define their intrinsics' bool vector types using GNU > >> > extensions 3) feature parity with Clang's ext_vector_type? > >> > > >> > I believe the primary motivation for Clang to support ext_vector_type > >> > was to have a way to define target intrinsics' vector bool type using > >> > vector extensions. > >> > > >> > >> > >> Interestingly, Clang seems to support > >> > >> typedef struct { > >> _Bool i:1; > >> } STR; > >> > >> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof > >> (STR) > >> * 4))) vec; > >> > >> > >> int foo (vec b) { > >> return sizeof b; > >> } > >> > >> I can't find documentation about how it is implemented, but I suspect > >> the vector is constructed as an array STR[] i.e. possibly each > >> bit-element padded to byte boundary etc. Also, I can't seem to apply > >> many operations other than sizeof. > >>
Re: [RFC] Proposal to support Packed Boolean Vector masks.
Richard Biener writes: > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote: >> >> On 7/17/24 4:36 PM, Richard Biener wrote: >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod >> > wrote: >> >> >> >> On 7/15/24 6:05 PM, Richard Biener wrote: >> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod >> >>> wrote: >> >> On 7/15/24 12:16 PM, Tejas Belagod wrote: >> > On 7/12/24 6:40 PM, Richard Biener wrote: >> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek >> >> wrote: >> >>> >> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: >> Padding is only an issue for very small vectors - the obvious >> choice is >> to disallow vector types that would require any padding. I can >> hardly >> see where those are faster than using a vector of up to 4 char >> elements. >> Problematic are 1-bit elements with 4, 2 or one element vectors, >> 2-bit elements >> with 2 or one element vectors and 4-bit elements with 1 element >> vectors. >> >>> >> >>> I'd really like to avoid having to support something like >> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * >> >>> 16))) >> >>> _BitInt(2) to say size of long long could be acceptable. >> >> >> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic >> >> way to say >> >> the element should have n (< 8) bits. >> >> >> I have no idea what the stance of supporting _BitInt in C++ are, >> but most certainly diverging support (or even semantics) of the >> vector extension in C vs. C++ is undesirable. >> >>> >> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason >> >>> didn't >> >>> look favorably to _BitInt support in C++, so at least until something >> >>> like >> >>> that is standardized in C++ the answer is probably no. >> >> >> >> OK, I think that rules out _BitInt use here so while bool is then >> >> natural >> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify >> >> the >> >> number of bits explicitly. There is signed_bool_precision but like >> >> vector_mask it's use is restricted to the GIMPLE frontend because >> >> interaction with the rest of the language isn't defined. >> >> >> > >> > Thanks for all the suggestions - really insightful (to me) discussions. >> > >> > Yeah, BitInt seemed like it was best placed for this, but not having >> > C++ >> > support is definitely a blocker. But as you say, in the absence of >> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One >> > way to specify non-1-bit widths could be overloading vector_size. >> > >> > Also, I think overloading GIMPLE's vector_mask takes us into the >> > earlier-discussed territory of what it should actually mean - it >> > meaning >> > the target truth type in GIMPLE and a generic vector extension in the >> > FE >> > will probably confuse gcc developers more than users. >> > >> >> That said - we're mixing two things here. The desire to have "proper" >> >> svbool (fix: declare in the backend) and the desire to have "packed" >> >> bit-precision vectors (for whatever actual reason) as part of the >> >> GCC vector extension. >> >> >> > >> > If we leave lane-disambiguation of svbool to the backend, the values I >> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards >> > supporting BitInt(N) vectors possibly in the future 2) having a way for >> > targets to define their intrinsics' bool vector types using GNU >> > extensions 3) feature parity with Clang's ext_vector_type? >> > >> > I believe the primary motivation for Clang to support ext_vector_type >> > was to have a way to define target intrinsics' vector bool type using >> > vector extensions. >> > >> >> >> Interestingly, Clang seems to support >> >> typedef struct { >> _Bool i:1; >> } STR; >> >> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) >> * 4))) vec; >> >> >> int foo (vec b) { >> return sizeof b; >> } >> >> I can't find documentation about how it is implemented, but I suspect >> the vector is constructed as an array STR[] i.e. possibly each >> bit-element padded to byte boundary etc. Also, I can't seem to apply >> many operations other than sizeof. >> >> I don't know if we've tried to support such cases in GNU in the past? >> >>> >> >>> Why should we do that? It doesn't make much sense. >> >>> >> >>> single-bit vectors is what _BitInt was invented for. >> >> >> >> Forgive me if I'm misunderstanding - I'm trying to figure out how >> >> _BitInts can be made to have single-bit generic vector semantic
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote: > > On 7/17/24 4:36 PM, Richard Biener wrote: > > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod > > wrote: > >> > >> On 7/15/24 6:05 PM, Richard Biener wrote: > >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod > >>> wrote: > > On 7/15/24 12:16 PM, Tejas Belagod wrote: > > On 7/12/24 6:40 PM, Richard Biener wrote: > >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > >>> > >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > Padding is only an issue for very small vectors - the obvious choice > is > to disallow vector types that would require any padding. I can > hardly > see where those are faster than using a vector of up to 4 char > elements. > Problematic are 1-bit elements with 4, 2 or one element vectors, > 2-bit elements > with 2 or one element vectors and 4-bit elements with 1 element > vectors. > >>> > >>> I'd really like to avoid having to support something like > >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * > >>> 16))) > >>> _BitInt(2) to say size of long long could be acceptable. > >> > >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic > >> way to say > >> the element should have n (< 8) bits. > >> > I have no idea what the stance of supporting _BitInt in C++ are, > but most certainly diverging support (or even semantics) of the > vector extension in C vs. C++ is undesirable. > >>> > >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason > >>> didn't > >>> look favorably to _BitInt support in C++, so at least until something > >>> like > >>> that is standardized in C++ the answer is probably no. > >> > >> OK, I think that rules out _BitInt use here so while bool is then > >> natural > >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify > >> the > >> number of bits explicitly. There is signed_bool_precision but like > >> vector_mask it's use is restricted to the GIMPLE frontend because > >> interaction with the rest of the language isn't defined. > >> > > > > Thanks for all the suggestions - really insightful (to me) discussions. > > > > Yeah, BitInt seemed like it was best placed for this, but not having C++ > > support is definitely a blocker. But as you say, in the absence of > > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One > > way to specify non-1-bit widths could be overloading vector_size. > > > > Also, I think overloading GIMPLE's vector_mask takes us into the > > earlier-discussed territory of what it should actually mean - it meaning > > the target truth type in GIMPLE and a generic vector extension in the FE > > will probably confuse gcc developers more than users. > > > >> That said - we're mixing two things here. The desire to have "proper" > >> svbool (fix: declare in the backend) and the desire to have "packed" > >> bit-precision vectors (for whatever actual reason) as part of the > >> GCC vector extension. > >> > > > > If we leave lane-disambiguation of svbool to the backend, the values I > > see in supporting 1, 2 and 4 bitsizes are 1) first step towards > > supporting BitInt(N) vectors possibly in the future 2) having a way for > > targets to define their intrinsics' bool vector types using GNU > > extensions 3) feature parity with Clang's ext_vector_type? > > > > I believe the primary motivation for Clang to support ext_vector_type > > was to have a way to define target intrinsics' vector bool type using > > vector extensions. > > > > > Interestingly, Clang seems to support > > typedef struct { > _Bool i:1; > } STR; > > typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) > * 4))) vec; > > > int foo (vec b) { > return sizeof b; > } > > I can't find documentation about how it is implemented, but I suspect > the vector is constructed as an array STR[] i.e. possibly each > bit-element padded to byte boundary etc. Also, I can't seem to apply > many operations other than sizeof. > > I don't know if we've tried to support such cases in GNU in the past? > >>> > >>> Why should we do that? It doesn't make much sense. > >>> > >>> single-bit vectors is what _BitInt was invented for. > >> > >> Forgive me if I'm misunderstanding - I'm trying to figure out how > >> _BitInts can be made to have single-bit generic vector semantics. For > >> eg. If I want to initialize a _BitInt as vector, I can't do: > >> > >>_BitInt (4) a = (_BitInt (4)){1, 0, 1, 1}; > >> > >> as 'a' expects a scalar initialization
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/17/24 4:36 PM, Richard Biener wrote: On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod wrote: On 7/15/24 6:05 PM, Richard Biener wrote: On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: On 7/15/24 12:16 PM, Tejas Belagod wrote: On 7/12/24 6:40 PM, Richard Biener wrote: On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: Padding is only an issue for very small vectors - the obvious choice is to disallow vector types that would require any padding. I can hardly see where those are faster than using a vector of up to 4 char elements. Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements with 2 or one element vectors and 4-bit elements with 1 element vectors. I'd really like to avoid having to support something like _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) _BitInt(2) to say size of long long could be acceptable. I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say the element should have n (< 8) bits. I have no idea what the stance of supporting _BitInt in C++ are, but most certainly diverging support (or even semantics) of the vector extension in C vs. C++ is undesirable. I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't look favorably to _BitInt support in C++, so at least until something like that is standardized in C++ the answer is probably no. OK, I think that rules out _BitInt use here so while bool is then natural for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the number of bits explicitly. There is signed_bool_precision but like vector_mask it's use is restricted to the GIMPLE frontend because interaction with the rest of the language isn't defined. Thanks for all the suggestions - really insightful (to me) discussions. Yeah, BitInt seemed like it was best placed for this, but not having C++ support is definitely a blocker. But as you say, in the absence of BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One way to specify non-1-bit widths could be overloading vector_size. Also, I think overloading GIMPLE's vector_mask takes us into the earlier-discussed territory of what it should actually mean - it meaning the target truth type in GIMPLE and a generic vector extension in the FE will probably confuse gcc developers more than users. That said - we're mixing two things here. The desire to have "proper" svbool (fix: declare in the backend) and the desire to have "packed" bit-precision vectors (for whatever actual reason) as part of the GCC vector extension. If we leave lane-disambiguation of svbool to the backend, the values I see in supporting 1, 2 and 4 bitsizes are 1) first step towards supporting BitInt(N) vectors possibly in the future 2) having a way for targets to define their intrinsics' bool vector types using GNU extensions 3) feature parity with Clang's ext_vector_type? I believe the primary motivation for Clang to support ext_vector_type was to have a way to define target intrinsics' vector bool type using vector extensions. Interestingly, Clang seems to support typedef struct { _Bool i:1; } STR; typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec; int foo (vec b) { return sizeof b; } I can't find documentation about how it is implemented, but I suspect the vector is constructed as an array STR[] i.e. possibly each bit-element padded to byte boundary etc. Also, I can't seem to apply many operations other than sizeof. I don't know if we've tried to support such cases in GNU in the past? Why should we do that? It doesn't make much sense. single-bit vectors is what _BitInt was invented for. Forgive me if I'm misunderstanding - I'm trying to figure out how _BitInts can be made to have single-bit generic vector semantics. For eg. If I want to initialize a _BitInt as vector, I can't do: _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1}; as 'a' expects a scalar initialization. Of if I want to convert an int vector to bit vector, I can't do v4si_p = v4si_a > v4si_b; _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4)); Also semantics of conditionals with _BitInt behave like scalars _BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they behave as scalars. Also, I can't do things like typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt (2)) * 4))); to force it to behave as a vector because _BitInt is disallowed here. All I'm trying to say is that when people want to use vector as a large packed bitfield they can now use _BitInt instead. Of course with a different (but portable) API. > I don't see single-bit element vectors something as especially useful with a "vector API". What's its the use-case? (similar for the two and four bit elements, with or without padding) I'm trying to figure out if we had a portable (generi
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod wrote: > > On 7/15/24 6:05 PM, Richard Biener wrote: > > On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: > >> > >> On 7/15/24 12:16 PM, Tejas Belagod wrote: > >>> On 7/12/24 6:40 PM, Richard Biener wrote: > On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > > > > On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > >> Padding is only an issue for very small vectors - the obvious choice is > >> to disallow vector types that would require any padding. I can hardly > >> see where those are faster than using a vector of up to 4 char > >> elements. > >> Problematic are 1-bit elements with 4, 2 or one element vectors, > >> 2-bit elements > >> with 2 or one element vectors and 4-bit elements with 1 element > >> vectors. > > > > I'd really like to avoid having to support something like > > _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * > > 16))) > > _BitInt(2) to say size of long long could be acceptable. > > I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic > way to say > the element should have n (< 8) bits. > > >> I have no idea what the stance of supporting _BitInt in C++ are, > >> but most certainly diverging support (or even semantics) of the > >> vector extension in C vs. C++ is undesirable. > > > > I believe Clang supports it in C++ next to C, GCC doesn't and Jason > > didn't > > look favorably to _BitInt support in C++, so at least until something > > like > > that is standardized in C++ the answer is probably no. > > OK, I think that rules out _BitInt use here so while bool is then natural > for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the > number of bits explicitly. There is signed_bool_precision but like > vector_mask it's use is restricted to the GIMPLE frontend because > interaction with the rest of the language isn't defined. > > >>> > >>> Thanks for all the suggestions - really insightful (to me) discussions. > >>> > >>> Yeah, BitInt seemed like it was best placed for this, but not having C++ > >>> support is definitely a blocker. But as you say, in the absence of > >>> BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One > >>> way to specify non-1-bit widths could be overloading vector_size. > >>> > >>> Also, I think overloading GIMPLE's vector_mask takes us into the > >>> earlier-discussed territory of what it should actually mean - it meaning > >>> the target truth type in GIMPLE and a generic vector extension in the FE > >>> will probably confuse gcc developers more than users. > >>> > That said - we're mixing two things here. The desire to have "proper" > svbool (fix: declare in the backend) and the desire to have "packed" > bit-precision vectors (for whatever actual reason) as part of the > GCC vector extension. > > >>> > >>> If we leave lane-disambiguation of svbool to the backend, the values I > >>> see in supporting 1, 2 and 4 bitsizes are 1) first step towards > >>> supporting BitInt(N) vectors possibly in the future 2) having a way for > >>> targets to define their intrinsics' bool vector types using GNU > >>> extensions 3) feature parity with Clang's ext_vector_type? > >>> > >>> I believe the primary motivation for Clang to support ext_vector_type > >>> was to have a way to define target intrinsics' vector bool type using > >>> vector extensions. > >>> > >> > >> > >> Interestingly, Clang seems to support > >> > >> typedef struct { > >> _Bool i:1; > >> } STR; > >> > >> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) > >> * 4))) vec; > >> > >> > >> int foo (vec b) { > >> return sizeof b; > >> } > >> > >> I can't find documentation about how it is implemented, but I suspect > >> the vector is constructed as an array STR[] i.e. possibly each > >> bit-element padded to byte boundary etc. Also, I can't seem to apply > >> many operations other than sizeof. > >> > >> I don't know if we've tried to support such cases in GNU in the past? > > > > Why should we do that? It doesn't make much sense. > > > > single-bit vectors is what _BitInt was invented for. > > Forgive me if I'm misunderstanding - I'm trying to figure out how > _BitInts can be made to have single-bit generic vector semantics. For > eg. If I want to initialize a _BitInt as vector, I can't do: > > _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1}; > > as 'a' expects a scalar initialization. > > Of if I want to convert an int vector to bit vector, I can't do > >v4si_p = v4si_a > v4si_b; >_BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4)); > > Also semantics of conditionals with _BitInt behave like scalars > >_BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they > behave as scalars. > > Also, I can't do things like > >typedef
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/15/24 6:05 PM, Richard Biener wrote: On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: On 7/15/24 12:16 PM, Tejas Belagod wrote: On 7/12/24 6:40 PM, Richard Biener wrote: On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: Padding is only an issue for very small vectors - the obvious choice is to disallow vector types that would require any padding. I can hardly see where those are faster than using a vector of up to 4 char elements. Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements with 2 or one element vectors and 4-bit elements with 1 element vectors. I'd really like to avoid having to support something like _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) _BitInt(2) to say size of long long could be acceptable. I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say the element should have n (< 8) bits. I have no idea what the stance of supporting _BitInt in C++ are, but most certainly diverging support (or even semantics) of the vector extension in C vs. C++ is undesirable. I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't look favorably to _BitInt support in C++, so at least until something like that is standardized in C++ the answer is probably no. OK, I think that rules out _BitInt use here so while bool is then natural for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the number of bits explicitly. There is signed_bool_precision but like vector_mask it's use is restricted to the GIMPLE frontend because interaction with the rest of the language isn't defined. Thanks for all the suggestions - really insightful (to me) discussions. Yeah, BitInt seemed like it was best placed for this, but not having C++ support is definitely a blocker. But as you say, in the absence of BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One way to specify non-1-bit widths could be overloading vector_size. Also, I think overloading GIMPLE's vector_mask takes us into the earlier-discussed territory of what it should actually mean - it meaning the target truth type in GIMPLE and a generic vector extension in the FE will probably confuse gcc developers more than users. That said - we're mixing two things here. The desire to have "proper" svbool (fix: declare in the backend) and the desire to have "packed" bit-precision vectors (for whatever actual reason) as part of the GCC vector extension. If we leave lane-disambiguation of svbool to the backend, the values I see in supporting 1, 2 and 4 bitsizes are 1) first step towards supporting BitInt(N) vectors possibly in the future 2) having a way for targets to define their intrinsics' bool vector types using GNU extensions 3) feature parity with Clang's ext_vector_type? I believe the primary motivation for Clang to support ext_vector_type was to have a way to define target intrinsics' vector bool type using vector extensions. Interestingly, Clang seems to support typedef struct { _Bool i:1; } STR; typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec; int foo (vec b) { return sizeof b; } I can't find documentation about how it is implemented, but I suspect the vector is constructed as an array STR[] i.e. possibly each bit-element padded to byte boundary etc. Also, I can't seem to apply many operations other than sizeof. I don't know if we've tried to support such cases in GNU in the past? Why should we do that? It doesn't make much sense. single-bit vectors is what _BitInt was invented for. Forgive me if I'm misunderstanding - I'm trying to figure out how _BitInts can be made to have single-bit generic vector semantics. For eg. If I want to initialize a _BitInt as vector, I can't do: _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1}; as 'a' expects a scalar initialization. Of if I want to convert an int vector to bit vector, I can't do v4si_p = v4si_a > v4si_b; _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4)); Also semantics of conditionals with _BitInt behave like scalars _BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they behave as scalars. Also, I can't do things like typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt (2)) * 4))); to force it to behave as a vector because _BitInt is disallowed here. 2-bit and 4-bit element vectors is what's missing, but the scope is narrow and efficient lowering or native support is missing. Vectors of bit elements but with padding is just something stupid to look for as a general feature. Fair enough. Thanks, Tejas. Richard. Thanks, Tejas.
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: > > On 7/15/24 12:16 PM, Tejas Belagod wrote: > > On 7/12/24 6:40 PM, Richard Biener wrote: > >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > >>> > >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > Padding is only an issue for very small vectors - the obvious choice is > to disallow vector types that would require any padding. I can hardly > see where those are faster than using a vector of up to 4 char > elements. > Problematic are 1-bit elements with 4, 2 or one element vectors, > 2-bit elements > with 2 or one element vectors and 4-bit elements with 1 element > vectors. > >>> > >>> I'd really like to avoid having to support something like > >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * > >>> 16))) > >>> _BitInt(2) to say size of long long could be acceptable. > >> > >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic > >> way to say > >> the element should have n (< 8) bits. > >> > I have no idea what the stance of supporting _BitInt in C++ are, > but most certainly diverging support (or even semantics) of the > vector extension in C vs. C++ is undesirable. > >>> > >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason > >>> didn't > >>> look favorably to _BitInt support in C++, so at least until something > >>> like > >>> that is standardized in C++ the answer is probably no. > >> > >> OK, I think that rules out _BitInt use here so while bool is then natural > >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the > >> number of bits explicitly. There is signed_bool_precision but like > >> vector_mask it's use is restricted to the GIMPLE frontend because > >> interaction with the rest of the language isn't defined. > >> > > > > Thanks for all the suggestions - really insightful (to me) discussions. > > > > Yeah, BitInt seemed like it was best placed for this, but not having C++ > > support is definitely a blocker. But as you say, in the absence of > > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One > > way to specify non-1-bit widths could be overloading vector_size. > > > > Also, I think overloading GIMPLE's vector_mask takes us into the > > earlier-discussed territory of what it should actually mean - it meaning > > the target truth type in GIMPLE and a generic vector extension in the FE > > will probably confuse gcc developers more than users. > > > >> That said - we're mixing two things here. The desire to have "proper" > >> svbool (fix: declare in the backend) and the desire to have "packed" > >> bit-precision vectors (for whatever actual reason) as part of the > >> GCC vector extension. > >> > > > > If we leave lane-disambiguation of svbool to the backend, the values I > > see in supporting 1, 2 and 4 bitsizes are 1) first step towards > > supporting BitInt(N) vectors possibly in the future 2) having a way for > > targets to define their intrinsics' bool vector types using GNU > > extensions 3) feature parity with Clang's ext_vector_type? > > > > I believe the primary motivation for Clang to support ext_vector_type > > was to have a way to define target intrinsics' vector bool type using > > vector extensions. > > > > > Interestingly, Clang seems to support > > typedef struct { > _Bool i:1; > } STR; > > typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) > * 4))) vec; > > > int foo (vec b) { > return sizeof b; > } > > I can't find documentation about how it is implemented, but I suspect > the vector is constructed as an array STR[] i.e. possibly each > bit-element padded to byte boundary etc. Also, I can't seem to apply > many operations other than sizeof. > > I don't know if we've tried to support such cases in GNU in the past? Why should we do that? It doesn't make much sense. single-bit vectors is what _BitInt was invented for. 2-bit and 4-bit element vectors is what's missing, but the scope is narrow and efficient lowering or native support is missing. Vectors of bit elements but with padding is just something stupid to look for as a general feature. Richard. > Thanks, > Tejas.
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/15/24 12:16 PM, Tejas Belagod wrote: On 7/12/24 6:40 PM, Richard Biener wrote: On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: Padding is only an issue for very small vectors - the obvious choice is to disallow vector types that would require any padding. I can hardly see where those are faster than using a vector of up to 4 char elements. Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements with 2 or one element vectors and 4-bit elements with 1 element vectors. I'd really like to avoid having to support something like _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) _BitInt(2) to say size of long long could be acceptable. I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say the element should have n (< 8) bits. I have no idea what the stance of supporting _BitInt in C++ are, but most certainly diverging support (or even semantics) of the vector extension in C vs. C++ is undesirable. I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't look favorably to _BitInt support in C++, so at least until something like that is standardized in C++ the answer is probably no. OK, I think that rules out _BitInt use here so while bool is then natural for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the number of bits explicitly. There is signed_bool_precision but like vector_mask it's use is restricted to the GIMPLE frontend because interaction with the rest of the language isn't defined. Thanks for all the suggestions - really insightful (to me) discussions. Yeah, BitInt seemed like it was best placed for this, but not having C++ support is definitely a blocker. But as you say, in the absence of BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One way to specify non-1-bit widths could be overloading vector_size. Also, I think overloading GIMPLE's vector_mask takes us into the earlier-discussed territory of what it should actually mean - it meaning the target truth type in GIMPLE and a generic vector extension in the FE will probably confuse gcc developers more than users. That said - we're mixing two things here. The desire to have "proper" svbool (fix: declare in the backend) and the desire to have "packed" bit-precision vectors (for whatever actual reason) as part of the GCC vector extension. If we leave lane-disambiguation of svbool to the backend, the values I see in supporting 1, 2 and 4 bitsizes are 1) first step towards supporting BitInt(N) vectors possibly in the future 2) having a way for targets to define their intrinsics' bool vector types using GNU extensions 3) feature parity with Clang's ext_vector_type? I believe the primary motivation for Clang to support ext_vector_type was to have a way to define target intrinsics' vector bool type using vector extensions. Interestingly, Clang seems to support typedef struct { _Bool i:1; } STR; typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec; int foo (vec b) { return sizeof b; } I can't find documentation about how it is implemented, but I suspect the vector is constructed as an array STR[] i.e. possibly each bit-element padded to byte boundary etc. Also, I can't seem to apply many operations other than sizeof. I don't know if we've tried to support such cases in GNU in the past? Thanks, Tejas.
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/12/24 6:40 PM, Richard Biener wrote: On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: Padding is only an issue for very small vectors - the obvious choice is to disallow vector types that would require any padding. I can hardly see where those are faster than using a vector of up to 4 char elements. Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements with 2 or one element vectors and 4-bit elements with 1 element vectors. I'd really like to avoid having to support something like _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) _BitInt(2) to say size of long long could be acceptable. I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say the element should have n (< 8) bits. I have no idea what the stance of supporting _BitInt in C++ are, but most certainly diverging support (or even semantics) of the vector extension in C vs. C++ is undesirable. I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't look favorably to _BitInt support in C++, so at least until something like that is standardized in C++ the answer is probably no. OK, I think that rules out _BitInt use here so while bool is then natural for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the number of bits explicitly. There is signed_bool_precision but like vector_mask it's use is restricted to the GIMPLE frontend because interaction with the rest of the language isn't defined. Thanks for all the suggestions - really insightful (to me) discussions. Yeah, BitInt seemed like it was best placed for this, but not having C++ support is definitely a blocker. But as you say, in the absence of BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One way to specify non-1-bit widths could be overloading vector_size. Also, I think overloading GIMPLE's vector_mask takes us into the earlier-discussed territory of what it should actually mean - it meaning the target truth type in GIMPLE and a generic vector extension in the FE will probably confuse gcc developers more than users. That said - we're mixing two things here. The desire to have "proper" svbool (fix: declare in the backend) and the desire to have "packed" bit-precision vectors (for whatever actual reason) as part of the GCC vector extension. If we leave lane-disambiguation of svbool to the backend, the values I see in supporting 1, 2 and 4 bitsizes are 1) first step towards supporting BitInt(N) vectors possibly in the future 2) having a way for targets to define their intrinsics' bool vector types using GNU extensions 3) feature parity with Clang's ext_vector_type? I believe the primary motivation for Clang to support ext_vector_type was to have a way to define target intrinsics' vector bool type using vector extensions. Thanks, Tejas. Richard. Jakub
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > > On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > > Padding is only an issue for very small vectors - the obvious choice is > > to disallow vector types that would require any padding. I can hardly > > see where those are faster than using a vector of up to 4 char elements. > > Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit > > elements > > with 2 or one element vectors and 4-bit elements with 1 element vectors. > > I'd really like to avoid having to support something like > _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) > _BitInt(2) to say size of long long could be acceptable. I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say the element should have n (< 8) bits. > > I have no idea what the stance of supporting _BitInt in C++ are, > > but most certainly diverging support (or even semantics) of the > > vector extension in C vs. C++ is undesirable. > > I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't > look favorably to _BitInt support in C++, so at least until something like > that is standardized in C++ the answer is probably no. OK, I think that rules out _BitInt use here so while bool is then natural for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the number of bits explicitly. There is signed_bool_precision but like vector_mask it's use is restricted to the GIMPLE frontend because interaction with the rest of the language isn't defined. That said - we're mixing two things here. The desire to have "proper" svbool (fix: declare in the backend) and the desire to have "packed" bit-precision vectors (for whatever actual reason) as part of the GCC vector extension. Richard. > Jakub >
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > Padding is only an issue for very small vectors - the obvious choice is > to disallow vector types that would require any padding. I can hardly > see where those are faster than using a vector of up to 4 char elements. > Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit > elements > with 2 or one element vectors and 4-bit elements with 1 element vectors. I'd really like to avoid having to support something like _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16))) _BitInt(2) to say size of long long could be acceptable. > I have no idea what the stance of supporting _BitInt in C++ are, > but most certainly diverging support (or even semantics) of the > vector extension in C vs. C++ is undesirable. I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't look favorably to _BitInt support in C++, so at least until something like that is standardized in C++ the answer is probably no. Jakub
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Fri, Jul 12, 2024 at 12:44 PM Tejas Belagod wrote: > > On 7/12/24 11:46 AM, Richard Biener wrote: > > On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote: > >> > >> On 7/10/24 4:37 PM, Richard Biener wrote: > >>> On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford > >>> wrote: > > Tejas Belagod writes: > > On 7/10/24 2:38 PM, Richard Biener wrote: > >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod > >> wrote: > >>> > >>> On 7/9/24 4:22 PM, Richard Biener wrote: > On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod > wrote: > > > > On 7/8/24 4:45 PM, Richard Biener wrote: > >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorry to have dropped the ball on > >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, > >>> but > >>> here I've tried to pick it up again and write up a strawman > >>> proposal for > >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE. > >>> > >>> > >>> Thanks, > >>> Tejas. > >>> > >>> Motivation > >>> -- > >>> > >>> The idea of packed boolean vectors came about when we wanted to > >>> support > >>> C/C++ operators on SVE ACLE types. The current vector boolean > >>> type that > >>> ACLE specifies does not adequately disambiguate vector lane sizes > >>> which > >>> they were derived off of. Consider this simple, albeit > >>> unrealistic, example: > >>> > >>> bool foo (svint32_t a, svint32_t b) > >>> { > >>> svbool_t p = a > b; > >>> > >>> // Here p[2] is not the same as a[2] > b[2]. > >>> return p[2]; > >>> } > >>> > >>> In the above example, because svbool_t has a fixed > >>> 1-lane-per-byte, p[i] > >>> does not return the bool value corresponding to a[i] > b[i]. This > >>> necessitates a 'typed' vector boolean value that unambiguously > >>> represents results of operations > >>> of the same type. > >>> > >>> __attribute__((vector_mask)) > >>> - > >>> > >>> Note: If interested in historical discussions refer to: > >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > >>> > >>> We define this new attribute which when applied to a base data > >>> vector > >>> produces a new boolean vector type that represents a boolean type > >>> that > >>> is produced as a result of operations on the corresponding base > >>> vector > >>> type. The following is the syntax. > >>> > >>> typedef int v8si __attribute__((vector_size (8 * sizeof > >>> (int))); > >>> typedef v8si v8sib __attribute__((vector_mask)); > >>> > >>> Here the 'base' data vector type is v8si or a vector of 8 > >>> integers. > >>> > >>> Rules > >>> > >>> • The layout/size of the boolean vector type is > >>> implementation-defined > >>> for its base data vector type. > >>> > >>> • Two boolean vector types who's base data vector types have same > >>> number > >>> of elements and lane-width have the same layout and size. > >>> > >>> • Consequently, two boolean vectors who's base data vector types > >>> have > >>> different number of elements or different lane-size have > >>> different layouts. > >>> > >>> This aligns with gnu vector extensions that generate integer > >>> vectors as > >>> a result of comparisons - "The result of the comparison is a > >>> vector of > >>> the same width and number of elements as the comparison operands > >>> with a > >>> signed integral element type." according to > >>> > >>> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > >> > >> Without having the time to re-review this all in detail I think > >> the GNU > >> vector extension does not expose the result of the comparison as > >> the > >> machine would produce it but instead a comparison "decays" to > >> a conditional: > >> > >> typedef int v4si __attribute__((vector_size(16))); > >> > >> v4si a; > >> v4si b; > >> > >> void foo() > >> { > >>auto r = a < b; > >> } > >> > >> produces, with C23: > >> > >>vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 > >> } , { 0, >
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/12/24 11:46 AM, Richard Biener wrote: On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote: On 7/10/24 4:37 PM, Richard Biener wrote: On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford wrote: Tejas Belagod writes: On 7/10/24 2:38 PM, Richard Biener wrote: On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod wrote: On 7/9/24 4:22 PM, Richard Biener wrote: On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: On 7/8/24 4:45 PM, Richard Biener wrote: On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. Thanks for your feedback. IIUC, I can imagine how having a GNU vector extension exposing the target vector mask layout can pose a challenge - maybe making it a generic GNU vector extension was too ambitious. I wonder if there's value in pursuing these alternate paths? 1. Can implementing this extension in a 'generic' way i.e. possibly not implement it with a target mask, but just a generic int vector, still maintain the consistency of GNU predicate vectors within the compiler? I know it may not seem very different from how boolean vectors are currently implemented (as in your above example), but, having the __attribute__((vector_mask)) as a 'property' of the object makes it useful to optimize its uses to target predicates in subsequent stages of the compiler. 2. Res
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote: > > On 7/10/24 4:37 PM, Richard Biener wrote: > > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford > > wrote: > >> > >> Tejas Belagod writes: > >>> On 7/10/24 2:38 PM, Richard Biener wrote: > On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod > wrote: > > > > On 7/9/24 4:22 PM, Richard Biener wrote: > >> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod > >> wrote: > >>> > >>> On 7/8/24 4:45 PM, Richard Biener wrote: > On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod > wrote: > > > > Hi, > > > > Sorry to have dropped the ball on > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > > here I've tried to pick it up again and write up a strawman > > proposal for > > elevating __attribute__((vector_mask)) to the FE from GIMPLE. > > > > > > Thanks, > > Tejas. > > > > Motivation > > -- > > > > The idea of packed boolean vectors came about when we wanted to > > support > > C/C++ operators on SVE ACLE types. The current vector boolean type > > that > > ACLE specifies does not adequately disambiguate vector lane sizes > > which > > they were derived off of. Consider this simple, albeit unrealistic, > > example: > > > >bool foo (svint32_t a, svint32_t b) > >{ > > svbool_t p = a > b; > > > > // Here p[2] is not the same as a[2] > b[2]. > > return p[2]; > >} > > > > In the above example, because svbool_t has a fixed 1-lane-per-byte, > > p[i] > > does not return the bool value corresponding to a[i] > b[i]. This > > necessitates a 'typed' vector boolean value that unambiguously > > represents results of operations > > of the same type. > > > > __attribute__((vector_mask)) > > - > > > > Note: If interested in historical discussions refer to: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > > > > We define this new attribute which when applied to a base data > > vector > > produces a new boolean vector type that represents a boolean type > > that > > is produced as a result of operations on the corresponding base > > vector > > type. The following is the syntax. > > > >typedef int v8si __attribute__((vector_size (8 * sizeof > > (int))); > >typedef v8si v8sib __attribute__((vector_mask)); > > > > Here the 'base' data vector type is v8si or a vector of 8 integers. > > > > Rules > > > > • The layout/size of the boolean vector type is > > implementation-defined > > for its base data vector type. > > > > • Two boolean vector types who's base data vector types have same > > number > > of elements and lane-width have the same layout and size. > > > > • Consequently, two boolean vectors who's base data vector types > > have > > different number of elements or different lane-size have different > > layouts. > > > > This aligns with gnu vector extensions that generate integer > > vectors as > > a result of comparisons - "The result of the comparison is a vector > > of > > the same width and number of elements as the comparison operands > > with a > > signed integral element type." according to > > https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > > Without having the time to re-review this all in detail I think the > GNU > vector extension does not expose the result of the comparison as the > machine would produce it but instead a comparison "decays" to > a conditional: > > typedef int v4si __attribute__((vector_size(16))); > > v4si a; > v4si b; > > void foo() > { > auto r = a < b; > } > > produces, with C23: > > vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } > , { 0, > 0, 0, 0 } > ; > > In fact on x86_64 with AVX and AVX512 you have two different "machine > produced" mask types and the above could either produce a AVX mask > with > 32bit elements or a AVX512 mask with 1bit elements. > > Not exposing "native" mask types requires the compiler optimizing > subsequent > uses and makes generic vector
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/10/24 4:37 PM, Richard Biener wrote: On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford wrote: Tejas Belagod writes: On 7/10/24 2:38 PM, Richard Biener wrote: On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod wrote: On 7/9/24 4:22 PM, Richard Biener wrote: On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: On 7/8/24 4:45 PM, Richard Biener wrote: On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. Thanks for your feedback. IIUC, I can imagine how having a GNU vector extension exposing the target vector mask layout can pose a challenge - maybe making it a generic GNU vector extension was too ambitious. I wonder if there's value in pursuing these alternate paths? 1. Can implementing this extension in a 'generic' way i.e. possibly not implement it with a target mask, but just a generic int vector, still maintain the consistency of GNU predicate vectors within the compiler? I know it may not seem very different from how boolean vectors are currently implemented (as in your above example), but, having the __attribute__((vector_mask)) as a 'property' of the object makes it useful to optimize its uses to target predicates in subsequent stages of the compiler. 2. Restricting __attribute__((vector_mask)) to apply only to target intrinsic types? Eg. On SVE something like: type
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford wrote: > > Tejas Belagod writes: > > On 7/10/24 2:38 PM, Richard Biener wrote: > >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod > >> wrote: > >>> > >>> On 7/9/24 4:22 PM, Richard Biener wrote: > On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod > wrote: > > > > On 7/8/24 4:45 PM, Richard Biener wrote: > >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorry to have dropped the ball on > >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > >>> here I've tried to pick it up again and write up a strawman proposal > >>> for > >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE. > >>> > >>> > >>> Thanks, > >>> Tejas. > >>> > >>> Motivation > >>> -- > >>> > >>> The idea of packed boolean vectors came about when we wanted to > >>> support > >>> C/C++ operators on SVE ACLE types. The current vector boolean type > >>> that > >>> ACLE specifies does not adequately disambiguate vector lane sizes > >>> which > >>> they were derived off of. Consider this simple, albeit unrealistic, > >>> example: > >>> > >>> bool foo (svint32_t a, svint32_t b) > >>> { > >>> svbool_t p = a > b; > >>> > >>> // Here p[2] is not the same as a[2] > b[2]. > >>> return p[2]; > >>> } > >>> > >>> In the above example, because svbool_t has a fixed 1-lane-per-byte, > >>> p[i] > >>> does not return the bool value corresponding to a[i] > b[i]. This > >>> necessitates a 'typed' vector boolean value that unambiguously > >>> represents results of operations > >>> of the same type. > >>> > >>> __attribute__((vector_mask)) > >>> - > >>> > >>> Note: If interested in historical discussions refer to: > >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > >>> > >>> We define this new attribute which when applied to a base data vector > >>> produces a new boolean vector type that represents a boolean type that > >>> is produced as a result of operations on the corresponding base vector > >>> type. The following is the syntax. > >>> > >>> typedef int v8si __attribute__((vector_size (8 * sizeof (int))); > >>> typedef v8si v8sib __attribute__((vector_mask)); > >>> > >>> Here the 'base' data vector type is v8si or a vector of 8 integers. > >>> > >>> Rules > >>> > >>> • The layout/size of the boolean vector type is implementation-defined > >>> for its base data vector type. > >>> > >>> • Two boolean vector types who's base data vector types have same > >>> number > >>> of elements and lane-width have the same layout and size. > >>> > >>> • Consequently, two boolean vectors who's base data vector types have > >>> different number of elements or different lane-size have different > >>> layouts. > >>> > >>> This aligns with gnu vector extensions that generate integer vectors > >>> as > >>> a result of comparisons - "The result of the comparison is a vector of > >>> the same width and number of elements as the comparison operands with > >>> a > >>> signed integral element type." according to > >>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > >> > >> Without having the time to re-review this all in detail I think the GNU > >> vector extension does not expose the result of the comparison as the > >> machine would produce it but instead a comparison "decays" to > >> a conditional: > >> > >> typedef int v4si __attribute__((vector_size(16))); > >> > >> v4si a; > >> v4si b; > >> > >> void foo() > >> { > >> auto r = a < b; > >> } > >> > >> produces, with C23: > >> > >> vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { > >> 0, > >> 0, 0, 0 } > ; > >> > >> In fact on x86_64 with AVX and AVX512 you have two different "machine > >> produced" mask types and the above could either produce a AVX mask with > >> 32bit elements or a AVX512 mask with 1bit elements. > >> > >> Not exposing "native" mask types requires the compiler optimizing > >> subsequent > >> uses and makes generic vectors difficult to combine with for example > >> AVX512 > >> intrinsics (where masks are just 'int'). Across an ABI boundary it's > >> also > >> even more difficult to optimize mask transitions. > >> > >> But it at least allows portable code and it does not suffer from users > >> trying to > >> expose machine representations of masks as input to generic vector code > >> with all the problems of constant folding not only requiring >
Re: [RFC] Proposal to support Packed Boolean Vector masks.
Tejas Belagod writes: > On 7/10/24 2:38 PM, Richard Biener wrote: >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod wrote: >>> >>> On 7/9/24 4:22 PM, Richard Biener wrote: On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: > > On 7/8/24 4:45 PM, Richard Biener wrote: >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod >> wrote: >>> >>> Hi, >>> >>> Sorry to have dropped the ball on >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but >>> here I've tried to pick it up again and write up a strawman proposal for >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE. >>> >>> >>> Thanks, >>> Tejas. >>> >>> Motivation >>> -- >>> >>> The idea of packed boolean vectors came about when we wanted to support >>> C/C++ operators on SVE ACLE types. The current vector boolean type that >>> ACLE specifies does not adequately disambiguate vector lane sizes which >>> they were derived off of. Consider this simple, albeit unrealistic, >>> example: >>> >>> bool foo (svint32_t a, svint32_t b) >>> { >>> svbool_t p = a > b; >>> >>> // Here p[2] is not the same as a[2] > b[2]. >>> return p[2]; >>> } >>> >>> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] >>> does not return the bool value corresponding to a[i] > b[i]. This >>> necessitates a 'typed' vector boolean value that unambiguously >>> represents results of operations >>> of the same type. >>> >>> __attribute__((vector_mask)) >>> - >>> >>> Note: If interested in historical discussions refer to: >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html >>> >>> We define this new attribute which when applied to a base data vector >>> produces a new boolean vector type that represents a boolean type that >>> is produced as a result of operations on the corresponding base vector >>> type. The following is the syntax. >>> >>> typedef int v8si __attribute__((vector_size (8 * sizeof (int))); >>> typedef v8si v8sib __attribute__((vector_mask)); >>> >>> Here the 'base' data vector type is v8si or a vector of 8 integers. >>> >>> Rules >>> >>> • The layout/size of the boolean vector type is implementation-defined >>> for its base data vector type. >>> >>> • Two boolean vector types who's base data vector types have same number >>> of elements and lane-width have the same layout and size. >>> >>> • Consequently, two boolean vectors who's base data vector types have >>> different number of elements or different lane-size have different >>> layouts. >>> >>> This aligns with gnu vector extensions that generate integer vectors as >>> a result of comparisons - "The result of the comparison is a vector of >>> the same width and number of elements as the comparison operands with a >>> signed integral element type." according to >>>https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. >> >> Without having the time to re-review this all in detail I think the GNU >> vector extension does not expose the result of the comparison as the >> machine would produce it but instead a comparison "decays" to >> a conditional: >> >> typedef int v4si __attribute__((vector_size(16))); >> >> v4si a; >> v4si b; >> >> void foo() >> { >> auto r = a < b; >> } >> >> produces, with C23: >> >> vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, >> 0, 0, 0 } > ; >> >> In fact on x86_64 with AVX and AVX512 you have two different "machine >> produced" mask types and the above could either produce a AVX mask with >> 32bit elements or a AVX512 mask with 1bit elements. >> >> Not exposing "native" mask types requires the compiler optimizing >> subsequent >> uses and makes generic vectors difficult to combine with for example >> AVX512 >> intrinsics (where masks are just 'int'). Across an ABI boundary it's >> also >> even more difficult to optimize mask transitions. >> >> But it at least allows portable code and it does not suffer from users >> trying to >> expose machine representations of masks as input to generic vector code >> with all the problems of constant folding not only requiring >> self-consistent >> code within the compiler but compatibility with user produced constant >> masks. >> >> That said, I somewhat question the need to expose the target mask layout >> to users for GCCs generic vector extension. >> > > Thanks for your feedback. > > IIUC, I can imagine how having a GNU vector extension exposing the > target vector
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/10/24 2:38 PM, Richard Biener wrote: On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod wrote: On 7/9/24 4:22 PM, Richard Biener wrote: On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: On 7/8/24 4:45 PM, Richard Biener wrote: On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. Thanks for your feedback. IIUC, I can imagine how having a GNU vector extension exposing the target vector mask layout can pose a challenge - maybe making it a generic GNU vector extension was too ambitious. I wonder if there's value in pursuing these alternate paths? 1. Can implementing this extension in a 'generic' way i.e. possibly not implement it with a target mask, but just a generic int vector, still maintain the consistency of GNU predicate vectors within the compiler? I know it may not seem very different from how boolean vectors are currently implemented (as in your above example), but, having the __attribute__((vector_mask)) as a 'property' of the object makes it useful to optimize its uses to target predicates in subsequent stages of the compiler. 2. Restricting __attribute__((vector_mask)) to apply only to target intrinsic types? Eg. On SVE something like: typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK. On AVX, something like: typedef __m256i __mask32 __attribute__((vector_mask)
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod wrote: > > On 7/9/24 4:22 PM, Richard Biener wrote: > > On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: > >> > >> On 7/8/24 4:45 PM, Richard Biener wrote: > >>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod > >>> wrote: > > Hi, > > Sorry to have dropped the ball on > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > here I've tried to pick it up again and write up a strawman proposal for > elevating __attribute__((vector_mask)) to the FE from GIMPLE. > > > Thanks, > Tejas. > > Motivation > -- > > The idea of packed boolean vectors came about when we wanted to support > C/C++ operators on SVE ACLE types. The current vector boolean type that > ACLE specifies does not adequately disambiguate vector lane sizes which > they were derived off of. Consider this simple, albeit unrealistic, > example: > > bool foo (svint32_t a, svint32_t b) > { > svbool_t p = a > b; > > // Here p[2] is not the same as a[2] > b[2]. > return p[2]; > } > > In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] > does not return the bool value corresponding to a[i] > b[i]. This > necessitates a 'typed' vector boolean value that unambiguously > represents results of operations > of the same type. > > __attribute__((vector_mask)) > - > > Note: If interested in historical discussions refer to: > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > > We define this new attribute which when applied to a base data vector > produces a new boolean vector type that represents a boolean type that > is produced as a result of operations on the corresponding base vector > type. The following is the syntax. > > typedef int v8si __attribute__((vector_size (8 * sizeof (int))); > typedef v8si v8sib __attribute__((vector_mask)); > > Here the 'base' data vector type is v8si or a vector of 8 integers. > > Rules > > • The layout/size of the boolean vector type is implementation-defined > for its base data vector type. > > • Two boolean vector types who's base data vector types have same number > of elements and lane-width have the same layout and size. > > • Consequently, two boolean vectors who's base data vector types have > different number of elements or different lane-size have different > layouts. > > This aligns with gnu vector extensions that generate integer vectors as > a result of comparisons - "The result of the comparison is a vector of > the same width and number of elements as the comparison operands with a > signed integral element type." according to > https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > >>> > >>> Without having the time to re-review this all in detail I think the GNU > >>> vector extension does not expose the result of the comparison as the > >>> machine would produce it but instead a comparison "decays" to > >>> a conditional: > >>> > >>> typedef int v4si __attribute__((vector_size(16))); > >>> > >>> v4si a; > >>> v4si b; > >>> > >>> void foo() > >>> { > >>> auto r = a < b; > >>> } > >>> > >>> produces, with C23: > >>> > >>> vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, > >>> 0, 0, 0 } > ; > >>> > >>> In fact on x86_64 with AVX and AVX512 you have two different "machine > >>> produced" mask types and the above could either produce a AVX mask with > >>> 32bit elements or a AVX512 mask with 1bit elements. > >>> > >>> Not exposing "native" mask types requires the compiler optimizing > >>> subsequent > >>> uses and makes generic vectors difficult to combine with for example > >>> AVX512 > >>> intrinsics (where masks are just 'int'). Across an ABI boundary it's also > >>> even more difficult to optimize mask transitions. > >>> > >>> But it at least allows portable code and it does not suffer from users > >>> trying to > >>> expose machine representations of masks as input to generic vector code > >>> with all the problems of constant folding not only requiring > >>> self-consistent > >>> code within the compiler but compatibility with user produced constant > >>> masks. > >>> > >>> That said, I somewhat question the need to expose the target mask layout > >>> to users for GCCs generic vector extension. > >>> > >> > >> Thanks for your feedback. > >> > >> IIUC, I can imagine how having a GNU vector extension exposing the > >> target vector mask layout can pose a challenge - maybe making it a > >> generic GNU vector extension was too ambitious. I wonder if there's > >> value in pursuing these alternate paths? > >> > >> 1. Can implementing this extension in
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/9/24 4:22 PM, Richard Biener wrote: On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: On 7/8/24 4:45 PM, Richard Biener wrote: On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. Thanks for your feedback. IIUC, I can imagine how having a GNU vector extension exposing the target vector mask layout can pose a challenge - maybe making it a generic GNU vector extension was too ambitious. I wonder if there's value in pursuing these alternate paths? 1. Can implementing this extension in a 'generic' way i.e. possibly not implement it with a target mask, but just a generic int vector, still maintain the consistency of GNU predicate vectors within the compiler? I know it may not seem very different from how boolean vectors are currently implemented (as in your above example), but, having the __attribute__((vector_mask)) as a 'property' of the object makes it useful to optimize its uses to target predicates in subsequent stages of the compiler. 2. Restricting __attribute__((vector_mask)) to apply only to target intrinsic types? Eg. On SVE something like: typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK. On AVX, something like: typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though this would require more fine-grained defn of lane-size to mask-bits mapping. I think the ta
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod wrote: > > On 7/8/24 4:45 PM, Richard Biener wrote: > > On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: > >> > >> Hi, > >> > >> Sorry to have dropped the ball on > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > >> here I've tried to pick it up again and write up a strawman proposal for > >> elevating __attribute__((vector_mask)) to the FE from GIMPLE. > >> > >> > >> Thanks, > >> Tejas. > >> > >> Motivation > >> -- > >> > >> The idea of packed boolean vectors came about when we wanted to support > >> C/C++ operators on SVE ACLE types. The current vector boolean type that > >> ACLE specifies does not adequately disambiguate vector lane sizes which > >> they were derived off of. Consider this simple, albeit unrealistic, > >> example: > >> > >> bool foo (svint32_t a, svint32_t b) > >> { > >> svbool_t p = a > b; > >> > >> // Here p[2] is not the same as a[2] > b[2]. > >> return p[2]; > >> } > >> > >> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] > >> does not return the bool value corresponding to a[i] > b[i]. This > >> necessitates a 'typed' vector boolean value that unambiguously > >> represents results of operations > >> of the same type. > >> > >> __attribute__((vector_mask)) > >> - > >> > >> Note: If interested in historical discussions refer to: > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > >> > >> We define this new attribute which when applied to a base data vector > >> produces a new boolean vector type that represents a boolean type that > >> is produced as a result of operations on the corresponding base vector > >> type. The following is the syntax. > >> > >> typedef int v8si __attribute__((vector_size (8 * sizeof (int))); > >> typedef v8si v8sib __attribute__((vector_mask)); > >> > >> Here the 'base' data vector type is v8si or a vector of 8 integers. > >> > >> Rules > >> > >> • The layout/size of the boolean vector type is implementation-defined > >> for its base data vector type. > >> > >> • Two boolean vector types who's base data vector types have same number > >> of elements and lane-width have the same layout and size. > >> > >> • Consequently, two boolean vectors who's base data vector types have > >> different number of elements or different lane-size have different layouts. > >> > >> This aligns with gnu vector extensions that generate integer vectors as > >> a result of comparisons - "The result of the comparison is a vector of > >> the same width and number of elements as the comparison operands with a > >> signed integral element type." according to > >> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > > > > Without having the time to re-review this all in detail I think the GNU > > vector extension does not expose the result of the comparison as the > > machine would produce it but instead a comparison "decays" to > > a conditional: > > > > typedef int v4si __attribute__((vector_size(16))); > > > > v4si a; > > v4si b; > > > > void foo() > > { > >auto r = a < b; > > } > > > > produces, with C23: > > > >vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, > > 0, 0, 0 } > ; > > > > In fact on x86_64 with AVX and AVX512 you have two different "machine > > produced" mask types and the above could either produce a AVX mask with > > 32bit elements or a AVX512 mask with 1bit elements. > > > > Not exposing "native" mask types requires the compiler optimizing subsequent > > uses and makes generic vectors difficult to combine with for example AVX512 > > intrinsics (where masks are just 'int'). Across an ABI boundary it's also > > even more difficult to optimize mask transitions. > > > > But it at least allows portable code and it does not suffer from users > > trying to > > expose machine representations of masks as input to generic vector code > > with all the problems of constant folding not only requiring self-consistent > > code within the compiler but compatibility with user produced constant > > masks. > > > > That said, I somewhat question the need to expose the target mask layout > > to users for GCCs generic vector extension. > > > > Thanks for your feedback. > > IIUC, I can imagine how having a GNU vector extension exposing the > target vector mask layout can pose a challenge - maybe making it a > generic GNU vector extension was too ambitious. I wonder if there's > value in pursuing these alternate paths? > > 1. Can implementing this extension in a 'generic' way i.e. possibly not > implement it with a target mask, but just a generic int vector, still > maintain the consistency of GNU predicate vectors within the compiler? I > know it may not seem very different from how boolean vectors are > currently implemented (as in your above example), but, having the > __attribute__((vector_mask)) as a 'property' of the object makes it > useful to op
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/8/24 4:45 PM, Richard Biener wrote: On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. Thanks for your feedback. IIUC, I can imagine how having a GNU vector extension exposing the target vector mask layout can pose a challenge - maybe making it a generic GNU vector extension was too ambitious. I wonder if there's value in pursuing these alternate paths? 1. Can implementing this extension in a 'generic' way i.e. possibly not implement it with a target mask, but just a generic int vector, still maintain the consistency of GNU predicate vectors within the compiler? I know it may not seem very different from how boolean vectors are currently implemented (as in your above example), but, having the __attribute__((vector_mask)) as a 'property' of the object makes it useful to optimize its uses to target predicates in subsequent stages of the compiler. 2. Restricting __attribute__((vector_mask)) to apply only to target intrinsic types? Eg. On SVE something like: typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK. On AVX, something like: typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though this would require more fine-grained defn of lane-size to mask-bits mapping. Would not be allowed on GNU Vector Extensions: typedef v4si v4sib __attribute__((vector_mask)); // Error - v
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod wrote: > > Hi, > > Sorry to have dropped the ball on > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > here I've tried to pick it up again and write up a strawman proposal for > elevating __attribute__((vector_mask)) to the FE from GIMPLE. > > > Thanks, > Tejas. > > Motivation > -- > > The idea of packed boolean vectors came about when we wanted to support > C/C++ operators on SVE ACLE types. The current vector boolean type that > ACLE specifies does not adequately disambiguate vector lane sizes which > they were derived off of. Consider this simple, albeit unrealistic, example: > >bool foo (svint32_t a, svint32_t b) >{ > svbool_t p = a > b; > > // Here p[2] is not the same as a[2] > b[2]. > return p[2]; >} > > In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] > does not return the bool value corresponding to a[i] > b[i]. This > necessitates a 'typed' vector boolean value that unambiguously > represents results of operations > of the same type. > > __attribute__((vector_mask)) > - > > Note: If interested in historical discussions refer to: > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > > We define this new attribute which when applied to a base data vector > produces a new boolean vector type that represents a boolean type that > is produced as a result of operations on the corresponding base vector > type. The following is the syntax. > >typedef int v8si __attribute__((vector_size (8 * sizeof (int))); >typedef v8si v8sib __attribute__((vector_mask)); > > Here the 'base' data vector type is v8si or a vector of 8 integers. > > Rules > > • The layout/size of the boolean vector type is implementation-defined > for its base data vector type. > > • Two boolean vector types who's base data vector types have same number > of elements and lane-width have the same layout and size. > > • Consequently, two boolean vectors who's base data vector types have > different number of elements or different lane-size have different layouts. > > This aligns with gnu vector extensions that generate integer vectors as > a result of comparisons - "The result of the comparison is a vector of > the same width and number of elements as the comparison operands with a > signed integral element type." according to > https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Without having the time to re-review this all in detail I think the GNU vector extension does not expose the result of the comparison as the machine would produce it but instead a comparison "decays" to a conditional: typedef int v4si __attribute__((vector_size(16))); v4si a; v4si b; void foo() { auto r = a < b; } produces, with C23: vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ; In fact on x86_64 with AVX and AVX512 you have two different "machine produced" mask types and the above could either produce a AVX mask with 32bit elements or a AVX512 mask with 1bit elements. Not exposing "native" mask types requires the compiler optimizing subsequent uses and makes generic vectors difficult to combine with for example AVX512 intrinsics (where masks are just 'int'). Across an ABI boundary it's also even more difficult to optimize mask transitions. But it at least allows portable code and it does not suffer from users trying to expose machine representations of masks as input to generic vector code with all the problems of constant folding not only requiring self-consistent code within the compiler but compatibility with user produced constant masks. That said, I somewhat question the need to expose the target mask layout to users for GCCs generic vector extension. > Producers and Consumers of PBV > -- > > With GNU vector extensions, comparisons produce boolean vectors; > conditional and bitwise operators consume them. Comparison producers > generate signed integer vectors of the same lane-width as the operands > of the comparison operator. This means conditionals and bitwise > operators cannot be applied to mixed vectors that are a result of > different width operands. Eg. > >v8hi foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f) >{ > return a > b || c > d; // error! > return a > b || e < f; // OK - no explicit conversion needed. > return a > b || __builtin_convertvector (c > d, v8si); // OK. > return a | b && c | d; // error! > return a | b && __builtin_convertvector (c | d, v8si); // OK. >} > > __builtin_convertvector () needs to be applied to convert vectors to the > type one wants to do the comparison in. IoW, the integer vectors that > represent boolean vectors are 'strictly-typed'. If we extend these rules > to vector_mask, this will look like: > >typedef v8sib v8si __attribute__((vector_mask)); >typedef v8hib v8hi __attribute__((vector_mask)); >typed
[RFC] Proposal to support Packed Boolean Vector masks.
Hi, Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE. Thanks, Tejas. Motivation -- The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that ACLE specifies does not adequately disambiguate vector lane sizes which they were derived off of. Consider this simple, albeit unrealistic, example: bool foo (svint32_t a, svint32_t b) { svbool_t p = a > b; // Here p[2] is not the same as a[2] > b[2]. return p[2]; } In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents results of operations of the same type. __attribute__((vector_mask)) - Note: If interested in historical discussions refer to: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html We define this new attribute which when applied to a base data vector produces a new boolean vector type that represents a boolean type that is produced as a result of operations on the corresponding base vector type. The following is the syntax. typedef int v8si __attribute__((vector_size (8 * sizeof (int))); typedef v8si v8sib __attribute__((vector_mask)); Here the 'base' data vector type is v8si or a vector of 8 integers. Rules • The layout/size of the boolean vector type is implementation-defined for its base data vector type. • Two boolean vector types who's base data vector types have same number of elements and lane-width have the same layout and size. • Consequently, two boolean vectors who's base data vector types have different number of elements or different lane-size have different layouts. This aligns with gnu vector extensions that generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Producers and Consumers of PBV -- With GNU vector extensions, comparisons produce boolean vectors; conditional and bitwise operators consume them. Comparison producers generate signed integer vectors of the same lane-width as the operands of the comparison operator. This means conditionals and bitwise operators cannot be applied to mixed vectors that are a result of different width operands. Eg. v8hi foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f) { return a > b || c > d; // error! return a > b || e < f; // OK - no explicit conversion needed. return a > b || __builtin_convertvector (c > d, v8si); // OK. return a | b && c | d; // error! return a | b && __builtin_convertvector (c | d, v8si); // OK. } __builtin_convertvector () needs to be applied to convert vectors to the type one wants to do the comparison in. IoW, the integer vectors that represent boolean vectors are 'strictly-typed'. If we extend these rules to vector_mask, this will look like: typedef v8sib v8si __attribute__((vector_mask)); typedef v8hib v8hi __attribute__((vector_mask)); typedef v8sfb v8sf __attribute__((vector_mask)); v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f) { v8sib psi = a > b; v8hib phi = c > d; v8sfb psf = e < f; return psi || phi; // error! return psi || psf; // OK - no explicit conversion needed. return psi || __builtin_convertvector (phi, v8sib); // OK. return psi | phi; // error! return psi | __builtin_convertvector (phi, v8sib); // OK. return psi | psf; // OK - no explicit conversion needed. } Now according to the rules explained above, v8sib and v8hib will have different layouts (which is why they can't be used directly without conversion if used as operands of operations). OTOH, the same rules dictate that the layout of, say v8sib and v8sfb, where v8sfb is the float base data vector equivalent of v8sib which when applied ensure that v8sib and v8sfb have the same layout and hence can be used as operands of operators without explicit conversion. This aligns with the GNU vector extensions rules where comparison of 2 v8sf vectors results in a v8si of the same lane-width and number of elements as that would result in comparison of 2 v8si vectors. Application of vector_mask to sizeless types __attribute__((vector_mask)) has the advantage that it can be applied to sizeless types seamlessly. When __attribute__((vector_mask)) is applied to a data vector that is a sizeless type, the resulting vector mask also becomes a sizeless type. Eg. typedef svpre