On Wed, Jul 26, 2023 at 9:21 AM Tejas Belagod <tejas.bela...@arm.com> wrote: > > On 7/17/23 5:46 PM, Richard Biener wrote: > > On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod <tejas.bela...@arm.com> > > wrote: > >> > >> On 7/13/23 4:05 PM, Richard Biener wrote: > >>> On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod <tejas.bela...@arm.com> > >>> wrote: > >>>> > >>>> On 7/3/23 1:31 PM, Richard Biener wrote: > >>>>> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod <tejas.bela...@arm.com> > >>>>> wrote: > >>>>>> > >>>>>> On 6/29/23 6:55 PM, Richard Biener wrote: > >>>>>>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod <tejas.bela...@arm.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> From: Richard Biener <richard.guent...@gmail.com> > >>>>>>>> Date: Tuesday, June 27, 2023 at 12:58 PM > >>>>>>>> To: Tejas Belagod <tejas.bela...@arm.com> > >>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> > >>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >>>>>>>> > >>>>>>>> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod > >>>>>>>> <tejas.bela...@arm.com> wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> From: Richard Biener <richard.guent...@gmail.com> > >>>>>>>>> Date: Monday, June 26, 2023 at 2:23 PM > >>>>>>>>> To: Tejas Belagod <tejas.bela...@arm.com> > >>>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> > >>>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >>>>>>>>> > >>>>>>>>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches > >>>>>>>>> <gcc-patches@gcc.gnu.org> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> Packed Boolean Vectors > >>>>>>>>>> ---------------------- > >>>>>>>>>> > >>>>>>>>>> I'd like to propose a feature addition to GNU Vector extensions to > >>>>>>>>>> add packed > >>>>>>>>>> boolean vectors (PBV). This has been discussed in the past > >>>>>>>>>> here[1] and a variant has > >>>>>>>>>> been implemented in Clang recently[2]. > >>>>>>>>>> > >>>>>>>>>> With predication features being added to vector architectures > >>>>>>>>>> (SVE, MVE, AVX), > >>>>>>>>>> it is a useful feature to have to model predication on targets. > >>>>>>>>>> This could > >>>>>>>>>> find its use in intrinsics or just used as is as a GNU vector > >>>>>>>>>> extension being > >>>>>>>>>> mapped to underlying target features. For example, the packed > >>>>>>>>>> boolean vector > >>>>>>>>>> could directly map to a predicate register on SVE. > >>>>>>>>>> > >>>>>>>>>> Also, this new packed boolean type GNU extension can be used with > >>>>>>>>>> SVE ACLE > >>>>>>>>>> intrinsics to replace a fixed-length svbool_t. > >>>>>>>>>> > >>>>>>>>>> Here are a few options to represent the packed boolean vector type. > >>>>>>>>> > >>>>>>>>> The GIMPLE frontend uses a new 'vector_mask' attribute: > >>>>>>>>> > >>>>>>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > >>>>>>>>> typedef v8si v8sib __attribute__((vector_mask)); > >>>>>>>>> > >>>>>>>>> it get's you a vector type that's the appropriate (dependent on the > >>>>>>>>> target) vector > >>>>>>>>> mask type for the vector data type (v8si in this case). > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thanks Richard. > >>>>>>>>> > >>>>>>>>> Having had a quick look at the implementation, it does seem to tick > >>>>>>>>> the boxes. > >>>>>>>>> > >>>>>>>>> I must admit I haven't dug deep, but if the target hook allows the > >>>>>>>>> mask to be > >>>>>>>>> > >>>>>>>>> defined in way that is target-friendly (and I don't know how much > >>>>>>>>> effort it will > >>>>>>>>> > >>>>>>>>> be to migrate the attribute to more front-ends), it should do the > >>>>>>>>> job nicely. > >>>>>>>>> > >>>>>>>>> Let me go back and dig a bit deeper and get back with questions if > >>>>>>>>> any. > >>>>>>>> > >>>>>>>> > >>>>>>>> Let me add that the advantage of this is the compiler doesn't need > >>>>>>>> to support weird explicitely laid out packed boolean vectors that do > >>>>>>>> not match what the target supports and the user doesn't need to know > >>>>>>>> what the target supports (and thus have an #ifdef maze around > >>>>>>>> explicitely > >>>>>>>> specified layouts). > >>>>>>>> > >>>>>>>> Sorry for the delayed response – I spent a day experimenting with > >>>>>>>> vector_mask. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Yeah, this is what option 4 in the RFC is trying to achieve – be > >>>>>>>> portable enough > >>>>>>>> > >>>>>>>> to avoid having to sprinkle the code with ifdefs. > >>>>>>>> > >>>>>>>> > >>>>>>>> It does remove some flexibility though, for example with -mavx512f > >>>>>>>> -mavx512vl > >>>>>>>> you'll get AVX512 style masks for V4SImode data vectors but of > >>>>>>>> course the > >>>>>>>> target sill supports SSE2/AVX2 style masks as well, but those would > >>>>>>>> not be > >>>>>>>> available as "packed boolean vectors", though they are of course in > >>>>>>>> fact > >>>>>>>> equal to V4SImode data vectors with -1 or 0 values, so in this > >>>>>>>> particular > >>>>>>>> case it might not matter. > >>>>>>>> > >>>>>>>> That said, the vector_mask attribute will get you V4SImode vectors > >>>>>>>> with > >>>>>>>> signed boolean elements of 32 bits for V4SImode data vectors with > >>>>>>>> SSE2/AVX2. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> This sounds very much like what the scenario would be with NEON vs > >>>>>>>> SVE. Coming to think > >>>>>>>> > >>>>>>>> of it, vector_mask resembles option 4 in the proposal with ‘n’ > >>>>>>>> implied by the ‘base’ vector type > >>>>>>>> > >>>>>>>> and a ‘w’ specified for the type. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Given its current implementation, if vector_mask is exposed to the > >>>>>>>> CFE, would there be any > >>>>>>>> > >>>>>>>> major challenges wrt implementation or defining behaviour semantics? > >>>>>>>> I played around with a > >>>>>>>> > >>>>>>>> few examples from the testsuite and wrote some new ones. I mostly > >>>>>>>> tried operations that > >>>>>>>> > >>>>>>>> the new type would have to support (unary, binary bitwise, > >>>>>>>> initializations etc) – with a couple of exceptions > >>>>>>>> > >>>>>>>> most of the ops seem to be supported. I also triggered a couple of > >>>>>>>> ICEs in some tests involving > >>>>>>>> > >>>>>>>> implicit conversions to wider/narrower vector_mask types (will raise > >>>>>>>> reports for these). Correct me > >>>>>>>> > >>>>>>>> if I’m wrong here, but we’d probably have to support a couple of new > >>>>>>>> ops if vector_mask is exposed > >>>>>>>> > >>>>>>>> to the CFE – initialization and subscript operations? > >>>>>>> > >>>>>>> Yes, either that or restrict how the mask vectors can be used, thus > >>>>>>> properly diagnose improper > >>>>>>> uses. > >>>>>> > >>>>>> Indeed. > >>>>>> > >>>>>> A question would be for example how to write common mask test > >>>>>>> operations like > >>>>>>> if (any (mask)) or if (all (mask)). > >>>>>> > >>>>>> I see 2 options here. New builtins could support new types - they'd > >>>>>> provide a target independent way to test any and all conditions. > >>>>>> Another > >>>>>> would be to let the target use its intrinsics to do them in the most > >>>>>> efficient way possible (which the builtins would get lowered down to > >>>>>> anyway). > >>>>>> > >>>>>> > >>>>>> Likewise writing merge operations > >>>>>>> - do those as > >>>>>>> > >>>>>>> a = a | (mask ? b : 0); > >>>>>>> > >>>>>>> thus use ternary ?: for this? > >>>>>> > >>>>>> Yes, like now, the ternary could just translate to > >>>>>> > >>>>>> {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... } > >>>>>> > >>>>>> One thing to flesh out is the semantics. Should we allow this operation > >>>>>> as long as the number of elements are the same even if the mask type if > >>>>>> different i.e. > >>>>>> > >>>>>> v4hib ? v4si : v4si; > >>>>>> > >>>>>> I don't see why this can't be allowed as now we let > >>>>>> > >>>>>> v4si ? v4sf : v4sf; > >>>>>> > >>>>>> > >>>>>> For initialization regular vector > >>>>>>> syntax should work: > >>>>>>> > >>>>>>> mtype mask = (mtype){ -1, -1, 0, 0, ... }; > >>>>>>> > >>>>>>> there's the question of the signedness of the mask elements. GCC > >>>>>>> internally uses signed > >>>>>>> bools with values -1 for true and 0 for false. > >>>>>> > >>>>>> One of the things is the value that represents true. This is largely > >>>>>> target-dependent when it comes to the vector_mask type. When > >>>>>> vector_mask > >>>>>> types are created from GCC's internal representation of bool vectors > >>>>>> (signed ints) the point about implicit/explicit conversions from signed > >>>>>> int vect to mask types in the proposal covers this. So mask in > >>>>>> > >>>>>> v4sib mask = (v4sib){-1, -1, 0, 0, ... } > >>>>>> > >>>>>> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx > >>>>>> on SVE. On AVX2/SSE they'd still be represented as vector of signed > >>>>>> ints > >>>>>> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications > >>>>>> this > >>>>>> new mask type representations will have in the mid-end while being > >>>>>> converted back and forth to and from GCC's internal representation, but > >>>>>> I'm guessing this is already being handled at some level by the > >>>>>> vector_mask type's current support? > >>>>> > >>>>> Yes, I would guess so. Of course what the middle-end is currently > >>>>> exposed > >>>>> to is simply what the vectorizer generates - once fuzzers discover this > >>>>> feature > >>>>> we'll see "interesting" uses that might run into missed or wrong > >>>>> handling of > >>>>> them. > >>>>> > >>>>> So whatever we do on the side of exposing this to users a good portion > >>>>> of testsuite coverage for the allowed use cases is important. > >>>>> > >>>>> Richard. > >>>>> > >>>> > >>>> Apologies for the long-ish reply, but here's a TLDR and gory details > >>>> follow. > >>>> > >>>> TLDR: > >>>> GIMPLE's vector_mask type semantics seems to be target-dependent, so > >>>> elevating vector_mask to CFE with same semantics is undesirable. OTOH, > >>>> changing vector_mask to have target-independent CFE semantics will cause > >>>> dichotomy between its CFE and GFE behaviours. But vector_mask approach > >>>> scales well for sizeless types. Is the solution to have something like > >>>> vector_mask with defined target-independent type semantics, but call it > >>>> something else to prevent conflation with GIMPLE, a viable option? > >>>> > >>>> Details: > >>>> After some more analysis of the proposed options, here are some > >>>> interesting findings: > >>>> > >>>> vector_mask looked like a very interesting option until I ran into some > >>>> semantic uncertainly. This code: > >>>> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > >>>> typedef v8si v8sib __attribute__((vector_mask)); > >>>> > >>>> typedef short v8hi __attribute__((vector_size(8*sizeof(short)))); > >>>> typedef v8hi v8hib __attribute__((vector_mask)); > >>>> > >>>> v8si res; > >>>> v8hi resh; > >>>> > >>>> v8hib __GIMPLE () foo (v8hib x, v8sib y) > >>>> { > >>>> v8hib res; > >>>> > >>>> res = x & y; > >>>> return res; > >>>> } > >>>> > >>>> When compiled on AArch64, produces a type-mismatch error for binary > >>>> expression involving '&' because the 'derived' types 'v8hib' and 'v8sib' > >>>> have a different target-layout. If the layout of these two 'derived' > >>>> types match, then the above code has no issue. Which is the case on > >>>> amdgcn-amdhsa target where it compiles without any error(amdgcn uses a > >>>> scalar DImode mask mode). IoW such code seems to be allowed on some > >>>> targets and not on others. > >>>> > >>>> With the same code, I tried putting casts and it worked fine on AArch64 > >>>> and amdgcn. This target-specific behaviour of vector_mask derived types > >>>> will be difficult to specify once we move it to the CFE - in fact we > >>>> probably don't want target-specific behaviour once it moves to the CFE. > >>>> > >>>> If we expose vector_mask to CFE, we'd have to specify consistent > >>>> semantics for vector_mask types. We'd have to resolve ambiguities like > >>>> 'v4hib & v4sib' clearly to be able to specify the semantics of the type > >>>> system involving vector_mask. If we do this, don't we run the risk of a > >>>> dichotomy between the CFE and GFE semantics of vector_mask? I'm assuming > >>>> we'd want to retain vector_mask semantics as they are in GIMPLE. > >>>> > >>>> If we want to enforce constant semantics for vector_mask in the CFE, one > >>>> way is to treat vector_mask types as distinct if they're 'attached' to > >>>> distinct data vector types. In such a scenario, vector_mask types > >>>> attached to two data vector types with the same lane-width and number of > >>>> lanes would be classified as distinct. For eg: > >>>> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > >>>> typedef v8si v8sib __attribute__((vector_mask)); > >>>> > >>>> typedef float v8sf __attribute__((vector_size(8*sizeof(float)))); > >>>> typedef v8sf v8sfb __attribute__((vector_mask)); > >>>> > >>>> v8si foo (v8sf x, v8sf y, v8si i, v8si j) > >>>> { > >>>> (a == b) & (v8sfb)(x == y) ? x : (v8si){0}; > >>>> } > >>>> > >>>> This could be the case for unsigned vs signed int vectors too for eg - > >>>> seems a bit unnecessary tbh. > >>>> > >>>> Though vector_mask's being 'attached' to a type has its drawbacks, it > >>>> does seem to have an advantage when sizeless types are considered. If we > >>>> have to define a sizeless vector boolean type that is implied by the > >>>> lane size, we could do something like > >>>> > >>>> typedef svint32_t svbool32_t __attribute__((vector_mask)); > >>>> > >>>> int32_t foo (svint32_t a, svint32_t b) > >>>> { > >>>> svbool32_t pred = a > b; > >>>> > >>>> return pred[2] ? a[2] : b[2]; > >>>> } > >>>> > >>>> This is harder to do in the other schemes proposed so far as they're > >>>> size-based. > >>>> > >>>> To be able to free the boolean from the base type (not size) and retain > >>>> vector_mask's flexibility to declare sizeless types, we could have an > >>>> attribute that is more flexibly-typed and only 'derives' the lane-size > >>>> and number of lanes from its 'base' type without actually inheriting the > >>>> actual base type(char, short, int etc) or its signedness. This creates a > >>>> purer and stand-alone boolean type without the associated semantics' > >>>> complexity of having to cast between two same-size types with the same > >>>> number of lanes. Eg. > >>>> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > >>>> typedef v8si v8b __attribute__((vector_bool)); > >>>> > >>>> However, with differing lane-sizes, there will have to be a cast as the > >>>> 'derived' element size is different which could impact the layout of the > >>>> vector mask. Eg. > >>>> > >>>> v8si foo (v8hi x, v8hi y, v8si i, v8si j) > >>>> { > >>>> (v8sib)(x == y) & (i == j) ? i : (v8si){0}; > >>>> } > >>>> > >>>> Such conversions on targets like AVX512/AMDGCN will be a NOP, but > >>>> non-trivial on SVE (depending on the implemented layout of the bool > >>>> vector). > >>>> > >>>> vector_bool decouples us from having to retain the behaviour of > >>>> vector_mask and provides the flexibility of not having to cast across > >>>> same-element-size vector types. Wrt to sizeless types, it could scale > >>>> well. > >>>> > >>>> typedef svint32_t svbool32_t __attribute__((vector_bool)); > >>>> typedef svint16_t svbool16_t __attribute__((vector_bool)); > >>>> > >>>> int32_t foo (svint32_t a, svint32_t b) > >>>> { > >>>> svbool32_t pred = a > b; > >>>> > >>>> return pred[2] ? a[2] : b[2]; > >>>> } > >>>> > >>>> int16_t bar (svint16_t a, svint16_t b) > >>>> { > >>>> svbool16_t pred = a > b; > >>>> > >>>> return pred[2] ? a[2] : b[2]; > >>>> } > >>>> > >>>> On SVE, pred[2] refers to bit 4 for svint16_t and bit 8 for svint32_t on > >>>> the target predicate. > >>>> > >>>> Thoughts? > >>> > >>> The GIMPLE frontend accepts just what is valid on the target here. Any > >>> "plumbing" such as implicit conversions (if we do not want to require > >>> explicit ones even when NOP) need to be done/enforced by the C frontend. > >>> > >> > >> Sorry, I'm not sure I follow - correct me if I'm wrong here. > >> > >> If we desire to define/allow operations like implicit/explicit > >> conversion on vector_mask types in CFE, don't we have to start from a > >> position of defining what makes vector_mask types distinct and therefore > >> require implicit/explicit conversions? > > > > We need to look at which operations we want to produce vector masks and > > which operations consume them and what operations operate on them. > > > > In GIMPLE comparisons produce them, conditionals consume them and > > we allow bitwise ops to operate on them directly (GIMPLE doesn't have > > logical && it just has bitwise &). > > > > Thanks for your thoughts - after I spent more cycles researching and > experimenting, I think I understand the driving factors here. Comparison > producers generate signed integer vectors of the same lane-width as the > comparison operands. This means mixed type vectors can't be applied to > conditional consumers or bitwise operators eg: > > v8hi foo (v8si a, v8si b, v8hi c, v8hi d) > { > return a > b || c > d; // error! > return a > b || __builtin_convertvector (c > d, v8si); // OK. > return a | b && c | d; // error! > return a | b && __builtin_convertvector (c | d, v8si); // OK. > } > > Similarly, if we extend these 'stricter-typing' rules to vector_mask, it > could look like: > > typedef v4sib v4si __attribute__((vector_mask)); > typedef v4hib v4hi __attribute__((vector_mask)); > > v8sib foo (v8si a, v8si b, v8hi c, v8hi d) > { > v8sib psi = a > b; > v8hib phi = c > d; > > return psi || phi; // error! > return psi || __builtin_convertvector (phi, v8sib); // OK. > return psi | phi; // error! > return psi | __builtin_convertvector (phi, v8sib); // OK. > } > > At GIMPLE stage, on targets where the layout allows it (eg AMDGCN), > expressions like > psi | __builtin_convertvector (phi, v8sib) > can be optimized to > psi | phi > because __builtin_convertvector (phi, v8sib) is a NOP. > > I think this could make vector_mask more portable across targets. If one > wants to take CFE vector_mask code and run it on the GFE, it should > work; while the reverse won't as CFE vector_mask rules are more restrictive. > > Does this look like a sensible approach for progress?
Yes, that looks good. > >> IIUC, GFE's distinctness of vector_mask types depends on how the mask > >> mode is implemented on the target. If implemented in CFE, vector_mask > >> types' distinctness probably shouldn't be based on target layout and > >> could be based on the type they're 'attached' to. > > > > But since we eventually run on the target the layout should ideally > > match that of the target. Now, the question is whether that's ever > > OK behavior - it effectively makes the mask somewhat opaque and > > only "observable" by probing it in defined manners. > > > >> Wouldn't that diverge from target-specific GFE behaviour - or are you > >> suggesting its OK for vector_mask type semantics to be different in CFE > >> and GFE? > > > > It's definitely undesirable but as said I'm not sure it has to differ > > [the layout]. > > > > I agree it is best to have a consistent layout of vector_mask across CFE > and GFE and also implement it to match the target layout for optimal > code quality. > > For observability, I think it makes sense to allow operations that are > relevant and have a consistent meaning irrespective of that target. Eg. > 'vector_mask & 2' might not mean the same thing on all targets, but > vector_mask[2] does. Therefore, I think the opaqueness is useful and > necessary to some extent. Yes. The main question regarding to observability will be things like sizeof or alignof and putting masks into addressable storage. I think IBM folks have introduced some "opaque" types for their matrix-multiplication accelerator where intrinsics need something to work with but the observability of many aspect is restricted. In the middle-end we have OPAQUE_TYPE and MODE_OPAQUE (but IIRC there can only be a single kind of that at the moment). Interestingly an OPAQUE_TYPE does have a size. Note one way out would be to make vector_mask types "decay" to a value vector type. Thus any time you try to observe it you get a vector bool (a vector of actual 8 bit bool data elements) and when you use a vector data type in mask context you get a "mask conversion" aka vector bool != 0. It would then be up to the compiler to elide round-trips between mask and data. That would mean sizeof (vector_mask) would be sizeof (vector bool) even when for example the hardware would produce V4SImode mask from a V4SFmode compare or when it would produce a QImode 4-bit integer from the same? Richard. > Thanks, > Tejas. > > >>> There's one issue I can see that wasn't mentioned yet - GCC currently > >>> accepts > >>> > >>> typedef long gv1024di __attribute__((vector_size(1024*8))); > >>> > >>> even if there's no underlying support on the target which either has > >>> support > >>> only for smaller vectors or no vectors at all. Currently vector_mask will > >>> simply fail to produce sth desirable here. What's your idea of making > >>> that not target dependent? GCC will later lower operations with such > >>> vectors, possibly splitting them up into sizes supported by the hardware > >>> natively, possibly performing elementwise operations. For the former > >>> one would need to guess the "decomposition type" and based on that > >>> select the mask type [layout]? > >>> > >>> One idea would be to specify the mask layout follows the largest vector > >>> kind supported by the target and if there is none follow the layout > >>> of (signed?) _Bool [n]? When there's no target support for vectors > >>> GCC will generally use elementwise operations apart from some > >>> special-cases. > >>> > >> > >> That is a very good point - thanks for raising it. For when GCC chooses > >> to lower to a vector type supported by the target, my initial thought > >> would be to, as you say, choose a mask that has enough bits to represent > >> the largest vector size with the smallest lane-width. The actual layout > >> of the mask will depend on how the target implements its mask mode. > >> Decomposition of vector_mask ought to follow the decomposition of the > >> GNU vectors type and each decomposed vector_mask type ought to have > >> enough bits to represent the decomposed GNU vector shape. It sounds nice > >> on paper, but I haven't really worked through a design for this. Do you > >> see any gotchas here? > > > > Not really. In the end it comes down to what the C writer is allowed to > > do with a vector mask. I would for example expect that I could do > > > > auto m = v1 < v2; > > _mm512_mask_sub_epi32 (a, m, b, c); > > > > so generic masks should inter-operate with intrinsics (when the appropriate > > ISA is enabled). That works for the data vectors themselves for example > > (quite some intrinsics are implemented with GCCs generic vector code). > > > > I for example can't do > > > > _Bool lane2 = m[2]; > > > > to inspect lane two of a maks with AVX512. I can do m & 2 but I wouldn't > > expect > > that to work (should I?) with a vector_mask mask (it's at least not > > valid directly > > in GIMPLE). There's _mm512_int2mask and _mm512_mask2int which transfer > > between mask and int (but the mask types are really just typedefd to > > integer typeS). > > > >>> While using a different name than vector_mask is certainly possible > >>> it wouldn't me to decide that, but I'm also not yet convinced it's > >>> really necessary. As said, what the GIMPLE frontend accepts > >>> or not shouldn't limit us here - just the actual chosen layout of the > >>> boolean vectors. > >>> > >> > >> I'm just concerned about creating an alternate vector_mask functionality > >> in the CFE and risk not being consistent with GFE. > > > > I think it's more important to double-check usablilty from the users side. > > If the implementation necessarily diverges from GIMPLE then we can > > choose a different attribute name but then it will also inevitably have > > code-generation (quality) issues as GIMPLE matches what the hardware > > can do. > > > > Richard. > > > >> Thanks, > >> Tejas. > >> > >>> Richard. > >>> > >>>> Thanks, > >>>> Tejas. > >>>> > >>>>>> > >>>>>> Thanks, > >>>>>> Tejas. > >>>>>> > >>>>>>> > >>>>>>> Richard. > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> > >>>>>>>> Tejas. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Richard. > >>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> > >>>>>>>>> Tejas. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> 1. __attribute__((vector_size (n))) where n represents bytes > >>>>>>>>>> > >>>>>>>>>> typedef bool vbool __attribute__ ((vector_size (1))); > >>>>>>>>>> > >>>>>>>>>> In this approach, the shape of the boolean vector is unclear. IoW, > >>>>>>>>>> it is not > >>>>>>>>>> clear if each bit in 'n' controls a byte or an element. On targets > >>>>>>>>>> like SVE, it would be natural to have each bit control a byte of > >>>>>>>>>> the target > >>>>>>>>>> vector (therefore resulting in an 'unpacked' layout of the PBV) > >>>>>>>>>> and on AVX, each > >>>>>>>>>> bit would control one element/lane on the target vector(therefore > >>>>>>>>>> resulting in a > >>>>>>>>>> 'packed' layout with all significant bits at the LSB). > >>>>>>>>>> > >>>>>>>>>> 2. __attribute__((vector_size (n))) where n represents num of lanes > >>>>>>>>>> > >>>>>>>>>> typedef int v4si __attribute__ ((vector_size (4 * sizeof > >>>>>>>>>> (int))); > >>>>>>>>>> typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / > >>>>>>>>>> sizeof (v4si){0}[0]))); > >>>>>>>>>> > >>>>>>>>>> Here the 'n' in the vector_size attribute represents the number of > >>>>>>>>>> bits that > >>>>>>>>>> is needed to represent a vector quantity. In this case, this > >>>>>>>>>> packed boolean > >>>>>>>>>> vector can represent upto 'n' vector lanes. The size of the type is > >>>>>>>>>> rounded up the nearest byte. For example, the sizeof v4bi in the > >>>>>>>>>> above > >>>>>>>>>> example is 1. > >>>>>>>>>> > >>>>>>>>>> In this approach, because of the nature of the representation, the > >>>>>>>>>> n bits required > >>>>>>>>>> to represent the n lanes of the vector are packed at the LSB. This > >>>>>>>>>> does not naturally > >>>>>>>>>> align with the SVE approach of each bit representing a byte of the > >>>>>>>>>> target vector > >>>>>>>>>> and PBV therefore having an 'unpacked' layout. > >>>>>>>>>> > >>>>>>>>>> More importantly, another drawback here is that the change in > >>>>>>>>>> units for vector_size > >>>>>>>>>> might be confusing to programmers. The units will have to be > >>>>>>>>>> interpreted based on the > >>>>>>>>>> base type of the typedef. It does not offer any flexibility in > >>>>>>>>>> terms of the layout of > >>>>>>>>>> the bool vector - it is fixed. > >>>>>>>>>> > >>>>>>>>>> 3. Combination of 1 and 2. > >>>>>>>>>> > >>>>>>>>>> Combining the best of 1 and 2, we can introduce extra parameters > >>>>>>>>>> to vector_size that will > >>>>>>>>>> unambiguously represent the layout of the PBV. Consider > >>>>>>>>>> > >>>>>>>>>> typedef bool vbool __attribute__((vector_size (s, n[, w]))); > >>>>>>>>>> > >>>>>>>>>> where 's' is size in bytes, 'n' is the number of lanes and an > >>>>>>>>>> optional 3rd parameter 'w' > >>>>>>>>>> is the number of bits of the PBV that represents a lane of the > >>>>>>>>>> target vector. 'w' would > >>>>>>>>>> allow a target to force a certain layout of the PBV. > >>>>>>>>>> > >>>>>>>>>> The 2-parameter form of vector_size allows the target to have an > >>>>>>>>>> implementation-defined layout of the PBV. The target is free to > >>>>>>>>>> choose the 'w' > >>>>>>>>>> if it is not specified to mirror the target layout of predicate > >>>>>>>>>> registers. For > >>>>>>>>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n. > >>>>>>>>>> > >>>>>>>>>> As an example, to represent the result of a comparison on 2 > >>>>>>>>>> int16x8_t, we'd need > >>>>>>>>>> 8 lanes of boolean which could be represented by > >>>>>>>>>> > >>>>>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8))); > >>>>>>>>>> > >>>>>>>>>> SVE would implement v8b layout to make every 2nd bit significant > >>>>>>>>>> i.e. w == 2 > >>>>>>>>>> > >>>>>>>>>> and AVX would choose a layout where all 8 consecutive bits packed > >>>>>>>>>> at LSB would > >>>>>>>>>> be significant i.e. w == 1. > >>>>>>>>>> > >>>>>>>>>> This scheme would accomodate more than 1 target to effectively > >>>>>>>>>> represent vector > >>>>>>>>>> bools that mirror the target properties. > >>>>>>>>>> > >>>>>>>>>> 4. A new attribite > >>>>>>>>>> > >>>>>>>>>> This is based on a suggestion from Richard S in [3]. The idea is > >>>>>>>>>> to introduce a new > >>>>>>>>>> attribute to define the PBV and make it general enough to > >>>>>>>>>> > >>>>>>>>>> * represent all targets flexibly (SVE, AVX etc) > >>>>>>>>>> * represent sub-byte length predicates > >>>>>>>>>> * have no change in units of vector_size/no new vector_size > >>>>>>>>>> signature > >>>>>>>>>> * not have the number of bytes constrain representation > >>>>>>>>>> > >>>>>>>>>> If we call the new attribute 'bool_vec' (for lack of a better > >>>>>>>>>> name), consider > >>>>>>>>>> > >>>>>>>>>> typedef bool vbool __attribute__((bool_vec (n[, w]))) > >>>>>>>>>> > >>>>>>>>>> where 'n' represents number of lanes/elements and the optional 'w' > >>>>>>>>>> is bits-per-lane. > >>>>>>>>>> > >>>>>>>>>> If 'w' is not specified, it and bytes-per-predicate are > >>>>>>>>>> implementation-defined based on target. > >>>>>>>>>> If 'w' is specified, sizeof (vbool) will be ceil (n*w/8). > >>>>>>>>>> > >>>>>>>>>> 5. Behaviour of the packed vector boolean type. > >>>>>>>>>> > >>>>>>>>>> Taking the example of one of the options above, following is an > >>>>>>>>>> illustration of it's behavior > >>>>>>>>>> > >>>>>>>>>> * ABI > >>>>>>>>>> > >>>>>>>>>> New ABI rules will need to be defined for this type - eg > >>>>>>>>>> alignment, PCS, > >>>>>>>>>> mangling etc > >>>>>>>>>> > >>>>>>>>>> * Initialization: > >>>>>>>>>> > >>>>>>>>>> Packed Boolean Vectors(PBV) can be initialized like so: > >>>>>>>>>> > >>>>>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 4))); > >>>>>>>>>> v4bi p = {false, true, false, false}; > >>>>>>>>>> > >>>>>>>>>> Each value in the initizlizer constant is of type bool. The > >>>>>>>>>> lowest numbered > >>>>>>>>>> element in the const array corresponds to the LSbit of p, > >>>>>>>>>> element 1 is > >>>>>>>>>> assigned to bit 4 etc. > >>>>>>>>>> > >>>>>>>>>> p is effectively a 2-byte bitmask with value 0x0010 > >>>>>>>>>> > >>>>>>>>>> With a different layout > >>>>>>>>>> > >>>>>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 1))); > >>>>>>>>>> v4bi p = {false, true, false, false}; > >>>>>>>>>> > >>>>>>>>>> p is effectively a 2-byte bitmask with value 0x0002 > >>>>>>>>>> > >>>>>>>>>> * Operations: > >>>>>>>>>> > >>>>>>>>>> Packed Boolean Vectors support the following operations: > >>>>>>>>>> . unary ~ > >>>>>>>>>> . unary ! > >>>>>>>>>> . binary&,|andˆ > >>>>>>>>>> . assignments &=, |= and ˆ= > >>>>>>>>>> . comparisons <, <=, ==, !=, >= and > > >>>>>>>>>> . Ternary operator ?: > >>>>>>>>>> > >>>>>>>>>> Operations are defined as applied to the individual elements > >>>>>>>>>> i.e the bits > >>>>>>>>>> that are significant in the PBV. Whether the PBVs are > >>>>>>>>>> treated as bitmasks > >>>>>>>>>> or otherwise is implementation-defined. > >>>>>>>>>> > >>>>>>>>>> Insignificant bits could affect results of comparisons or > >>>>>>>>>> ternary operators. > >>>>>>>>>> In such cases, it is implementation defined how the unused > >>>>>>>>>> bits are treated. > >>>>>>>>>> > >>>>>>>>>> . Subscript operator [] > >>>>>>>>>> > >>>>>>>>>> For the subscript operator, the packed boolean vector acts > >>>>>>>>>> like a array of > >>>>>>>>>> elements - the first or the 0th indexed element being the > >>>>>>>>>> LSbit of the PBV. > >>>>>>>>>> Subscript operator yields a scalar boolean value. > >>>>>>>>>> For example: > >>>>>>>>>> > >>>>>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8, 2))); > >>>>>>>>>> > >>>>>>>>>> // Subscript operator result yields a boolean value. > >>>>>>>>>> // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x. > >>>>>>>>>> bool foo (v8b p, int n) { p[3] = true; return p[1]; } > >>>>>>>>>> > >>>>>>>>>> Out of bounds access: OOB access can be determined at > >>>>>>>>>> compile time given the > >>>>>>>>>> strong typing of the PBVs. > >>>>>>>>>> > >>>>>>>>>> PBV does not support address of operator(&) for elements of > >>>>>>>>>> PBVs. > >>>>>>>>>> > >>>>>>>>>> . Implicit conversion from integer vectors to PBVs > >>>>>>>>>> > >>>>>>>>>> We would like to support the output of comparison operations > >>>>>>>>>> to be PBVs. This > >>>>>>>>>> requires us to define the implicit conversion from an > >>>>>>>>>> integer vector to PBV > >>>>>>>>>> as the result of vector comparisons are integer vectors. > >>>>>>>>>> > >>>>>>>>>> To define this operation: > >>>>>>>>>> > >>>>>>>>>> bool_vector = vector <cmpop> vector > >>>>>>>>>> > >>>>>>>>>> There is no change in how vector <cmpop> vector behavior > >>>>>>>>>> i.e. this comparison > >>>>>>>>>> would still produce an int_vector type as it does now. > >>>>>>>>>> > >>>>>>>>>> temp_int_vec = vector <cmpop> vector > >>>>>>>>>> bool_vec = temp_int_vec // Implicit conversion from > >>>>>>>>>> int_vec to bool_vec > >>>>>>>>>> > >>>>>>>>>> The implicit conversion from int_vec to bool I'd define > >>>>>>>>>> simply to be: > >>>>>>>>>> > >>>>>>>>>> bool_vec[n] = (_Bool) int_vec[n] > >>>>>>>>>> > >>>>>>>>>> where the C11 standard rules apply > >>>>>>>>>> 6.3.1.2 Boolean type When any scalar value is converted to > >>>>>>>>>> _Bool, the result > >>>>>>>>>> is 0 if the value compares equal to 0; otherwise, the result > >>>>>>>>>> is 1. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html > >>>>>>>>>> [2] https://reviews.llvm.org/D88905 > >>>>>>>>>> [3] https://reviews.llvm.org/D81083 > >>>>>>>>>> > >>>>>>>>>> Thoughts? > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Tejas. > >>>>>> > >>>> > >> >