Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

Richard Biener via Gcc-patches Mon, 03 Jul 2023 01:04:44 -0700

On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod <[email protected]> wrote:
>
> On 6/29/23 6:55 PM, Richard Biener wrote:
> > On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod <[email protected]> wrote:
> >>
> >>
> >>
> >>
> >>
> >> From: Richard Biener <[email protected]>
> >> Date: Tuesday, June 27, 2023 at 12:58 PM
> >> To: Tejas Belagod <[email protected]>
> >> Cc: [email protected] <[email protected]>
> >> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>
> >> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod <[email protected]> 
> >> wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: Richard Biener <[email protected]>
> >>> Date: Monday, June 26, 2023 at 2:23 PM
> >>> To: Tejas Belagod <[email protected]>
> >>> Cc: [email protected] <[email protected]>
> >>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>>
> >>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> >>> <[email protected]> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Packed Boolean Vectors
> >>>> ----------------------
> >>>>
> >>>> I'd like to propose a feature addition to GNU Vector extensions to add 
> >>>> packed
> >>>> boolean vectors (PBV).  This has been discussed in the past here[1] and 
> >>>> a variant has
> >>>> been implemented in Clang recently[2].
> >>>>
> >>>> With predication features being added to vector architectures (SVE, MVE, 
> >>>> AVX),
> >>>> it is a useful feature to have to model predication on targets.  This 
> >>>> could
> >>>> find its use in intrinsics or just used as is as a GNU vector extension 
> >>>> being
> >>>> mapped to underlying target features.  For example, the packed boolean 
> >>>> vector
> >>>> could directly map to a predicate register on SVE.
> >>>>
> >>>> Also, this new packed boolean type GNU extension can be used with SVE 
> >>>> ACLE
> >>>> intrinsics to replace a fixed-length svbool_t.
> >>>>
> >>>> Here are a few options to represent the packed boolean vector type.
> >>>
> >>> The GIMPLE frontend uses a new 'vector_mask' attribute:
> >>>
> >>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> >>> typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> it gets you a vector type that's the appropriate (dependent on the
> >>> target) vector
> >>> mask type for the vector data type (v8si in this case).
> >>>
> >>>
> >>>
> >>> Thanks Richard.
> >>>
> >>> Having had a quick look at the implementation, it does seem to tick the 
> >>> boxes.
> >>>
> >>> I must admit I haven't dug deep, but if the target hook allows the mask 
> >>> to be
> >>>
> >>> defined in way that is target-friendly (and I don't know how much effort 
> >>> it will
> >>>
> >>> be to migrate the attribute to more front-ends), it should do the job 
> >>> nicely.
> >>>
> >>> Let me go back and dig a bit deeper and get back with questions if any.
> >>
> >>
> >> Let me add that the advantage of this is the compiler doesn't need
> >> to support weird explicitly laid out packed boolean vectors that do
> >> not match what the target supports and the user doesn't need to know
> >> what the target supports (and thus have an #ifdef maze around explicitly
> >> specified layouts).
> >>
> >> Sorry for the delayed response – I spent a day experimenting with 
> >> vector_mask.
> >>
> >>
> >>
> >> Yeah, this is what option 4 in the RFC is trying to achieve – be portable 
> >> enough
> >>
> >> to avoid having to sprinkle the code with ifdefs.
> >>
> >>
> >> It does remove some flexibility though, for example with -mavx512f 
> >> -mavx512vl
> >> you'll get AVX512 style masks for V4SImode data vectors but of course the
> >> target still supports SSE2/AVX2 style masks as well, but those would not be
> >> available as "packed boolean vectors", though they are of course in fact
> >> equal to V4SImode data vectors with -1 or 0 values, so in this particular
> >> case it might not matter.
> >>
> >> That said, the vector_mask attribute will get you V4SImode vectors with
> >> signed boolean elements of 32 bits for V4SImode data vectors with
> >> SSE2/AVX2.
> >>
> >>
> >>
> >> This sounds very much like what the scenario would be with NEON vs SVE. 
> >> Coming to think
> >>
> >> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by 
> >> the ‘base’ vector type
> >>
> >> and a ‘w’ specified for the type.
> >>
> >>
> >>
> >> Given its current implementation, if vector_mask is exposed to the CFE, 
> >> would there be any
> >>
> >> major challenges wrt implementation or defining behaviour semantics? I 
> >> played around with a
> >>
> >> few examples from the testsuite and wrote some new ones. I mostly tried 
> >> operations that
> >>
> >> the new type would have to support (unary, binary bitwise, initializations 
> >> etc) – with a couple of exceptions
> >>
> >> most of the ops seem to be supported. I also triggered a couple of ICEs in 
> >> some tests involving
> >>
> >> implicit conversions to wider/narrower vector_mask types (will raise 
> >> reports for these). Correct me
> >>
> >> if I’m wrong here, but we’d probably have to support a couple of new ops 
> >> if vector_mask is exposed
> >>
> >> to the CFE – initialization and subscript operations?
> >
> > Yes, either that or restrict how the mask vectors can be used, thus
> > properly diagnose improper
> > uses.
>
> Indeed.
>
> > A question would be for example how to write common mask test
> > operations like
> > if (any (mask)) or if (all (mask)).
>
> I see 2 options here. New builtins could support new types - they'd
> provide a target independent way to test any and all conditions. Another
> would be to let the target use its intrinsics to do them in the most
> efficient way possible (which the builtins would get lowered down to
> anyway).
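
For illustration only, the target-independent any/all tests can be written
today as plain loops over the lanes of a GNU vector comparison result. The
names mask_any and mask_all below are hypothetical, not existing builtins; a
real builtin would presumably lower to whatever the target does best.

```c
#include <assert.h>
#include <stdbool.h>

/* Four int lanes.  With the existing GNU vector extension a comparison
   on this type yields -1 (true) or 0 (false) in each lane.  */
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Hypothetical helper: true if at least one lane of M is nonzero.  */
bool
mask_any (v4si m)
{
  for (int i = 0; i < 4; i++)
    if (m[i] != 0)
      return true;
  return false;
}

/* Hypothetical helper: true if every lane of M is nonzero.  */
bool
mask_all (v4si m)
{
  for (int i = 0; i < 4; i++)
    if (m[i] == 0)
      return false;
  return true;
}

/* Usage sketch: test the result of a lane-wise comparison.  */
void
mask_any_all_example (void)
{
  v4si a = { 1, 5, 3, 7 };
  v4si b = { 2, 4, 6, 8 };
  v4si lt = a < b;		/* lanes: -1, 0, -1, -1 */

  assert (mask_any (lt));
  assert (!mask_all (lt));
}
```

The loops only pin down the intended semantics; they say nothing about how a
mask type would be laid out in a predicate register.
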
>
>
> > Likewise writing merge operations
> > - do those as
> >
> >   a = a | (mask ? b : 0);
> >
> > thus use ternary ?: for this?
>
> Yes, like now, the ternary could just translate to
>
>    {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
>
> One thing to flesh out is the semantics. Should we allow this operation
> as long as the number of elements is the same even if the mask type is
> different, i.e.
>
>    v4hib ? v4si : v4si;
>
> I don't see why this can't be allowed as now we let
>
>    v4si ? v4sf : v4sf;
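
A minimal sketch of the two cases, assuming masks with the current -1/0 lane
encoding. With that encoding "mask ? b : 0" is the same as "mask & b", so the
merge needs no ternary at all (the GCC manual describes the vector ?: for
C++, so the sketch avoids it in C). The mixed-width select is written as a
lane loop purely to state the semantics; merge_or and select_v4hi are
hypothetical names.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
typedef short v4hi __attribute__ ((vector_size (4 * sizeof (short))));

/* a | (mask ? b : 0), assuming every lane of MASK is -1 or 0.  */
v4si
merge_or (v4si a, v4si b, v4si mask)
{
  return a | (mask & b);
}

/* Lane-wise select where the mask lanes (16 bits) are narrower than the
   data lanes (32 bits); only the lane count has to match.  */
v4si
select_v4hi (v4hi mask, v4si x, v4si y)
{
  v4si r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = mask[i] ? x[i] : y[i];
  return r;
}

/* Usage sketch.  */
void
merge_select_example (void)
{
  v4si a = { 1, 2, 4, 8 };
  v4si b = { 16, 16, 16, 16 };
  v4si m = { -1, 0, -1, 0 };
  v4si r = merge_or (a, b, m);	/* lanes: 17, 2, 20, 8 */

  assert (r[0] == 17 && r[1] == 2 && r[2] == 20 && r[3] == 8);
}
```
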
>
>
> > For initialization regular vector
> > syntax should work:
> >
> > mtype mask = (mtype){ -1, -1, 0, 0, ... };
> >
> > there's the question of the signedness of the mask elements.  GCC
> > internally uses signed
> > bools with values -1 for true and 0 for false.
>
> One of the things is the value that represents true. This is largely
> target-dependent when it comes to the vector_mask type. When vector_mask
> types are created from GCC's internal representation of bool vectors
> (signed ints) the point about implicit/explicit conversions from signed
> int vect to mask types in the proposal covers this. So mask in
>
>    v4sib mask = (v4sib){-1, -1, 0, 0, ... }
>
> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx
> on SVE. On AVX2/SSE they'd still be represented as vector of signed ints
> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this
> new mask type representations will have in the mid-end while being
> converted back and forth to and from GCC's internal representation, but
> I'm guessing this is already being handled at some level by the
> vector_mask type's current support?


Yes, I would guess so.  Of course what the middle-end is currently exposed
to is simply what the vectorizer generates - once fuzzers discover this feature
we'll see "interesting" uses that might run into missed or wrong handling of
them.

So whatever we do on the side of exposing this to users, a good portion
of testsuite coverage for the allowed use cases is important.

Richard.

>
> Thanks,
> Tejas.
>
> >
> > Richard.
> >
> >>
> >>
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Tejas.
> >>
> >>
> >>
> >>
> >>
> >> Richard.
> >>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Tejas.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> 1. __attribute__((vector_size (n))) where n represents bytes
> >>>>
> >>>>    typedef bool vbool __attribute__ ((vector_size (1)));
> >>>>
> >>>> In this approach, the shape of the boolean vector is unclear. IoW, it is 
> >>>> not
> >>>> clear if each bit in 'n' controls a byte or an element. On targets
> >>>> like SVE, it would be natural to have each bit control a byte of the 
> >>>> target
> >>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and on 
> >>>> AVX, each
> >>>> bit would control one element/lane on the target vector (therefore 
> >>>> resulting in a
> >>>> 'packed' layout with all significant bits at the LSB).
> >>>>
> >>>> 2. __attribute__((vector_size (n))) where n represents num of lanes
> >>>>
> >>>>    typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
> >>>>    typedef bool v4bi __attribute__ ((vector_size (sizeof (v4si) / sizeof 
> >>>> (v4si){0}[0])));
> >>>>
> >>>> Here the 'n' in the vector_size attribute represents the number of bits 
> >>>> that
> >>>> is needed to represent a vector quantity.  In this case, this packed 
> >>>> boolean
> >>>> vector can represent up to 'n' vector lanes. The size of the type is
> >>>> rounded up to the nearest byte.  For example, the sizeof v4bi in the above
> >>>> example is 1.
> >>>>
> >>>> In this approach, because of the nature of the representation, the n 
> >>>> bits required
> >>>> to represent the n lanes of the vector are packed at the LSB. This does 
> >>>> not naturally
> >>>> align with the SVE approach of each bit representing a byte of the 
> >>>> target vector
> >>>> and PBV therefore having an 'unpacked' layout.
> >>>>
> >>>> More importantly, another drawback here is that the change in units for 
> >>>> vector_size
> >>>> might be confusing to programmers.  The units will have to be 
> >>>> interpreted based on the
> >>>> base type of the typedef. It does not offer any flexibility in terms of 
> >>>> the layout of
> >>>> the bool vector - it is fixed.
> >>>>
> >>>> 3. Combination of 1 and 2.
> >>>>
> >>>> Combining the best of 1 and 2, we can introduce extra parameters to 
> >>>> vector_size that will
> >>>> unambiguously represent the layout of the PBV. Consider
> >>>>
> >>>>    typedef bool vbool __attribute__((vector_size (s, n[, w])));
> >>>>
> >>>> where 's' is size in bytes, 'n' is the number of lanes and an optional 
> >>>> 3rd parameter 'w'
> >>>> is the number of bits of the PBV that represents a lane of the target 
> >>>> vector. 'w' would
> >>>> allow a target to force a certain layout of the PBV.
> >>>>
> >>>> The 2-parameter form of vector_size allows the target to have an
> >>>> implementation-defined layout of the PBV. The target is free to choose 
> >>>> the 'w'
> >>>> if it is not specified to mirror the target layout of predicate 
> >>>> registers. For
> >>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
> >>>>
> >>>> As an example, to represent the result of a comparison on 2 int16x8_t, 
> >>>> we'd need
> >>>> 8 lanes of boolean which could be represented by
> >>>>
> >>>>    typedef bool v8b __attribute__ ((vector_size (2, 8)));
> >>>>
> >>>> SVE would implement v8b layout to make every 2nd bit significant i.e. w 
> >>>> == 2
> >>>>
> >>>> and AVX would choose a layout where all 8 consecutive bits packed at LSB 
> >>>> would
> >>>> be significant i.e. w == 1.
> >>>>
> >>>> This scheme would accommodate more than 1 target to effectively represent 
> >>>> vector
> >>>> bools that mirror the target properties.
> >>>>
> >>>> 4. A new attribute
> >>>>
> >>>> This is based on a suggestion from Richard S in [3]. The idea is to 
> >>>> introduce a new
> >>>> attribute to define the PBV and make it general enough to
> >>>>
> >>>> * represent all targets flexibly (SVE, AVX etc)
> >>>> * represent sub-byte length predicates
> >>>> * have no change in units of vector_size/no new vector_size signature
> >>>> * not have the number of bytes constrain representation
> >>>>
> >>>> If we call the new attribute 'bool_vec' (for lack of a better name), 
> >>>> consider
> >>>>
> >>>>    typedef bool vbool __attribute__((bool_vec (n[, w])));
> >>>>
> >>>> where 'n' represents number of lanes/elements and the optional 'w' is 
> >>>> bits-per-lane.
> >>>>
> >>>> If 'w' is not specified, it and bytes-per-predicate are 
> >>>> implementation-defined based on target.
> >>>> If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
> >>>>
> >>>> 5. Behaviour of the packed vector boolean type.
> >>>>
> >>>> Taking the example of one of the options above, following is an 
> >>>> illustration of its behavior
> >>>>
> >>>> * ABI
> >>>>
> >>>>    New ABI rules will need to be defined for this type - eg alignment, 
> >>>> PCS,
> >>>>    mangling etc
> >>>>
> >>>> * Initialization:
> >>>>
> >>>>    Packed Boolean Vectors(PBV) can be initialized like so:
> >>>>
> >>>>      typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
> >>>>      v4bi p = {false, true, false, false};
> >>>>
> >>>>    Each value in the initializer constant is of type bool. The lowest 
> >>>> numbered
> >>>>    element in the const array corresponds to the LSbit of p, element 1 is
> >>>>    assigned to bit 4 etc.
> >>>>
> >>>>    p is effectively a 2-byte bitmask with value 0x0010
> >>>>
> >>>>    With a different layout
> >>>>
> >>>>      typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
> >>>>      v4bi p = {false, true, false, false};
> >>>>
> >>>>    p is effectively a 2-byte bitmask with value 0x0002
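
The two layouts above can be reproduced with a small packing routine. This is
only a sketch of the arithmetic, assuming that lane i occupies the w-bit
field starting at bit i*w and that the lowest bit of each field is the
significant one; pbv_pack and pbv_lane are hypothetical names, not a proposed
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack N boolean lanes into a bitmask using W bits per lane.  Lane i
   sets bit i*W; the remaining bits of each field are left at zero.  */
uint16_t
pbv_pack (const bool *lanes, unsigned n, unsigned w)
{
  assert (w >= 1 && n * w <= 16);

  uint16_t bits = 0;
  for (unsigned i = 0; i < n; i++)
    if (lanes[i])
      bits = (uint16_t) (bits | (1u << (i * w)));
  return bits;
}

/* Read lane I back, i.e. what the subscript operator would yield.  */
bool
pbv_lane (uint16_t bits, unsigned i, unsigned w)
{
  return ((bits >> (i * w)) & 1u) != 0;
}

/* Usage sketch: the initializer {false, true, false, false}.  */
void
pbv_pack_example (void)
{
  const bool p[4] = { false, true, false, false };

  assert (pbv_pack (p, 4, 4) == 0x0010);	/* w == 4 */
  assert (pbv_pack (p, 4, 1) == 0x0002);	/* w == 1 */
}
```

Under the same assumptions, lanes {true, true, false, false} pack to 0x3 with
w == 1 and to 0x11 with w == 4, which matches the AVX512 and SVE values
mentioned earlier in the thread for a v4si mask.
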
> >>>>
> >>>> * Operations:
> >>>>
> >>>>    Packed Boolean Vectors support the following operations:
> >>>>    . unary ~
> >>>>    . unary !
> >>>>    . binary &, | and ^
> >>>>    . assignments &=, |= and ^=
> >>>>    . comparisons <, <=, ==, !=, >= and >
> >>>>    . Ternary operator ?:
> >>>>
> >>>>    Operations are defined as applied to the individual elements i.e the 
> >>>> bits
> >>>>    that are significant in the PBV. Whether the PBVs are treated as 
> >>>> bitmasks
> >>>>    or otherwise is implementation-defined.
> >>>>
> >>>>    Insignificant bits could affect results of comparisons or ternary 
> >>>> operators.
> >>>>    In such cases, it is implementation defined how the unused bits are 
> >>>> treated.
> >>>>
> >>>>    . Subscript operator []
> >>>>
> >>>>    For the subscript operator, the packed boolean vector acts like an 
> >>>> array of
> >>>>    elements - the first or the 0th indexed element being the LSbit of 
> >>>> the PBV.
> >>>>    Subscript operator yields a scalar boolean value.
> >>>>    For example:
> >>>>
> >>>>      typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
> >>>>
> >>>>      // Subscript operator result yields a boolean value.
> >>>>      // p[3] is the 7th LSbit and p[1] is the 3rd LSbit of p.
> >>>>      bool foo (v8b p, int n) { p[3] = true; return p[1]; }
> >>>>
> >>>>    Out of bounds access: OOB access can be determined at compile time 
> >>>> given the
> >>>>    strong typing of the PBVs.
> >>>>
> >>>>    PBV does not support address of operator(&) for elements of PBVs.
> >>>>
> >>>>    . Implicit conversion from integer vectors to PBVs
> >>>>
> >>>>    We would like to support the output of comparison operations to be 
> >>>> PBVs. This
> >>>>    requires us to define the implicit conversion from an integer vector 
> >>>> to PBV
> >>>>    as the results of vector comparisons are integer vectors.
> >>>>
> >>>>    To define this operation:
> >>>>
> >>>>      bool_vector = vector <cmpop> vector
> >>>>
> >>>>    There is no change in how vector <cmpop> vector behaves, i.e. this 
> >>>> comparison
> >>>>    would still produce an int_vector type as it does now.
> >>>>
> >>>>      temp_int_vec = vector <cmpop> vector
> >>>>      bool_vec = temp_int_vec // Implicit conversion from int_vec to 
> >>>> bool_vec
> >>>>
> >>>>    The implicit conversion from int_vec to bool I'd define simply to be:
> >>>>
> >>>>      bool_vec[n] = (_Bool) int_vec[n]
> >>>>
> >>>>    where the C11 standard rules apply
> >>>>    6.3.1.2 Boolean type  When any scalar value is converted to _Bool, 
> >>>> the result
> >>>>    is 0 if the value compares equal to 0; otherwise, the result is 1.
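
A sketch of that per-lane rule on today's vector types, assuming one bit per
lane packed at the LSB (the w == 1 layout). The function name int_vec_to_pbv
is hypothetical; a real implementation would use the target's own mask
representation.

```c
#include <assert.h>
#include <stdint.h>

typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Convert an integer vector to a packed bitmask.  Each lane goes through
   a (_Bool) conversion, so any nonzero lane value (not only -1) becomes
   a set bit.  */
uint8_t
int_vec_to_pbv (v4si v)
{
  uint8_t bits = 0;
  for (int i = 0; i < 4; i++)
    bits = (uint8_t) (bits | ((unsigned) (_Bool) v[i] << i));
  return bits;
}

/* Usage sketch: comparison result first, then the conversion.  */
void
int_vec_to_pbv_example (void)
{
  v4si a = { 1, 5, 3, 7 };
  v4si b = { 2, 4, 6, 8 };
  v4si temp_int_vec = a < b;	/* lanes: -1, 0, -1, -1 */

  assert (int_vec_to_pbv (temp_int_vec) == 0x0D);
}
```
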
> >>>>
> >>>>
> >>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
> >>>> [2] https://reviews.llvm.org/D88905
> >>>> [3] https://reviews.llvm.org/D81083
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks,
> >>>> Tejas.
>
