Hi,

Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
----------

The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that the ACLE specifies does not adequately disambiguate the vector lane sizes from which it was derived. Consider this simple, albeit unrealistic, example:

  bool foo (svint32_t a, svint32_t b)
  {
    svbool_t p = a > b;

    // Here p[2] is not the same as a[2] > b[2].
    return p[2];
  }

In the above example, because svbool_t has a fixed 1-lane-per-byte layout, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents the result of operations on a given vector type.

__attribute__((vector_mask))
-----------------------------

Note: If interested in the historical discussion, refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define a new attribute which, when applied to a base data vector type, produces a new boolean vector type representing the result of operations on that base vector type. The syntax is as follows:

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si, i.e. a vector of 8 ints.

Rules
-----

• The layout/size of the boolean vector type is implementation-defined for its base data vector type.

• Two boolean vector types whose base data vector types have the same number of elements and lane-width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have a different number of elements or a different lane-width have different layouts.

This aligns with GNU vector extensions, which generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
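For illustration, reusing the v8si/v8sib typedefs from above and adding hypothetical unsigned and short variants (assuming 32-bit int), the rules play out as follows:

  typedef unsigned int v8ui __attribute__((vector_size (8 * sizeof (unsigned int))));
  typedef short v8hi __attribute__((vector_size (8 * sizeof (short))));

  typedef v8ui v8uib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));

  /* v8sib and v8uib: 8 elements with 32-bit lanes in both base types,
     so they share the same layout and size.
     v8sib and v8hib: 8 elements, but 32-bit vs 16-bit lanes,
     so their layouts differ.  */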

Producers and Consumers of PBV
------------------------------

With GNU vector extensions, comparisons produce boolean vectors; conditional and bitwise operators consume them. Comparisons generate signed integer vectors of the same lane-width as the operands of the comparison operator. This means conditional and bitwise operators cannot be applied to a mix of vectors that result from operands of different widths. Eg.

  v8si foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    return a > b || c > d; // error!
    return a > b || e < f; // OK - no explicit conversion needed.
    return a > b || __builtin_convertvector (c > d, v8si); // OK.
    return a | b && c | d; // error!
    return a | b && __builtin_convertvector (c | d, v8si); // OK.
  }

__builtin_convertvector () needs to be applied to convert vectors to the type in which one wants to perform the operation. IoW, the integer vectors that represent boolean vectors are 'strictly-typed'. If we extend these rules to vector_mask, this looks like:

  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));
  typedef v8sf v8sfb __attribute__((vector_mask));

  v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    v8sib psi = a > b;
    v8hib phi = c > d;
    v8sfb psf = e < f;

    return psi || phi; // error!
    return psi || psf; // OK - no explicit conversion needed.
    return psi || __builtin_convertvector (phi, v8sib); // OK.
    return psi | phi; // error!
    return psi | __builtin_convertvector (phi, v8sib); // OK.
    return psi | psf; // OK - no explicit conversion needed.
  }

Now, according to the rules explained above, v8sib and v8hib will have different layouts (which is why they can't be used directly as operands of an operation without conversion). OTOH, the same rules dictate that v8sib and v8sfb, where v8sfb is derived from the float base data vector with the same number of elements and lane-width, have the same layout and hence can be used as operands of operators without explicit conversion. This aligns with the GNU vector extension rules, where comparing two v8sf vectors results in a v8si with the same lane-width and number of elements as would result from comparing two v8si vectors.

Application of vector_mask to sizeless types
--------------------------------------------

__attribute__((vector_mask)) has the advantage that it can be applied to sizeless types seamlessly. When __attribute__((vector_mask)) is applied to a data vector that is a sizeless type, the resulting vector mask also becomes a sizeless type.
Eg.

  typedef svint16_t svpred16_t __attribute__((vector_mask));

This is equivalent to

  typedef vNhi vNhib __attribute__((vector_mask));

where N could be 8, 16, 32 etc.

The resulting type is a scalable boolean vector type, i.e. one that behaves like the scalable type svint8_t. While svint8_t can represent a scalable bool vector, we need a scalable scalar type to represent the bit-mask variant of the opaque type that represents the bool vector. I haven't thought this through, but I suspect it will be implemented as a 'typed' variant of svbool_t.
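As a sketch of how this resolves the motivating example, assuming the proposed semantics, a hypothetical typedef name svpred32_t, and that subscripting a vector_mask yields the per-lane bool:

  typedef svint32_t svpred32_t __attribute__((vector_mask));

  bool foo (svint32_t a, svint32_t b)
  {
    svpred32_t p = a > b;

    // Now p[2] corresponds to a[2] > b[2].
    return p[2];
  }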

ABI
---

The new opaque type needs rules that define the PCS, storage layout in aggregates, and alignment.

PCS
---

GNU vector extension type parameters are always passed on the stack. Similarly, vector_mask types derived from GNU base data vector types will also be passed on the stack as parameters. The on-stack format will always be a canonical format: an opaque type whose internal representation can be implementation-defined.

The canonical form of the argument could be a boolean vector. This boolean vector will be passed on the stack just like other GNU vectors. vector bool is convenient for a callee to synthesize into a predicate (irrespective of the target i.e. NEON, SVE, AVX) using target instructions.

If the base data vector is an ACLE type and the canonical bool vector we choose is svint8_t or a typed svbool_t, we could apply the same ABI rules as for that type.

Alignment
---------

For boolean vectors in memory, their alignment will be the natural alignment as defined by the AAPCS64, i.e. 8 or 16 bytes for short vectors and 16 bytes for scalable vectors.

Aggregates
----------

For fixed size vectors, the type resulting from applying
__attribute__((vector_mask)) is a vector of booleans, IoW a vNqi. Therefore the same rules apply as would apply to a GNU vector with 8-bit elements of the same size in an aggregate. Scalable GNU boolean vectors in aggregates act as the Pure Scalable Type svint8_t, and the ABI rules from Section 5.10 of AAPCS64 apply.
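A minimal sketch of what this means for a fixed-size mask in an aggregate, reusing the v8sib typedef from earlier (the struct is hypothetical):

  struct S
  {
    int tag;
    v8sib mask;   /* laid out like a vNqi, here an 8-byte vector of 8-bit booleans,
                     so the usual GNU vector rules for an 8-byte member apply */
  };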

Operation Semantics
-------------------

What should be the data structure of the vector mask type? This seems to be the main consideration. As suggested by Richard in https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, the idea is to have an opaque type so that we have control over operations and observability. This means that the internal representation can be a bit mask, but depending on the operator being applied, the mask can 'decay' to another operator-friendly data structure.

vector_mask has 2 forms, chosen based on the context: it can live as a bitmask or as a bool vector. Here we describe its behaviour in various contexts.

Arithmetic ops
--------------

These don't apply as the values are essentially binary.

Bitwise ops -  &, ^, |, ~, >>, <<
---------------------------------

Here vector_mask acts as a scalar bitmask. Applying bitwise ops to it is like any other scalar operation.

If p1 and p2 are objects of the vector_mask type:

        typedef v8si v8sib __attribute__((vector_mask));

Bitwise &, | and ^
------------------

  p1 & p2

Here p1 and p2 act as integer-type bitmasks where each bit represents a vector lane of the data vector type, with the LSBit representing the lowest numbered lane and the MSBit representing the highest numbered lane.

  p1 & <scalar immediate>

Here the immediate scalar is implicitly cast to a vector_mask type and the binary op is applied accordingly.
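A minimal sketch, assuming the v8sib typedef above (the function name is only illustrative):

  v8sib keep_low_half (v8sib p1)
  {
    /* 0x0f is implicitly converted to v8sib; the AND keeps lanes 0..3
       (LSBit = lane 0) active and clears lanes 4..7.  */
    return p1 & 0x0f;
  }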

Bitwise ~:

  ~p1

Treats p1 as a bitmask and inverts all the bits of the bitmask.
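For example:

  v8sib invert (v8sib p1)
  {
    return ~p1;   /* every lane's bit is flipped, so active lanes become inactive and vice versa */
  }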

Bitwise >>, << :

  p1 >> <scalar immediate>
  p1 >> <scalar int32 variable>

Treats p1 as a bitmask. The shift operand has to be a signed int32, either an immediate or a variable. If the shift amount is negative, the direction of the shift is inverted. Behaviour for any value outside the range 0..nelems-1 is undefined.

  p1 >> p2 or p1 << p2

is not allowed.
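A sketch of the allowed shift forms, again assuming v8sib (8 lanes, so valid shift amounts are 0..7):

  v8sib shifts (v8sib p1, int n)
  {
    v8sib a = p1 >> 2;   /* bitmask shifted right by an immediate */
    v8sib b = p1 << n;   /* variable shift amount; behaviour undefined if n is outside 0..7 */
    return a | b;
  }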

Logical ops - ==, !=, >, <, >=, <=
----------------------------------

The following ops treat vector_mask as bitmask:
  p1 == p2
  p1 != p2
  p1 == <scalar immediate>
  p1 != <scalar immediate>

The result of these operations is a bool. Note that the scalar immediates will be implicitly converted to the type of p1 (the LHS). Eg. if p1 is v8sib,

  p1 == 0x3

will mean that 0x3 represents a mask in which the 2 lowest-numbered lanes of v8sib are true and the rest are false.

>, <, >=, <= do not apply to the vector_mask.
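For example, assuming v8sib as above:

  bool any_lane_active (v8sib p1)
  {
    return p1 != 0;   /* p1 is compared as a bitmask against the all-false mask; the result is a scalar bool */
  }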

Ternary operator ?:
-------------------

  p1 <logicalop> p2 ? s1 : s2;

is allowed and p1 and p2 are treated as bitmasks.
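For example:

  int pick (v8sib p1, v8sib p2, int s1, int s2)
  {
    return p1 == p2 ? s1 : s2;   /* p1 and p2 are compared as bitmasks; the scalar bool selects s1 or s2 */
  }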

Conditional operators ||, &&, !
------------------------------

Here vector_mask is used as a bitmask scalar. So

  p1 != 0 || p2 == 0

treats p1 and p2 as scalar bitmasks. Similarly for && and !.
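For example:

  bool neither_empty (v8sib p1, v8sib p2)
  {
    return p1 != 0 && p2 != 0;   /* both operands act as scalar bitmasks; the result is a scalar bool */
  }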

Assignment ops =, <<=, >>=
--------------------------

The assignment operator is straightforward - it copies the RHS into the LHS. Eg.

  p1 = p2

Copies the value of p2 into p1. If the types are different, there is no implicit conversion from one to the other (except in the cases mentioned below); one has to convert explicitly using __builtin_convertvector (). So if p1 and p2 have different types and one wants to copy p2 into p1, one has to write

  p1 = __builtin_convertvector (p2, typeof (p1));

__builtin_convertvector is implementation-defined. It is essential to note that p1 and p2 must have the same number of lanes irrespective of the lane-size. Also, explicit conversion is not required if p1 and p2 have the same lane-size along with the same number of elements. So, for eg., if p1 is v8sib and p2 is v8sfb, no explicit conversion is required. The same holds for v8sib and v8uib.

<<= and >>= behave like the corresponding shift operations described above, followed by assignment.
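A sketch of these conversion rules, assuming the v8sib/v8hib/v8sfb typedefs from earlier:

  void bar (v8sib psi, v8hib phi, v8sfb psf)
  {
    psi = psf;                                    /* OK: same nelems and lane-size, no conversion needed */
    psi = __builtin_convertvector (phi, v8sib);   /* explicit conversion: same nelems, different lane-size */
  }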

Increment Ops ++, --
---------------------

NA

Address-of &
------------

Taking the address of a vector_mask returns (vector bool *).

sizeof ()
--------

sizeof (vector_mask) = sizeof (vector bool)

alignof ()
----------

See Alignment section above

Typecast and implicit conversions
---------------------------------

A typecast from one vector_mask type to another is only possible using __builtin_convertvector () and, as explained above, is only needed if the lane-sizes are different. It is not possible to convert between vectors with different nelems either way.

Implicit conversions between two same-nelem vector_masks are possible only if the lane-sizes are the same.

Literals and Initialization
---------------------------

There are two ways to initialize vector_mask objects - bitmask form and constant array form. Eg.

  typedef v4si v4sib __attribute__((vector_mask));

  void foo ()
  {
    v4sib p1 = 0xf;

    /* Do something. */

    p1 = {1, 1, 1, 0};

    ...
  }

The behaviour is undefined if values other than 1 or 0 are used in the constant array initializer.

C++
---

static_cast<target_type> (<source_expression>)

LLVM allows static_cast<> where both vector sizes are the same, but the semantics are those of reinterpret_cast<>. GCC does not allow static_cast<> on vectors, irrespective of source and target shapes.

To be consistent, leave it unsupported for vector_mask too.

dynamic_cast <target_type> (<source_expr>)
NA

reinterpret_cast<>
Semantics are the same as Clang's static_cast<>, i.e. reinterpret the types if both the source and target vectors are the same size.

const_cast<>

Applies constness to a vector mask type pointer.

  #include <inttypes.h>

  typedef int32_t v16si __attribute__((__vector_size__(64)));
  typedef v16si v16sib __attribute__((vector_mask));

  __attribute__((noinline))
  const v16sib * foo (v16sib * a)
  {
    return const_cast<v16sib *> (a);
  }

new & delete

For new, vector_mask types will return a pointer to the vector_mask type and allocate sizeof (vector bool), i.e. the size of the corresponding vector bool array in bytes. For eg.

  typedef v16si v16sib __attribute__((vector_mask));

  v16sib * foo()
  {
    return new v16sib;
  }

The new expression in foo allocates sizeof (vector bool (16)), i.e. 16 bytes.
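Correspondingly, a minimal sketch of new paired with delete:

  void bar ()
  {
    v16sib * p = new v16sib;   /* allocates sizeof (vector bool (16)), i.e. 16 bytes */
    /* ... */
    delete p;                  /* releases the same storage */
  }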

__attribute__((vector_mask)) 's conflation with GIMPLE
------------------------------------------------------

__attribute__((vector_mask)) is a feature being elevated from GIMPLE to the FE. In GIMPLE, the semantics are loosely-typed and target-dependent, i.e. differently-shaped vector mask types are allowed to work with binary ops depending on which target we're compiling for. Eg.

  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));

  __GIMPLE v8sib foo (v8si a, v8si b, v8hi c, v8hi d)
  {
    v8sib psi = a > b;
    v8hib phi = c > d;

    return psi | phi; // OK on amdgcn, but errors on aarch64!
  }

This dichotomy is acceptable as long as GIMPLE semantics don't change: because the FE semantics are proposed to be more restrictive, they become a subset of the GIMPLE semantics. This is the current starting point, but going forward, if there are scenarios where we have to diverge from GIMPLE semantics, we will have to discuss them on a case-by-case basis.

