Hi,

Sorry to have dropped the ball on https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but here I've tried to pick it up again and write up a strawman proposal for elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
----------

The idea of packed boolean vectors came about when we wanted to support C/C++ operators on SVE ACLE types. The current vector boolean type that the ACLE specifies does not adequately disambiguate the vector lane sizes from which it was derived. Consider this simple, albeit unrealistic, example:

  bool foo (svint32_t a, svint32_t b)
  {
    svbool_t p = a > b;

    // Here p[2] is not the same as a[2] > b[2].
    return p[2];
  }

In the above example, because svbool_t has a fixed 1-lane-per-byte layout, p[i] does not return the bool value corresponding to a[i] > b[i]. This necessitates a 'typed' vector boolean value that unambiguously represents the result of operations on a given vector type.

__attribute__((vector_mask))
-----------------------------

Note: If interested in the historical discussion, refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define a new attribute which, when applied to a base data vector type, produces a new boolean vector type representing the result of operations on that base vector type. The syntax is as follows:

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si, i.e. a vector of 8 ints.

Rules
-----

• The layout/size of the boolean vector type is implementation-defined for its base data vector type.

• Two boolean vector types whose base data vector types have the same number of elements and lane-width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have a different number of elements or a different lane-width have different layouts.

This aligns with GNU vector extensions, which generate integer vectors as a result of comparisons - "The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
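For illustration, reusing the v8si/v8sib typedefs from above and adding hypothetical unsigned and short variants (assuming 32-bit int), the rules play out as follows:

  typedef unsigned int v8ui __attribute__((vector_size (8 * sizeof (unsigned int))));
  typedef short v8hi __attribute__((vector_size (8 * sizeof (short))));

  typedef v8ui v8uib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));

  /* v8sib and v8uib: 8 elements with 32-bit lanes in both base types,
     so they share the same layout and size.
     v8sib and v8hib: 8 elements, but 32-bit vs 16-bit lanes,
     so their layouts differ.  */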

Producers and Consumers of PBV
------------------------------

With GNU vector extensions, comparisons produce boolean vectors; conditional and bitwise operators consume them. Comparisons generate signed integer vectors of the same lane-width as the operands of the comparison operator. This means conditional and bitwise operators cannot be applied to a mix of vectors that result from operands of different widths. Eg.

  v8si foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    return a > b || c > d; // error!
    return a > b || e < f; // OK - no explicit conversion needed.
    return a > b || __builtin_convertvector (c > d, v8si); // OK.
    return a | b && c | d; // error!
    return a | b && __builtin_convertvector (c | d, v8si); // OK.
  }

__builtin_convertvector () needs to be applied to convert vectors to the type in which one wants to perform the operation. IoW, the integer vectors that represent boolean vectors are 'strictly-typed'. If we extend these rules to vector_mask, this looks like:

  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));
  typedef v8sf v8sfb __attribute__((vector_mask));

  v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
    v8sib psi = a > b;
    v8hib phi = c > d;
    v8sfb psf = e < f;

    return psi || phi; // error!
    return psi || psf; // OK - no explicit conversion needed.
    return psi || __builtin_convertvector (phi, v8sib); // OK.
    return psi | phi; // error!
    return psi | __builtin_convertvector (phi, v8sib); // OK.
    return psi | psf; // OK - no explicit conversion needed.
  }

Now, according to the rules explained above, v8sib and v8hib will have different layouts (which is why they can't be used directly as operands of an operation without conversion). OTOH, the same rules dictate that v8sib and v8sfb, where v8sfb is derived from the float base data vector with the same number of elements and lane-width, have the same layout and hence can be used as operands of operators without explicit conversion. This aligns with the GNU vector extension rules, where comparing two v8sf vectors results in a v8si with the same lane-width and number of elements as would result from comparing two v8si vectors.

Application of vector_mask to sizeless types
--------------------------------------------

__attribute__((vector_mask)) has the advantage that it can be applied to sizeless types seamlessly. When __attribute__((vector_mask)) is applied to a data vector that is a sizeless type, the resulting vector mask also becomes a sizeless type.
Eg.

  typedef svint16_t svpred16_t __attribute__((vector_mask));

This is equivalent to

  typedef vNhi vNhib __attribute__((vector_mask));

where N could be 8, 16, 32 etc.

The resulting type is a scalable boolean vector type, i.e. one that behaves like the scalable type svint8_t. While svint8_t can represent a scalable bool vector, we need a scalable scalar type to represent the bit-mask variant of the opaque type that represents the bool vector. I haven't thought this through, but I suspect it will be implemented as a 'typed' variant of svbool_t.
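As a sketch of how this resolves the motivating example, assuming the proposed semantics, a hypothetical typedef name svpred32_t, and that subscripting a vector_mask yields the per-lane bool:

  typedef svint32_t svpred32_t __attribute__((vector_mask));

  bool foo (svint32_t a, svint32_t b)
  {
    svpred32_t p = a > b;

    // Now p[2] corresponds to a[2] > b[2].
    return p[2];
  }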

ABI
---

The new opaque type needs rules that define the PCS, storage layout in aggregates, and alignment.

PCS
---

GNU vector extension type parameters are always passed on the stack. Similarly, vector_mask types derived from GNU base data vector types will also be passed on the stack as parameters. The on-stack format will always be a canonical format: an opaque type whose internal representation can be implementation-defined.

The canonical form of the argument could be a boolean vector. This boolean vector will be passed on the stack just like other GNU vectors. vector bool is convenient for a callee to synthesize into a predicate (irrespective of the target i.e. NEON, SVE, AVX) using target instructions.

If the base data vector is an ACLE type and the canonical bool vector we choose is svint8_t or a typed svbool_t, we could apply the same ABI rules as for that type.

Alignment
---------

For boolean vectors in memory, their alignment will be the natural alignment as defined by the AAPCS64, i.e. 8 or 16 bytes for short vectors and 16 bytes for scalable vectors.

Aggregates
----------

For fixed size vectors, the type resulting from applying
__attribute__((vector_mask)) is a vector of booleans, IoW a vNqi. Therefore the same rules apply as would apply to a GNU vector with 8-bit elements of the same size in an aggregate. Scalable GNU boolean vectors in aggregates act as the Pure Scalable Type svint8_t, and the ABI rules from Section 5.10 of AAPCS64 apply.
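A minimal sketch of what this means for a fixed-size mask in an aggregate, reusing the v8sib typedef from earlier (the struct is hypothetical):

  struct S
  {
    int tag;
    v8sib mask;   /* laid out like a vNqi, here an 8-byte vector of 8-bit booleans,
                     so the usual GNU vector rules for an 8-byte member apply */
  };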

Operation Semantics
-------------------

What should be the data structure of the vector mask type? This seems to be the main consideration. As suggested by Richard in https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, the idea is to have an opaque type so that we have control over operations and observability. This means that the internal representation can be a bit mask, but depending on the operator being applied, the mask can 'decay' to another operator-friendly data structure.

vector_mask has 2 forms, chosen based on the context: it can live as a bitmask or as a bool vector. Here we describe its behaviour in various contexts.

Arithmetic ops
--------------

These don't apply as the values are essentially binary.

Bitwise ops -  &, ^, |, ~, >>, <<
---------------------------------

Here vector_mask acts as a scalar bitmask. Applying bitwise ops to it is like any other scalar operation.

If p1 and p2 are objects of the vector_mask type:

        typedef v8si v8sib __attribute__((vector_mask));

Bitwise &, | and ^
------------------

  p1 & p2

Here p1 and p2 act as integer-type bitmasks where each bit represents a vector lane of the data vector type, with the LSBit representing the lowest numbered lane and the MSBit representing the highest numbered lane.

  p1 & <scalar immediate>

Here the immediate scalar is implicitly cast to a vector_mask type and the binary op is applied accordingly.
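A minimal sketch, assuming the v8sib typedef above (the function name is only illustrative):

  v8sib keep_low_half (v8sib p1)
  {
    /* 0x0f is implicitly converted to v8sib; the AND keeps lanes 0..3
       (LSBit = lane 0) active and clears lanes 4..7.  */
    return p1 & 0x0f;
  }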

Bitwise ~:

  ~p1

Treats p1 as a bitmask and inverts all the bits of the bitmask.
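For example:

  v8sib invert (v8sib p1)
  {
    return ~p1;   /* every lane's bit is flipped, so active lanes become inactive and vice versa */
  }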

Bitwise >>, << :

  p1 >> <scalar immediate>
  p1 >> <scalar int32 variable>

Treats p1 as a bitmask. The shift operand has to be a signed int32, either an immediate or a variable. If the shift amount is negative, the direction of the shift is inverted. Behaviour for any value outside the range 0..nelems-1 is undefined.

  p1 >> p2 or p1 << p2

is not allowed.
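A sketch of the allowed shift forms, again assuming v8sib (8 lanes, so valid shift amounts are 0..7):

  v8sib shifts (v8sib p1, int n)
  {
    v8sib a = p1 >> 2;   /* bitmask shifted right by an immediate */
    v8sib b = p1 << n;   /* variable shift amount; behaviour undefined if n is outside 0..7 */
    return a | b;
  }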

Logical ops - ==, !=, >, <, >=, <=
----------------------------------

The following ops treat vector_mask as bitmask:
  p1 == p2
  p1 != p2
  p1 == <scalar immediate>
  p1 != <scalar immediate>

The result of these operations is a bool. Note that the scalar immediates will be implicitly converted to the type of p1 (the LHS). Eg. if p1 is v8sib,

  p1 == 0x3

will mean that 0x3 represents a mask in which the 2 lowest-numbered lanes of v8sib are true and the rest are false.

>, <, >=, <= do not apply to the vector_mask.
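For example, assuming v8sib as above:

  bool any_lane_active (v8sib p1)
  {
    return p1 != 0;   /* p1 is compared as a bitmask against the all-false mask; the result is a scalar bool */
  }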

Ternary operator ?:
-------------------

  p1 <logicalop> p2 ? s1 : s2;

is allowed and p1 and p2 are treated as bitmasks.
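For example:

  int pick (v8sib p1, v8sib p2, int s1, int s2)
  {
    return p1 == p2 ? s1 : s2;   /* p1 and p2 are compared as bitmasks; the scalar bool selects s1 or s2 */
  }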

Conditional operators ||, &&, !
------------------------------

Here vector_mask is used as a bitmask scalar. So

  p1 != 0 || p2 == 0

treats p1 and p2 as scalar bitmasks. Similarly for && and !.
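For example:

  bool neither_empty (v8sib p1, v8sib p2)
  {
    return p1 != 0 && p2 != 0;   /* both operands act as scalar bitmasks; the result is a scalar bool */
  }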

Assignment ops =, <<=, >>=
--------------------------

The assignment operator is straightforward - it copies the RHS into the LHS. Eg.

  p1 = p2

Copies the value of p2 into p1. If the types are different, there is no implicit conversion from one to the other (except in the cases mentioned below); one has to convert explicitly using __builtin_convertvector (). So if p1 and p2 have different types and one wants to copy p2 into p1, one has to write

  p1 = __builtin_convertvector (p2, typeof (p1));

__builtin_convertvector is implementation-defined. It is essential to note that p1 and p2 must have the same number of lanes irrespective of the lane-size. Also, explicit conversion is not required if p1 and p2 have the same lane-size along with the same number of elements. So, for eg., if p1 is v8sib and p2 is v8sfb, no explicit conversion is required. The same holds for v8sib and v8uib.

<<= and >>= behave like the corresponding shift operations described above, followed by assignment.
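A sketch of these conversion rules, assuming the v8sib/v8hib/v8sfb typedefs from earlier:

  void bar (v8sib psi, v8hib phi, v8sfb psf)
  {
    psi = psf;                                    /* OK: same nelems and lane-size, no conversion needed */
    psi = __builtin_convertvector (phi, v8sib);   /* explicit conversion: same nelems, different lane-size */
  }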

Increment Ops ++, --
---------------------

NA

Address-of &
------------

Taking the address of a vector_mask returns (vector bool *).

sizeof ()
--------

sizeof (vector_mask) = sizeof (vector bool)

alignof ()
----------

See Alignment section above

Typecast and implicit conversions
---------------------------------

A typecast from one vector_mask type to another is only possible using __builtin_convertvector () and, as explained above, is only needed if the lane-sizes are different. It is not possible to convert between vectors with different nelems either way.

Implicit conversions between two same-nelem vector_masks are possible only if the lane-sizes are the same.

Literals and Initialization
---------------------------

There are two ways to initialize vector_mask objects - bitmask form and constant array form. Eg.

  typedef v4si v4sib __attribute__((vector_mask));

  void foo ()
  {
    v4sib p1 = 0xf;

    /* Do something. */

    p1 = {1, 1, 1, 0};

    ...
  }

The behaviour is undefined if values other than 1 or 0 are used in the constant array initializer.

C++
---

static_cast<target_type> (<source_expression>)

LLVM allows static_cast<> where both vector sizes are the same, but the semantics are those of reinterpret_cast<>. GCC does not allow static_cast<> on vectors, irrespective of source and target shapes.

To be consistent, leave it unsupported for vector_mask too.

dynamic_cast <target_type> (<source_expr>)
NA

reinterpret_cast<>
Semantics are the same as Clang's static_cast<>, i.e. reinterpret the types if both the source and target vectors are the same size.

const_cast<>

Applies constness to a vector mask type pointer.

  #include <inttypes.h>

  typedef int32_t v16si __attribute__((__vector_size__(64)));
  typedef v16si v16sib __attribute__((vector_mask));

  __attribute__((noinline))
  const v16sib * foo (v16sib * a)
  {
    return const_cast<v16sib *> (a);
  }

new & delete

For new, vector_mask types will return a pointer to the vector_mask type and allocate sizeof (vector bool), i.e. the size of the corresponding vector bool array in bytes. For eg.

  typedef v16si v16sib __attribute__((vector_mask));

  v16sib * foo()
  {
    return new v16sib;
  }

The new expression in foo allocates sizeof (vector bool (16)), i.e. 16 bytes.
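Correspondingly, a minimal sketch of new paired with delete:

  void bar ()
  {
    v16sib * p = new v16sib;   /* allocates sizeof (vector bool (16)), i.e. 16 bytes */
    /* ... */
    delete p;                  /* releases the same storage */
  }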

__attribute__((vector_mask)) 's conflation with GIMPLE
------------------------------------------------------

__attribute__((vector_mask)) is a feature being elevated from GIMPLE to the FE. In GIMPLE, the semantics are loosely-typed and target-dependent, i.e. differently-shaped vector mask types are allowed to work with binary ops depending on which target we're compiling for. Eg.

  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));

  __GIMPLE v8sib foo (v8si a, v8si b, v8hi c, v8hi d)
  {
    v8sib psi = a > b;
    v8hib phi = c > d;

    return psi | phi; // OK on amdgcn, but errors on aarch64!
  }

This dichotomy is acceptable as long as GIMPLE semantics don't change: because the FE semantics are proposed to be more restrictive, they become a subset of the GIMPLE semantics. This is the current starting point, but going forward, if there are scenarios where we have to diverge from GIMPLE semantics, we will have to discuss them on a case-by-case basis.

