Producers and Consumers of PBV
------------------------------
With GNU vector extensions, comparisons produce boolean vectors;
conditional and bitwise operators consume them. Comparison producers
generate signed integer vectors of the same lane-width as the operands
of the comparison operator. This means conditional and bitwise
operators cannot be applied to mixed vectors that result from
operands of different widths. Eg.
v8si foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
{
  return a > b || c > d; // error!
  return a > b || e < f; // OK - no explicit conversion needed.
  return a > b || __builtin_convertvector (c > d, v8si); // OK.
  return a | b && c | d; // error!
  return a | b && __builtin_convertvector (c | d, v8si); // OK.
}
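The mixed-width rule can be exercised today with plain GNU vector types
(no vector_mask involved). This sketch, with illustrative names, shows
the widening conversion that makes a v8hi comparison result combinable
with a v8si one:

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v8si __attribute__((vector_size(32)));
typedef int16_t v8hi __attribute__((vector_size(16)));

/* c > d yields a v8hi-shaped mask; widening it with
   __builtin_convertvector makes it combinable with the
   v8si-shaped mask produced by a > b.  */
v8si combine (v8si a, v8si b, v8hi c, v8hi d)
{
  return (a > b) | __builtin_convertvector (c > d, v8si);
}

int demo (void)
{
  v8si a = {1, 0, 0, 0, 0, 0, 0, 0}, b = {0, 0, 0, 0, 0, 0, 0, 0};
  v8hi c = {0, 5, 0, 0, 0, 0, 0, 0}, d = {0, 0, 0, 0, 0, 0, 0, 0};
  v8si r = combine (a, b, c, d);
  /* Lane 0 true from a > b, lane 1 true from c > d, lane 2 false.  */
  return (r[0] == -1) + (r[1] == -1) + (r[2] == 0);
}
```

Dropping the __builtin_convertvector call makes the operand types of |
mismatch and the expression fails to compile, matching the // error!
lines above.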
__builtin_convertvector () needs to be applied to convert vectors to the
type one wants to do the comparison in. IoW, the integer vectors that
represent boolean vectors are 'strictly-typed'. If we extend these rules
to vector_mask, this will look like:
typedef v8si v8sib __attribute__((vector_mask));
typedef v8hi v8hib __attribute__((vector_mask));
typedef v8sf v8sfb __attribute__((vector_mask));
v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
{
  v8sib psi = a > b;
  v8hib phi = c > d;
  v8sfb psf = e < f;
  return psi || phi; // error!
  return psi || psf; // OK - no explicit conversion needed.
  return psi || __builtin_convertvector (phi, v8sib); // OK.
  return psi | phi; // error!
  return psi | __builtin_convertvector (phi, v8sib); // OK.
  return psi | psf; // OK - no explicit conversion needed.
}
Now, according to the rules explained above, v8sib and v8hib have
different layouts (which is why they cannot be used directly, without
conversion, as operands of the same operation). OTOH, the same rules
dictate that v8sib and v8sfb, where v8sfb is the equivalent of v8sib
with a float base data vector, have the same layout and hence can be
used as operands of an operator without explicit conversion. This
aligns with the GNU vector extension rules, where comparison of two
v8sf vectors results in a v8si with the same lane-width and number of
elements as would result from comparison of two v8si vectors.
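A compilable illustration of that last point, using today's GNU vector
extensions (illustrative names; vector_mask itself is not involved): a
float comparison already produces an integer mask with the same layout
as an integer comparison of the same shape.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));
typedef float   v4sf __attribute__((vector_size(16)));

/* e < f on v4sf yields a mask laid out exactly like the v4si mask
   from a > b, so the two combine without any conversion.  */
v4si either (v4si a, v4si b, v4sf e, v4sf f)
{
  return (a > b) | (e < f);
}

int demo (void)
{
  v4si a = {1, 0, 0, 0}, b = {0, 0, 0, 0};
  v4sf e = {5, 0, 5, 5}, f = {0, 1, 0, 0};
  v4si r = either (a, b, e, f);
  /* Lane 0 true from a > b, lane 1 true from e < f, rest false.  */
  return (r[0] == -1) + (r[1] == -1) + (r[2] == 0) + (r[3] == 0);
}
```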
Application of vector_mask to sizeless types
--------------------------------------------
__attribute__((vector_mask)) has the advantage that it can be applied to
sizeless types seamlessly. When __attribute__((vector_mask)) is applied
to a data vector that is a sizeless type, the resulting vector mask also
becomes a sizeless type.
Eg.
typedef svint16_t svpred16_t __attribute__((vector_mask));
This is equivalent to
typedef vNhi vNhib __attribute__((vector_mask));
where N could be 8, 16, 32 etc.
The resulting type is a scalable boolean vector type, i.e. svint8_t.
The resulting boolean vector type behaves the same as the scalable
type svint8_t. While svint8_t can represent a scalable bool vector, we
need a scalable scalar type to represent the bit-mask variant of the
opaque type that represents the bool vector. I haven't thought this
through, but I suspect it will be implemented as a 'typed' variant of
svbool_t.
ABI
---
Given the new opaque type, we need rules that define its PCS, storage
layout in aggregates, and alignment.
PCS
---
GNU vector extension type parameters are always passed on the stack.
Similarly, parameters whose types result from applying vector_mask to
GNU base data vector types will also be passed on the stack. The
format used on the stack will always be a canonical one - an opaque
type whose internal representation can be implementation-defined.
The canonical form of the argument could be a boolean vector. This
boolean vector will be passed on the stack just like other GNU vectors.
vector bool is convenient for a callee to synthesize into a predicate
(irrespective of the target i.e. NEON, SVE, AVX) using target instructions.
If the base data vector is an ACLE type, and the canonical bool vector
we choose is svint8_t or a typed svbool_t, we could apply the same ABI
rules as for the said type.
Alignment
---------
For boolean vectors in memory, the alignment will be the natural
alignment as defined by the AAPCS64, i.e. 8 and 16 bytes for Short
Vectors and 16 bytes for scalable vectors.
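As a quick check of the natural-alignment rule for fixed-size vectors,
plain GNU data vectors can stand in, since vector_mask is not
implemented; the alignments asserted below are the AAPCS64 ones and
also hold on most other 64-bit targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t v8qi  __attribute__((vector_size(8)));   /* 64-bit Short Vector  */
typedef int8_t v16qi __attribute__((vector_size(16)));  /* 128-bit Short Vector */

/* Natural alignment of a GNU short vector equals its size.  */
size_t short_vector_align_8 (void)  { return _Alignof (v8qi);  }
size_t short_vector_align_16 (void) { return _Alignof (v16qi); }
```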
Aggregates
----------
For fixed-size vectors, the type resulting from applying
__attribute__((vector_mask)) is a vector of booleans, IoW a vNqi.
Therefore the same rules apply in an aggregate as would apply to a GNU
vector with 8-bit elements of the same size. For scalable GNU boolean
vectors in aggregates, the type acts as the Pure Scalable Type
svint8_t and the ABI rules from Section 5.10 of AAPCS64 apply.
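A sketch of the fixed-size aggregate rule, using a plain v8qi data
vector as a stand-in for the vNqi boolean vector (vector_mask is not
implemented, so the names and layout here illustrate the rule, not the
feature itself):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t v8qi __attribute__((vector_size(8)));

/* The mask member is laid out like any 8-byte GNU vector:
   8-byte aligned, so it lands at offset 8 after the char.  */
struct masked { char tag; v8qi mask; };

size_t mask_offset (void) { return offsetof (struct masked, mask); }
size_t masked_size (void) { return sizeof (struct masked); }
```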
Operation Semantics
-------------------
What should be the data structure of the vector mask type? This seems to
be the main consideration. As suggested by Richard in
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html the idea
is to use an opaque type so that we retain control over operations and
observability. This means that the internal representation can be a
bit mask but, based on the operator being applied to it, the mask can
'decay' to another operator-friendly data structure.
vector_mask has 2 forms, chosen based on the context: it lives as a
mask and as a vector bool. Here we describe its behaviour in various
contexts.
Arithmetic ops
--------------
These don't apply as the values are essentially binary.
Bitwise ops - &, ^, |, ~, >>, <<
---------------------------------
Here vector_mask acts as a scalar bitmask. Applying bitwise ops is like
another scalar operation.
If p1 and p2 are objects of the vector_mask type:
typedef v8si v8sib __attribute__((vector_mask));
Bitwise &, | and ^
------------------
p1 & p2
Here p1 and p2 act as integer-type bitmasks where each bit represents
a vector lane of the data vector type, with the LSBit representing the
lowest-numbered lane and the MSBit the highest-numbered lane.
p1 & <scalar immediate>
Here the scalar immediate is implicitly cast to a vector_mask type and
the binary op is applied accordingly.
Bitwise ~:
~p1
Treats p1 as a bitmask and inverts all the bits of the bitmask.
Bitwise >>, << :
p1 >> <scalar immediate>
p1 >> <scalar int32 variable>
Treats p1 as a bitmask. The shifter operand has to be a signed int32
immediate or variable. If the shift amount is negative, the direction
of the shift is inverted. Behaviour for any shift amount whose
magnitude is outside the range 0..nelems-1 is undefined.
p1 >> p2 or p1 << p2
is not allowed.
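Since vector_mask is not implemented in any released compiler, the
intended bitmask semantics can only be sketched. The scalar emulation
below (all names illustrative) keeps one bit per lane, LSBit = lane 0,
and implements the negative-shift rule described above:

```c
#include <assert.h>
#include <stdint.h>

#define NELEMS 8                         /* lanes in the emulated v8sib */
#define LANE_BITS ((1u << NELEMS) - 1)   /* mask of live lane bits */

typedef uint32_t v8sib_emul;             /* one bit per lane, LSBit = lane 0 */

/* ~p1: invert all lane bits, discarding bits beyond the last lane.  */
v8sib_emul mask_not (v8sib_emul p)
{
  return ~p & LANE_BITS;
}

/* p1 >> n: a negative amount inverts the shift direction; amounts
   whose magnitude falls outside 0..NELEMS-1 are undefined here too.  */
v8sib_emul mask_shr (v8sib_emul p, int32_t n)
{
  if (n < 0)
    return (p << -n) & LANE_BITS;
  return (p >> n) & LANE_BITS;
}
```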
Comparison ops - ==, !=, >, <, >=, <=
-------------------------------------
The following ops treat vector_mask as bitmask:
p1 == p2
p1 != p2
p1 == <scalar immediate>
p1 != <scalar immediate>
The result of these operations is a bool. Note that the scalar
immediates will be implicitly converted to the type of p1 (the LHS).
Eg. if p1 is v8sib,
p1 == 0x3
compares p1 against the mask in which the two lowest-numbered lanes of
v8sib are true and the rest are false.
>, <, >=, <= do not apply to the vector_mask.
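The lane-to-bit mapping that makes p1 == 0x3 meaningful can be
sketched as follows (an emulation with illustrative names, since
vector_mask is not implemented):

```c
#include <assert.h>
#include <stdint.h>

#define NELEMS 8

/* Pack per-lane truth values into the proposed encoding:
   LSBit = lane 0, MSBit = lane NELEMS-1.  */
uint32_t pack_mask (const int lanes[NELEMS])
{
  uint32_t m = 0;
  for (int i = 0; i < NELEMS; i++)
    if (lanes[i])
      m |= 1u << i;
  return m;
}

/* With only the two lowest-numbered lanes true, the mask compares
   equal to the scalar immediate 0x3.  */
int demo (void)
{
  int lanes[NELEMS] = {1, 1, 0, 0, 0, 0, 0, 0};
  return pack_mask (lanes) == 0x3;
}
```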
Ternary operator ?:
-------------------
p1 <logicalop> p2 ? s1 : s2;
is allowed and p1 and p2 are treated as bitmasks.
Conditional operators ||, &&, !
-------------------------------
Here vector_mask is used as a bitmask scalar. So
p1 != 0 || p2 == 0
treats p1 and p2 as scalar bitmasks. Similarly for && and !.
Assignment ops =, <<=, >>=
--------------------------
The assignment operator is straightforward - it copies the RHS into
the LHS. Eg.
p1 = p2
Copies the value of p2 into p1. If the types are different, there is no
implicit conversion from one to the other (except in cases mentioned
below). One will have to explicitly convert using
__builtin_convertvector (). So if p1 and p2 are different and if one
wants to copy p2 to p1, one has to write
p1 = __builtin_convertvector (p2, typeof (p1));
The conversion performed by __builtin_convertvector is
implementation-defined. It is essential to note that p1 and p2 must
have the same number of lanes irrespective of the lane-size. Also,
explicit conversion is not required if p1 and p2 have the same
lane-size along with the same number of elements. So, for eg., if p1
is v8sib and p2 is v8sfb, no explicit conversion is required. Same for
v8sib and v8uib.
<<= and >>= behave analogously to << and >>.
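__builtin_convertvector can be exercised today on plain GNU vectors;
this sketch (illustrative names) narrows a 32-bit-lane mask to a
16-bit-lane one while preserving nelems, which is the shape of
conversion the rules above require:

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));
typedef int16_t v4hi __attribute__((vector_size(8)));

/* Lane counts must match; only the lane-size changes.  */
v4hi narrow (v4si m)
{
  return __builtin_convertvector (m, v4hi);
}

int demo (void)
{
  v4si a = {1, 0, 3, 0}, b = {0, 2, 0, 0};
  v4hi m = narrow (a > b);
  /* True lanes stay -1 after narrowing; false lanes stay 0.  */
  return (m[0] == -1) + (m[1] == 0) + (m[2] == -1) + (m[3] == 0);
}
```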
Increment Ops ++, --
---------------------
NA
Address-of &
------------
Taking the address of a vector_mask object returns a (vector bool *).
sizeof ()
--------
sizeof (vector_mask) = sizeof (vector bool)
alignof ()
----------
See Alignment section above
Typecast and implicit conversions
---------------------------------
A typecast from one vector_mask type to another is only possible using
__builtin_convertvector (), and, as explained above, only if the
lane-sizes are different. It is not possible to convert between
vectors of different nelems either way.
Implicit conversions between two vector_masks with the same nelems are
possible only if their lane-sizes are the same.
Literals and Initialization
---------------------------
There are two ways to initialize vector_mask objects - bitmask form
and constant array form. Eg.
typedef v4si v4sib __attribute__((vector_mask));
void foo ()
{
  v4sib p1 = 0xf;
  /* Do something. */
  p1 = {1, 1, 1, 0};
  ...
}
The behaviour is undefined if values other than 1 or 0 are used in the
constant array initializer.
C++:
---
static_cast<target_type> (<source_expression>)
Clang allows static_cast<> where both vector sizes are the same, but
the semantics are those of reinterpret_cast<>. GCC does not allow
static_cast<> irrespective of source and target shapes.
To be consistent, leave it unsupported for vector_mask too.
dynamic_cast <target_type> (<source_expr>)
NA
reinterpret_cast<>
Semantics are the same as Clang's static_cast<>, i.e. reinterpret the
value if the source and target vector types are the same size.
const_cast<>
Applies constness to a vector mask type pointer.
#include <inttypes.h>
typedef int32_t v16si __attribute__((__vector_size__(64)));
typedef v16si v16sib __attribute__((vector_mask));
__attribute__((noinline))
const v16sib * foo (v16sib * a)
{
  return const_cast<v16sib *> (a);
}
new & delete
For new, vector_mask types will return a pointer to the vector_mask
type and allocate sizeof (vector bool) bytes, depending on the size of
the vector bool array. For eg.
typedef v16si v16sib __attribute__((vector_mask));
v16sib * foo()
{
  return new v16sib;
}
Here new allocates sizeof (vector bool (16)), i.e. 16 bytes.
__attribute__((vector_mask)) 's conflation with GIMPLE
------------------------------------------------------
__attribute__((vector_mask)) is a feature that has been elevated from
GIMPLE to the FE. In GIMPLE, the semantics are loosely-typed and
target-dependent, i.e. differently-shaped vector mask types are
allowed to work with binary ops depending on which target we're
compiling for. Eg.
typedef v8si v8sib __attribute__((vector_mask));
typedef v8hi v8hib __attribute__((vector_mask));
__GIMPLE v8sib foo (v8si a, v8si b, v8hi c, v8hi d)
{
  v8sib psi = a > b;
  v8hib phi = c > d;
  return psi | phi; // OK on amdgcn, but errors on aarch64!
}
This dichotomy is acceptable as long as GIMPLE semantics don't change;
because the FE semantics are proposed to be more restrictive, they
become a subset of the GIMPLE semantics. This is the current starting
point, but going forward, if there are scenarios where we have to
diverge from GIMPLE semantics, we will have to discuss them on a
case-by-case basis.