Re: [RFC] Proposal to support Packed Boolean Vector masks.

Tejas Belagod Thu, 11 Jul 2024 21:17:42 -0700

On 7/10/24 4:37 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:


Tejas Belagod <tejas.bela...@arm.com> writes:

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com> wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com> wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod <tejas.bela...@arm.com> wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
----------

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

       bool foo (svint32_t a, svint32_t b)
       {
         svbool_t p = a > b;

         // Here p[2] is not the same as a[2] > b[2].
         return p[2];
       }

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.

__attribute__((vector_mask))
-----------------------------

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

       typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
       typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types who's base data vector types have same number
of elements and lane-width have the same layout and size.

• Consequently, two boolean vectors who's base data vector types have
different number of elements or different lane-size have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
        https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
      auto r = a < b;
}

produces, with C23:

      vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce a AVX mask with
32bit elements or a AVX512 mask with 1bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.


Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
this would require more fine-grained defn of lane-size to mask-bits mapping.


I think the target should be able to register builtin types already which
intrinsics could use.  There is already the vector_mask attribute but only
for GIMPLE and it has the same limitation of querying the target for the
actual mode being used - for AVX vs AVX512 one might be able to
combine this with a mode attribute.  Not sure if on arm you can parse
__attribute__((mode("Vx4BI4"))) or how the modes are called.

But when you are talking about intrinsics I'd really suggest to leave the
type creation to the target rather than trying to do a typedef in a header?


Yeah, thinking about this a bit more, makes sense to keep intrinsic type
creation in the target realm.

Just to clarify if I understand your point about exposing masks' machine
representations, would representing vector_mask types using opaque
types/modes have the same challenges with compatibility with generic
vector constants as it essentially would be a parallel type system, and
would be unaffected by constant-folding etc due to their opacity? I ask
because opacity might give the representation the flexibility of
'decaying' to a type based on the context it is used in.


I also thought about using an opaque type but I wonder if it really suits
here?


Sorry, yes using opaque type was your idea from last year's thread - I
merely reiterated it here. :-)

Or would the target then need to decay a mask[i] into something

that's later recognizable?


I think that would depend on the usage, wouldn't it - it could lower
down to target insn(s) based on how whether, for eg, its used as a test
or read as a scalar value?

So I guess the answer is you'd have to try.


Thanks for your feedback so far - much appreciated. If it helps, I will
try to write up a prototype to test the idea - might help clear the mist
further.


Just to note that one of the original motivations (that applies more
to option 3 from last year's proposal) was to add support for general
packed vector boolean types to the GNU vector extension, as a feature
independent of the target's "native" format(s).  Clang already supports
this via ext_vector_type and it seemed like there might be value in
providing something similar for the GNU extensions.


But that's more for data, aka vector bool, not for what's produced by
targets from vector comparisons?  So yes, I suppose that's reasonable
but representation would then be fully defined by the extension
rather than by however the target computes the actual comparison
result vector.


Sorry for the slow response.

Thanks RichardS for your timely comment. Sorry, I might have gottenambitious with the original vector bool proposal and went down the routeof supporting 'native' formats with vector_mask, but scaling myambitions back to a boolean vector of a certain representation that isindependent of the target's native format and defined by the extensionitself is a more realistic proposition.


To reiterate option 3 from last year's proposal, currently we don't support

 typedef bool vbool __attribute__((__vector_size__(64)));

But if we did, could we support a more layout-friendly form i.e.

  typedef bool vbool __attribute__((vector_size (s, n[, w])));

where 's' is size in bytes, 'n' is the number of lanes and an optional3rd parameter 'w' is the number of bits of the PBV that represents alane of the target vector? 'w' would allow a target to force a certainlayout of the PBV.


I don't know if overloading vector_size is a good idea though...

Thanks,
Tejas.

Richard.

Thanks,
Richard

Re: [RFC] Proposal to support Packed Boolean Vector masks.

Reply via email to