Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Richard Biener Fri, 28 Jun 2024 05:16:46 -0700

On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
<[email protected]> wrote:
>
>
>
> > Am 28.06.2024 um 10:27 schrieb Richard Sandiford 
> > <[email protected]>:
> >
> > Richard Biener <[email protected]> writes:
> >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> <[email protected]> wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <[email protected]> wrote:
> >>>>
> >>>> for the testcase in the PR115406, here is part of the dump.
> >>>>
> >>>>  char D.4882;
> >>>>  vector(1) <signed-boolean:8> _1;
> >>>>  vector(1) signed char _2;
> >>>>  char _5;
> >>>>
> >>>>  <bb 2> :
> >>>>  _1 = { -1 };
> >>>>
> >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> >>>> with each vector elemnet. But i think the bit setting should only apply 
> >>>> for
> >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> >>>> differently from vector<1> <signed-boolean:16>?
> >>>>
> >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>>> Ok for trunk?
> >>>
> >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> >>> code should work for 8 bit
> >>> entities as well, it seems we only set the LSB of each element in the
> >>> "mask".  ISTR that SVE
> >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> >>> maybe that's why
> >>> <= BITS_PER_UNIT.
> >
> > Yeah.
>
> So is it necessary that only one bit is set for SVE?
>
> >>> So maybe instead of just setting one bit in
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> >>>
> >>> we should set elt_bits bits, aka (without testing)
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> >>> % BITS_PER_UNIT);
> >>>
> >>> ?
> >>
> >> Alternatively
> >>
> >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> >>
> >> should be amended with
> >>
> >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> >
> > How about:
> >
> >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> >    {
> >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> >
> > ?
>
> Note the path is also necessary for avx512 and gcn mask modes which are 
> integer modes.
>
> > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> > would be stable.)
>
> That’s a good question and also related to GCC vector extension which can 
> result in both BLKmode and integer modes to be used.  But I’m not sure how we 
> expose masks to the middle end here.  A too large vector bool could be 
> lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of 
> BLKmode vectors and make sure to never assign integer modes to vector bools 
> (if the target didn’t specify that mode)?
>
> I guess some test coverage would be nice here.


To continue on that, we do not currently have a way to capture a
vector comparison output
but the C++ frontend has vector ?:

typedef int v8si __attribute__((vector_size(32)));

void foo (v8si *a, v8si *b, v8si *c)
{
  *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
}

with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
then even with AVX512 enabled I see <signed-boolean:32> here and
vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
just a missed optimization).  But note that to decompose this into two
AVX512 vectors the temporary would have to change from <signed-boolean:32>
elements to <signed-boolean:1>.

The not supported vector bool types have BLKmode sofar.

But for example on i?86-linux with -mno-sse (like -march=i586) for

typedef short v2hi __attribute__((vector_size(4)));

void foo (v2hi *a, v2hi *b, v2hi *c)
{
  *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
}

we get a SImode vector <signed-boolean:16> as I feared.  That means
<signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
between SVE (bool for a 8byte data vector) and emulated vectors
("word-mode" vectors; for 1byte data vectors).

And without knowing that SVE would have used VnBImode given that
AVX512 uses an integer mode.

Aside from the too large vector and AVX512 issue above I think we can use
MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
can assert the mode is a scalar integer mode (AVX512 or GCN)?

Richard.


> >> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
> >> mask for a int128 element vector
> >> we'd have 16bit mask elements, encoding that differently would be
> >> inconsistent as well
> >> (but of course 16bit elements are not handled by the code right now).
> >
> > Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
> > had to add support for them.
> >
> > Richard

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Reply via email to