Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Richard Sandiford Fri, 05 Jul 2024 02:09:22 -0700

Richard Sandiford <[email protected]> writes:
> Richard Biener <[email protected]> writes:
>> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
>> <[email protected]> wrote:
>>>
>>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>>> <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford 
>>> > > <[email protected]>:
>>> > >
>>> > > Richard Biener <[email protected]> writes:
>>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> > >>> <[email protected]> wrote:
>>> > >>>
>>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <[email protected]> 
>>> > >>> wrote:
>>> > >>>>
>>> > >>>> for the testcase in the PR115406, here is part of the dump.
>>> > >>>>
>>> > >>>>  char D.4882;
>>> > >>>>  vector(1) <signed-boolean:8> _1;
>>> > >>>>  vector(1) signed char _2;
>>> > >>>>  char _5;
>>> > >>>>
>>> > >>>>  <bb 2> :
>>> > >>>>  _1 = { -1 };
>>> > >>>>
>>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of 
>>> > >>>> dest
>>> > >>>> with each vector elemnet. But i think the bit setting should only 
>>> > >>>> apply for
>>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>>> > >>>> differently from vector<1> <signed-boolean:16>?
>>> > >>>>
>>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> > >>>> Ok for trunk?
>>> > >>>
>>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>>> > >>> code should work for 8 bit
>>> > >>> entities as well, it seems we only set the LSB of each element in the
>>> > >>> "mask".  ISTR that SVE
>>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>>> > >>> maybe that's why
>>> > >>> <= BITS_PER_UNIT.
>>> > >
>>> > > Yeah.
>>> >
>>> > So is it necessary that only one bit is set for SVE?
>
> TBH I can't remember now.  It matches what SVE instructions produce, and
> lines up with the associated RTL code (which at the time was SVE-specific).
> But when dealing with multibyte elements, upper predicate element bits
> are ignored on read, so matching the instructions might not matter.
>
>>> > >>> So maybe instead of just setting one bit in
>>> > >>>
>>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>> > >>>
>>> > >>> we should set elt_bits bits, aka (without testing)
>>> > >>>
>>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>>> > >>> % BITS_PER_UNIT);
>>> > >>>
>>> > >>> ?
>>> > >>
>>> > >> Alternatively
>>> > >>
>>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>>> > >>
>>> > >> should be amended with
>>> > >>
>>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>>> > >
>>> > > How about:
>>> > >
>>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>>> > >    {
>>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>>> > >
>>> > > ?
>>> >
>>> > Note the path is also necessary for avx512 and gcn mask modes which are 
>>> > integer modes.
>>> >
>>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>>> > > would be stable.)
>>> >
>>> > That’s a good question and also related to GCC vector extension which can 
>>> > result in both BLKmode and integer modes to be used.  But I’m not sure 
>>> > how we expose masks to the middle end here.  A too large vector bool 
>>> > could be lowered to AVX512 mode.  Maybe we should simply reject 
>>> > interpret/encode of BLKmode vectors and make sure to never assign integer 
>>> > modes to vector bools (if the target didn’t specify that mode)?
>>> >
>>> > I guess some test coverage would be nice here.
>>>
>>> To continue on that, we do not currently have a way to capture a
>>> vector comparison output
>>> but the C++ frontend has vector ?:
>>>
>>> typedef int v8si __attribute__((vector_size(32)));
>>>
>>> void foo (v8si *a, v8si *b, v8si *c)
>>> {
>>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>>> }
>>>
>>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
>>> then even with AVX512 enabled I see <signed-boolean:32> here and
>>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>>> just a missed optimization).  But note that to decompose this into two
>>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>>> elements to <signed-boolean:1>.
>>>
>>> The not supported vector bool types have BLKmode sofar.
>>>
>>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>>
>>> typedef short v2hi __attribute__((vector_size(4)));
>>>
>>> void foo (v2hi *a, v2hi *b, v2hi *c)
>>> {
>>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>>> }
>>>
>>> we get a SImode vector <signed-boolean:16> as I feared.  That means
>>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>>> between SVE (bool for a 8byte data vector) and emulated vectors
>>> ("word-mode" vectors; for 1byte data vectors).
>>>
>>> And without knowing that SVE would have used VnBImode given that
>>> AVX512 uses an integer mode.
>>>
>>> Aside from the too large vector and AVX512 issue above I think we can use
>>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>>
>> So like the attached?
>
> Can you give me a few days to see whether swapping to -1 : 0 masks
> works for SVE?  I guess it would make things easier in the long run.


It took longer than expected, but FWIW, the SVE results look good
with the attached.

I suppose it does raise the question whether:

          if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))

is the right way to test a boolean vector element at the tree level.
For SVE it is, but I suppose for other targets 0b10 should be treated
as true?

The same question doesn't really arise on the RTL side because
the SVE predicate elements are single bits.  So perhaps one way
would be to base the test above on GET_MODE_UNIT_PRECISION for
boolean vector modes, and test for nonzero otherwise.  Doesn't feel
very clean though...

Alternatively, we could say that the semantics of boolean vectors
at the tree level are independent of the target, and try to fix
things up at the gimple->rtl boundary.

Thanks,
Richard


gcc/
        * fold-const.cc (native_encode_vector_part): Set all bits of
        a boolean element.
        * simplify-rtx.cc (native_encode_rtx): Likewise.
---
 gcc/fold-const.cc   | 2 +-
 gcc/simplify-rtx.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c021..8b7aaca2500 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8106,7 +8106,7 @@ native_encode_vector_part (const_tree expr, unsigned char 
*ptr, int len,
          if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))
            {
              unsigned int bit = i * elt_bits;
-             ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
+             ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % 
BITS_PER_UNIT);
            }
        }
       return extract_bytes;
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 35ba54c6292..bfc33bd2f19 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7232,7 +7232,7 @@ native_encode_rtx (machine_mode mode, rtx x, 
vec<target_unit> &bytes,
              target_unit value = 0;
              for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
                {
-                 value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
+                 value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) ? mask : 0) << j;
                  elt += 1;
                }
              bytes.quick_push (value);
-- 
2.25.1

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Reply via email to