Richard Biener <rguent...@suse.de> writes:
> On Mon, 29 Jul 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguent...@suse.de> writes:
>> > On Mon, 29 Jul 2024, Jakub Jelinek wrote:
>> >> And, for the GET_MODE_INNER, I also meant it for AArch64/RISC-V VL
>> >> vectors; I think those should be considered as true by the hook,
>> >> not false because of maybe_ne.
>> >
>> > I don't think the relevant modes will have size/precision
>> > mismatches, and maybe_ne should work here.  Richard?
>> 
>> Yeah, I think that's true for AArch64 at least (not sure about RVV).
>> 
>> One wrinkle is that VNx16BI (every bit of a predicate) is technically
>> suitable for memcpy, even though it would be a bad choice performance-wise.
>> But VNx8BI (every even bit of a predicate) wouldn't, since the odd bits
>> are undefined on read.
>> 
>> Arguably, this means that VNx8BI has the wrong precision, but like you
>> say, we don't (AFAIK) support bitsize != precision for vector modes.
>> Instead, the information that there is only one meaningful bit per
>> boolean is represented by having an inner mode of BI.  Both VNx16BI
>> and VNx8BI have an inner mode of BI, which means that VNx8BI's
>> precision is not equal to its nunits * its unit precision.
>> 
>> So I suppose:
>> 
>>   maybe_ne (GET_MODE_BITSIZE (mode),
>>             GET_MODE_UNIT_PRECISION (mode) * GET_MODE_NUNITS (mode))
>> 
>> would capture this.
>
> OK, I'll adjust like this.
>
>> Targets that want a vector bool mode with 2 meaningful bits per boolean
>> are expected to define a 2-bit scalar boolean mode and use that as the
>> inner mode.  So I think the condition above would (correctly) continue
>> to allow those.
>
> Hmm, but I think SVE mask registers could be used to transfer bits?
> I tried the following
>
> typedef svint64_t v4dfm __attribute__((vector_mask));
>
> void __GIMPLE(ssa) foo(void *p)
> {
>   v4dfm _2;
>
> __BB(2):
>   _2 = __MEM <v4dfm> ((v4dfm *)p);
>   __MEM <v4dfm> ((v4dfm *)p + 128) = _2;
>   return;
> }
>
> and it produces
>
>         ldr     p15, [x0]
>         add     x0, x0, 128
>         str     p15, [x0]
>
> exactly the same code as when using svint8_t, which gets
> signed-boolean:1 vs signed-boolean:8.  So the fact that mask-producing
> instructions give you undefined bits doesn't mean that reg<->mem moves
> do the same, since the predicate registers don't know what mode they
> operate in?

Yes, in practice, VNx2BI is likely to produce the same load/store code
as VNx16BI.  But when comparing VNx2BI for equality, say, only every
eighth bit matters.  So if the optimisers were ultimately able to
determine which store feeds the VNx2BI load, there's a theoretical
possibility that they could do something that changes the other bits
of the value.

That's not very likely to happen.  But it'd be a valid thing to do.
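
To make "only every eighth bit matters" concrete, here's a hypothetical
standalone sketch (plain C, not GCC internals), pinned to a 256-bit VL
so that the predicate register is 32 bits and the .D lanes sit at bits
0, 8, 16 and 24:

  #include <stdint.h>

  /* Return nonzero if A and B are equal when viewed as .D predicates
     for a 256-bit VL: only every eighth bit is payload, the rest can
     be anything.  */
  int
  vnx2bi_equal_p (uint32_t a, uint32_t b)
  {
    const uint32_t payload = 0x01010101;
    return (a & payload) == (b & payload);
  }

An optimiser reasoning at that level is free to rewrite the non-payload
bits, which is all the theoretical possibility above amounts to.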

> It might of course be prohibitively expensive to copy memory like
> this, and there might not be GPR <-> predicate reg moves.
>
> But technically ... for SVE predicates there aren't even any
> types less than 8 bits in size (as there are for GCN and AVX512).

I guess it's architected bits vs payload.  The underlying registers
have 2N bytes for a 128N-bit VL, and so 2N bytes will be loaded by
LDR and stored by STR.  But when GCC uses the registers as VNx2BI,
only 2N bits are payload, and so the optimisers only guarantee to
preserve those 2N bits.
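
FWIW, with the poly_int sizes pinned to a fixed VL, the proposed
bitsize vs. nunits * unit-precision check works out as below.  This is
only a plain-C sketch with a made-up name, not the real
maybe_ne/GET_MODE_* interfaces; N counts 128-bit granules:

  /* Sketch only: concrete integers standing in for poly_ints.  */
  int
  suitable_for_memcpy_p (int bitsize, int nunits, int unit_precision)
  {
    return bitsize == nunits * unit_precision;
  }

  /* All three predicate modes occupy the full 16*N-bit register:
       suitable_for_memcpy_p (16*N, 16*N, 1) -> 1   (VNx16BI)
       suitable_for_memcpy_p (16*N,  8*N, 1) -> 0   (VNx8BI)
       suitable_for_memcpy_p (16*N,  2*N, 1) -> 0   (VNx2BI)  */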

Thanks,
Richard
