https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336

--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Follow-up on this one:  My workaround of emitting a vmv.v.i v[0-9],0 before any
(potentially) offending masked load is not going to work universally.
That's because in several instances we make use of the fact that a load has a
built-in EEW: it doesn't load elements of the SEW the last vset specified but
elements of its own EEW, with the effective LMUL derived from the vset's
SEW/LMUL ratio.

So for a case like

vsetivli zero,zero,e32,m8,tu,mu
vle8.v  v4,0(a0),v0.t

we would insert a vmv.v.i v4,0

vsetivli zero,zero,e32,m8,tu,mu
vmv.v.i v4,0
vle8.v  v4,0(a0),v0.t

but vmv.v.i honors the vset's SEW/LMUL and thus doesn't touch the same set of
registers vle8.v would.  With LMUL=8, vmv.v.i isn't even allowed to use v4
directly because an m8 register group must start at a multiple of eight.
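
Spelled out with the ratio arithmetic (just to illustrate the mismatch for the
operands above):

EMUL = (EEW / SEW) * LMUL
vle8.v  under e32,m8:  EMUL = (8 / 32) * 8 = 2, so v4 means the group v4-v5
vmv.v.i under e32,m8:  LMUL = 8, so v4 would have to mean v4-v11, which isn't
even a legal m8 register group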

I'm still experimenting with only adding a vmv to specific patterns, but all of
that can only ever be a stopgap solution and we definitely need middle-end
rework to properly fix this.  Actually we're lucky this wasn't triggered
before.

My other idea of adjusting the respective expanders is also not going to work
because we implicitly assume that a masked load "overwrites" its entire
destination and therefore optimize away anything that writes the destination
beforehand.  That's a bit of a semantics mismatch for RVV, unfortunately.
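
To make that mismatch concrete, a minimal sketch (not the exact codegen, and
assuming the mask-undisturbed "mu" policy with matching SEW/EEW so the vmv
itself is legal):

vsetvli zero,zero,e8,m1,tu,mu
vmv.v.i v4,0              # pre-initialize the destination
vle8.v  v4,0(a0),v0.t     # masked-off elements keep the zeros from the vmv

If the middle end assumes the masked load overwrites all of v4, it considers
the vmv.v.i dead and removes it, and the masked-off elements then contain
whatever happened to be in v4 before.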

I'm going to start with some middle-end work on this.  Hopefully it's not as
bad as I imagine it is.
