https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336
--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Follow-up on this one:

My workaround of emitting a vmv.v.i v[0-9],0 before any (potentially) offending
masked load is not going to work universally.  That's because in several
instances we make use of the fact that a load has a built-in EEW and will not
use the element width the last vset specified (the SEW) but rather its own EEW
and the EMUL that follows from the EEW/SEW ratio.  So for a case like

  vsetivli zero,zero,e32,m8,tu,mu
  vle8.v v4,0(a0),v0.t

we would insert a vmv.v.i v4,0, giving

  vsetivli zero,zero,e32,m8,tu,mu
  vmv.v.i v4,0
  vle8.v v4,0(a0),v0.t

but vmv.v.i observes the SEW/LMUL of the vset and thus doesn't touch the same
set of registers the vle would.  vmv.v.i is in fact not even allowed to use
register v4 directly here because of LMUL (with m8 a register group has to
start at a register number that is a multiple of 8).

I'm still experimenting with only adding a vmv to specific patterns, but all of
that can only ever be a stopgap and we definitely need middle-end rework to
properly fix it.  Actually we're lucky this wasn't triggered before.

My other idea of adjusting the respective expanders is also not going to work
because we implicitly assume that a masked load "overwrites" its entire
destination and therefore optimize away anything that writes the destination
beforehand.  That's a bit of a semantics mismatch with RVV, unfortunately.

I'm going to start with some middle-end work on this.  Hopefully it's not as
bad as I imagine it is.
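For the record, the kind of source I have in mind is roughly the sketch below
(a made-up example for illustration, including the name foo; it is not the
reproducer attached to this PR): a conditional narrow load feeding a wider
reduction.  After if-conversion the masked-off lanes of the load are expected
to be zero, so something has to zero the destination before the vle8.v; if we
delete that zeroing because we assume the masked load overwrites its whole
destination, stale register contents leak into the sum.

  int
  foo (unsigned char *restrict a, int *restrict c, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      /* Expected to be if-converted and vectorized into a masked vle8.v
         feeding an e32 reduction; the lanes where c[i] == 0 keep whatever
         was in the load's destination register unless it is zeroed
         beforehand.  */
      sum += c[i] ? a[i] : 0;
    return sum;
  }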