https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102421
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so the issue is that we have at alignment analysis time a group of three
stmts:

  # VUSE <.MEM_107>
  _132 = .MASK_LOAD (_39, 64, _42);
  # VUSE <.MEM_135>
  _147 = .MASK_LOAD (_14, 64, _33);
  # VUSE <.MEM_150>
  _162 = .MASK_LOAD (_88, 64, _111);

but that gets split up in vect_dissolve_slp_only_groups - I don't remember why
we have such thing but this definitely wrecks the alignment logic.

So we seem to have vect_dissolve_slp_only_groups because we form the masked
load group for SLP analysis only, allowing different masks there while for
non-SLP vectorization we only handle the case of equal masks.  So to not
feed "invalid" groups to non-SLP we dissolve the groups.

But note that we'll generate quite awkward code, treating it as three separate
single-element interleaving chains.  Instead the proper way to code generate
this would be to interleave the masks (as if we'd "store" them) and properly
vectorize this with a 3-element interleaving chain.  That's going to be
tricky with the way we do interleaving vectorization though (stmt processing
order).

As a stop-gap solution we can of course re-analyze (or "split") alignment
when we split the DR groups late but that does feel quite awkward.

The issue is latent again now.  I'm testing a patch to
vect_dissolve_slp_only_groups to copy&adjust alignment info.
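
To illustrate what "copy&adjust alignment info" could mean when a group is dissolved, here is a minimal toy sketch. It is not the vectorizer's code: ToyAccess, ToyGroup and dissolve_group are hypothetical names invented for this illustration, and the model assumes that alignment is tracked only on the group leader as a target alignment plus a known byte misalignment (-1 meaning unknown). When the group is split into single-element groups, each member becomes its own leader, so it gets a copy of the leader's target alignment and a misalignment adjusted by its byte offset from the old leader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model; none of these names are GCC internals.
struct ToyAccess {
  std::int64_t byte_offset;   // offset from the group's first access, in bytes
  unsigned target_alignment;  // desired alignment in bytes (assumed power of two)
  int misalignment;           // bytes past an aligned boundary, -1 if unknown
};

struct ToyGroup {
  std::vector<ToyAccess> members;  // members[0] is the group leader
};

// Split a group into single-element groups.  Only the old leader is assumed
// to carry valid alignment info, so every new leader receives a copy of the
// target alignment and a misalignment shifted by its offset from the old
// leader.
std::vector<ToyGroup> dissolve_group(const ToyGroup &group) {
  std::vector<ToyGroup> result;
  if (group.members.empty())
    return result;

  const ToyAccess &leader = group.members.front();
  for (const ToyAccess &member : group.members) {
    ToyAccess copy = member;
    copy.target_alignment = leader.target_alignment;
    if (leader.misalignment < 0) {
      // Unknown for the leader means unknown for everyone.
      copy.misalignment = -1;
    } else {
      const std::int64_t align = leader.target_alignment;
      const std::int64_t delta = member.byte_offset - leader.byte_offset;
      std::int64_t mis = (leader.misalignment + delta) % align;
      if (mis < 0)
        mis += align;  // keep the result in [0, align)
      copy.misalignment = static_cast<int>(mis);
    }
    copy.byte_offset = 0;  // the member now leads its own one-element group

    ToyGroup single;
    single.members.push_back(copy);
    result.push_back(single);
  }
  return result;
}
```

In this model, a group of three 8-byte accesses at offsets 0, 8 and 16 with a 16-byte target alignment and a leader misalignment of 0 would dissolve into three groups with misalignments 0, 8 and 0. Without the adjustment step the second and third accesses would have no usable alignment info after the split.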