https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63594
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 33763
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33763&action=edit
gcc5-pr63594-wip2.patch

Updated WIP patch, which attempts to generate better code using inter-unit
moves but also has memory as an alternative, so that RA can choose what is
best.  This still generates imperfect code for V2DI/V4DI loads from GPRs
without -mavx512f (but e.g. vec_concatv2di uses the Yi constraint).
And, for AVX512-{F,BW,VL}, I'm surprised that the broadcasts from GPRs are
done as different instructions from broadcasts from memory or a vector reg;
I would have thought that would be done using a single insn with
alternatives.
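
For readers who want to look at the generated code themselves, here is a minimal sketch of the two broadcast shapes discussed above: a V4DI value built from a scalar that arrives in a GPR, and one built from a scalar loaded from memory. It is not the testcase from the PR or from the attached patch; the function names (broadcast_from_gpr, broadcast_from_mem, all_lanes_equal) are made up for illustration, and it assumes GCC's generic vector extension rather than any particular intrinsic.

```c
#include <string.h>

/* GNU C vector extension: four 64-bit lanes (V4DI in GCC's mode naming).  */
typedef long long v4di __attribute__ ((vector_size (32)));

/* Hypothetical helper: broadcast a scalar passed as an argument, which
   typically lives in a general purpose register at this point.  */
void
broadcast_from_gpr (long long x, v4di *out)
{
  *out = (v4di) { x, x, x, x };
}

/* Hypothetical helper: broadcast a scalar that is loaded from memory.  */
void
broadcast_from_mem (const long long *p, v4di *out)
{
  long long x = *p;
  *out = (v4di) { x, x, x, x };
}

/* Hypothetical helper: return 1 if every lane of *v equals expected.  */
int
all_lanes_equal (const v4di *v, long long expected)
{
  long long lanes[4];
  int i;

  memcpy (lanes, v, sizeof lanes);
  for (i = 0; i < 4; i++)
    if (lanes[i] != expected)
      return 0;
  return 1;
}
```

Compiling this with -O2 -S, once with and once without -mavx512f, should make it possible to compare which instructions are chosen for the GPR and memory cases; the exact output depends on the compiler revision and on whether the patch is applied.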