https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456
--- Comment #12 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
The problem is this macro definition in rs6000.h, which forces unaligned vector
stores to be scalarized during expand:
/* Define this macro to be the value 1 if unaligned accesses have a cost
   many times greater than aligned accesses, for example if they are
   emulated in a trap handler.  */
/* Altivec vector memory instructions simply ignore the low bits; SPE vector
   memory instructions trap on unaligned accesses; VSX memory instructions are
   aligned to 4 or 8 bytes.  */
#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) \
  (STRICT_ALIGNMENT \
   || (((MODE) == SFmode || (MODE) == DFmode || (MODE) == TFmode \
        || (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode) \
       && (ALIGN) < 32) \
   || (VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE))))
The last condition (the VECTOR_MODE_P clause) needs to be relaxed for POWER8
hardware.
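
A minimal sketch of what relaxing that last condition could look like, assuming
a per-CPU flag gates the vector clause. This is not the actual patch. The
mode enum, the p8_unaligned_ok flag, the _OLD/_RELAXED macro names, and the stub
definitions of STRICT_ALIGNMENT, VECTOR_MODE_P and VECTOR_ALIGN are all
hypothetical stand-ins so the fragment compiles outside GCC; only the shape of
the condition is taken from the macro quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals, reduced to three modes.  */
enum mode { SFmode, DFmode, V4SImode };

#define STRICT_ALIGNMENT 0
#define VECTOR_MODE_P(MODE) ((MODE) == V4SImode)
#define VECTOR_ALIGN(MODE) 128	/* bits; assumed value for this sketch */

/* Hypothetical flag: nonzero when the CPU (e.g. POWER8) is assumed to
   handle unaligned vector accesses efficiently.  */
static int p8_unaligned_ok;

/* Same shape as the quoted macro, limited to the modes modeled here.  */
#define SLOW_UNALIGNED_ACCESS_OLD(MODE, ALIGN) \
  (STRICT_ALIGNMENT \
   || (((MODE) == SFmode || (MODE) == DFmode) && (ALIGN) < 32) \
   || (VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE))))

/* Relaxed variant: the vector clause only applies when the flag is off.  */
#define SLOW_UNALIGNED_ACCESS_RELAXED(MODE, ALIGN) \
  (STRICT_ALIGNMENT \
   || (((MODE) == SFmode || (MODE) == DFmode) && (ALIGN) < 32) \
   || (!p8_unaligned_ok \
       && VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE))))

/* Function wrappers so the two predicates can be called directly.  */
bool old_is_slow (enum mode m, int align)
{
  return SLOW_UNALIGNED_ACCESS_OLD (m, align);
}

bool relaxed_is_slow (enum mode m, int align, int p8)
{
  p8_unaligned_ok = p8;
  return SLOW_UNALIGNED_ACCESS_RELAXED (m, align);
}

/* Usage: a vector access with only 32-bit alignment.  */
void sketch_self_check (void)
{
  assert (old_is_slow (V4SImode, 32));		/* scalarized today */
  assert (!relaxed_is_slow (V4SImode, 32, 1));	/* kept as a vector op */
  assert (relaxed_is_slow (V4SImode, 32, 0));	/* older CPUs unchanged */
}
```

With the flag set, an under-aligned vector access is no longer reported as
slow, so expand would have no reason to scalarize it; the scalar
floating-point clause is left untouched in this sketch.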