https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70130

--- Comment #1 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
It's not clear to me from the report whether you have run this only on
big-endian systems, or whether little-endian has been tried for Power8 (with
-mcpu=power8).  Can you please clarify?

I ask because -mcpu=power7 causes the versioned loop to use
__builtin_altivec_mask_for_load to do the lvx/lvx/lvsl/vperm realignment
trick, whereas with -mcpu=power8 we would simply have done unaligned loads.
If -mcpu=power8 behaves differently on BE and LE, that might be a clue to a
back end problem.
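
For readers unfamiliar with the lvx/lvx/lvsl/vperm sequence, here is a minimal
sketch of the idea in portable C. It is an emulation under stated assumptions,
not what GCC emits: the emu_* helpers and check_all_offsets are hypothetical
names invented for illustration, the model follows big-endian element order
only, and the second aligned load is taken at p + 15 so that an already-aligned
pointer never reads a second block. Little-endian code generation differs in
details that this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vector type used only for this emulation. */
typedef struct { uint8_t b[16]; } vec16;

/* Models lvx: the low 4 address bits are ignored, so the load is
   always from the enclosing 16-byte-aligned block. */
static vec16 emu_lvx(const uint8_t *p)
{
    vec16 v;
    const uint8_t *aligned =
        (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15);
    memcpy(v.b, aligned, 16);
    return v;
}

/* Models lvsl: builds the permute control vector
   { sh, sh+1, ..., sh+15 } where sh is the misalignment of p. */
static vec16 emu_lvsl(const uint8_t *p)
{
    vec16 v;
    uint8_t sh = (uint8_t)((uintptr_t)p & 15);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Models vperm: each control byte (low 5 bits) selects one byte
   from the 32-byte concatenation of a and b. */
static vec16 emu_vperm(vec16 a, vec16 b, vec16 c)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        uint8_t idx = c.b[i] & 31;
        r.b[i] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Two aligned loads plus a permute yield the 16 bytes starting at p. */
static vec16 emu_unaligned_load(const uint8_t *p)
{
    vec16 msq  = emu_lvx(p);       /* block containing the first byte */
    vec16 lsq  = emu_lvx(p + 15);  /* block containing the last byte  */
    vec16 mask = emu_lvsl(p);      /* what mask_for_load would supply */
    return emu_vperm(msq, lsq, mask);
}

/* Self-check: every misalignment 0..15 must reproduce a plain copy.
   Returns the number of offsets verified. */
static int check_all_offsets(void)
{
    _Alignas(16) uint8_t buf[48];
    int checked = 0;
    for (int i = 0; i < 48; i++)
        buf[i] = (uint8_t)i;
    for (int off = 0; off < 16; off++) {
        vec16 r = emu_unaligned_load(buf + off);
        assert(memcmp(r.b, buf + off, 16) == 0);
        checked++;
    }
    return checked;
}
```

With -mcpu=power8 none of this is needed, since the hardware unaligned load
replaces the whole emu_unaligned_load sequence.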
