http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

--- Comment #1 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> 
2011-12-13 09:07:38 UTC ---
At least part of the problem here is the uninitialised
variable in the vld4 call.  GCC tries to create a zero
initialisation of "x" before the vld4, so that the other
lanes have defined values.  Obviously we could do that
much more efficiently than we do, and perhaps we should
special-case uninitialised NEON vectors so that they are
never zero-initialised (e.g. use a plain clobber instead).
But uninitialised variables aren't really ideal either way.
Something like:

  x = vld4_dup_u8(src);

  y.val[0][0] = x.val[1][0];
  y.val[1][0] = x.val[2][0];

  vst2_lane_u8(dst, y, 0);

would be better in principle.  Unfortunately, we don't
generate good code for that either.  Part of the problem
is introduced by lower-subreg, but it's not good even
with -fno-split-wide-types.
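
For reference, here is a self-contained sketch of the fragment
above.  The function name copy_middle_pair and the types of x, y,
src and dst are assumptions (the original testcase is not quoted
in this comment), and it assigns whole vectors rather than single
lanes so that every lane of y is defined.  The non-NEON branch is
only a scalar model of the same behaviour, so the result can be
checked on any host:

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical wrapper: the name and signature are not from the PR. */
void copy_middle_pair(uint8_t *dst, const uint8_t *src)
{
    /* Load src[0..3] and replicate each byte across all 8 lanes of
       its vector, so no lane of x is left undefined.  */
    uint8x8x4_t x = vld4_dup_u8(src);
    uint8x8x2_t y;

    /* Whole-vector copies: every lane of y is defined.  */
    y.val[0] = x.val[1];
    y.val[1] = x.val[2];

    /* Store lane 0 of each vector: dst[0] = src[1], dst[1] = src[2].  */
    vst2_lane_u8(dst, y, 0);
}
#else
/* Scalar model of the same data flow for targets without NEON.  */
void copy_middle_pair(uint8_t *dst, const uint8_t *src)
{
    uint8_t x[4][8];
    int v, lane;

    /* Model of vld4_dup_u8: byte v is duplicated into every lane.  */
    for (v = 0; v < 4; v++)
        for (lane = 0; lane < 8; lane++)
            x[v][lane] = src[v];

    /* Model of vst2_lane_u8 on lane 0 of vectors 1 and 2.  */
    dst[0] = x[1][0];
    dst[1] = x[2][0];
}
#endif
```

Whether this form is handled any better by lower-subreg is not
something this comment establishes; it only removes the undefined
lanes from the picture.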
