https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64731

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-22
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|poor code when using        |vector lowering should
                   |vector_size((32)) for sse2  |split loads and stores
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, the issue is "simple" - veclower doesn't split the loads/stores themselves,
only the registers:

  <bb 3>:
  # ivtmp.11_24 = PHI <ivtmp.11_23(3), 0(2)>
  _8 = MEM[base: a_6(D), index: ivtmp.11_24, offset: 0B];
  _11 = MEM[base: b_9(D), index: ivtmp.11_24, offset: 0B];
  _17 = BIT_FIELD_REF <_8, 128, 0>;
  _4 = BIT_FIELD_REF <_11, 128, 0>;
  _5 = _4 + _17;
  _29 = BIT_FIELD_REF <_8, 128, 128>;
  _28 = BIT_FIELD_REF <_11, 128, 128>;
  _14 = _28 + _29;
  _12 = {_5, _14};
  MEM[base: a_6(D), index: ivtmp.11_24, offset: 0B] = _12;
  ivtmp.11_23 = ivtmp.11_24 + 32;
  if (ivtmp.11_23 != 8192)
    goto <bb 3>;
  else
    goto <bb 4>;

In this case it would also have a moderately hard time splitting the
loads/stores, as it is already faced with TARGET_MEM_REFs.
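
For illustration only, here is a rough C sketch of the two shapes involved. It
is not the testcase from the report: the function names, the array length N
(chosen so that 1024 doubles = 8192 bytes matches the loop bound in the dump),
and the use of memcpy for the vector loads/stores are all assumptions made for
this example. add_v4df mirrors the dump above (one 32-byte load per operand,
one 32-byte store), while add_split shows by hand what the loop would look
like if the loads and stores were split into 16-byte halves along with the
arithmetic.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector types (vector_size is given in bytes). */
typedef double v4df __attribute__((vector_size(32)));  /* 4 doubles */
typedef double v2df __attribute__((vector_size(16)));  /* 2 doubles */

#define N 1024  /* assumed length: 1024 doubles = 8192 bytes */

/* Assumed shape of the source loop: 32-byte loads, one add, 32-byte store. */
void add_v4df(double *a, const double *b)
{
    for (int i = 0; i < N; i += 4) {
        v4df va, vb;
        memcpy(&va, &a[i], sizeof va);   /* 32-byte load of a */
        memcpy(&vb, &b[i], sizeof vb);   /* 32-byte load of b */
        va += vb;
        memcpy(&a[i], &va, sizeof va);   /* 32-byte store to a */
    }
}

/* Hand-split variant: every load, add and store works on a 16-byte half. */
void add_split(double *a, const double *b)
{
    for (int i = 0; i < N; i += 4) {
        v2df a_lo, a_hi, b_lo, b_hi;
        memcpy(&a_lo, &a[i],     sizeof a_lo);  /* bytes  0..15 of a */
        memcpy(&b_lo, &b[i],     sizeof b_lo);  /* bytes  0..15 of b */
        memcpy(&a_hi, &a[i + 2], sizeof a_hi);  /* bytes 16..31 of a */
        memcpy(&b_hi, &b[i + 2], sizeof b_hi);  /* bytes 16..31 of b */
        a_lo += b_lo;
        a_hi += b_hi;
        memcpy(&a[i],     &a_lo, sizeof a_lo);  /* store low half */
        memcpy(&a[i + 2], &a_hi, sizeof a_hi);  /* store high half */
    }
}

/* Returns 1 if both variants produce identical results on sample data. */
int results_match(void)
{
    static double a1[N], a2[N], b[N];
    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = i * 0.5;
        b[i] = 1000.0 - i;
    }
    add_v4df(a1, b);
    add_split(a2, b);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Both functions compute the same values; the only difference is the width of
the memory accesses, which is what this report asks vector lowering to handle.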

Nothing combines this back into a sane form.  I've recently added code that
handles exactly the same situation but only for complex arithmetic
(in tree-ssa-forwprop.c for PR64568).

I wonder why, with only -msse2, IVOPTs produces TARGET_MEM_REFs for the loads.
For sure x86_64 with only SSE2 cannot load V4DF in one instruction...
