https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
Richard Biener <rguenth at gcc dot gnu.org> changed:
What           |Removed |Added
----------------------------------------------------------------------------
Status         |NEW     |ASSIGNED
Blocks         |        |53947
Depends on     |        |65832
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The latter is because of 'convert' leaving us with
_1 = BIT_FIELD_REF <x_32(D), 32, 0>;
_2 = (double) _1;
_3 = BIT_FIELD_REF <x_32(D), 32, 32>;
_4 = (double) _3;
_5 = BIT_FIELD_REF <x_32(D), 32, 64>;
_6 = (double) _5;
_7 = BIT_FIELD_REF <x_32(D), 32, 96>;
_8 = (double) _7;
_9 = {_2, _4, _6, _8};
rather than
vect__1.83_46 = x;
vect__2.84_47 = [vec_unpack_lo_expr] vect__1.83_46;
vect__2.84_48 = [vec_unpack_hi_expr] vect__1.83_46;
MEM[(vector(4) double *)&dx] = vect__2.84_47;
MEM[(vector(4) double *)&dx + 16B] = vect__2.84_48;
(which is itself not optimal because it is not in SSA form).
This means generic vector support lacks widening/shortening conversions, and thus
you have to jump through hoops with things like 'convert'. And SLP vectorization
doesn't "vectorize" with vector CONSTRUCTORs as the SLP root (a possible
enhancement, I think).
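A minimal sketch of the kind of source that produces the first GIMPLE dump
above, using GCC's generic vector extensions (this is an assumption about the
testcase, not the exact code from the PR; the function name 'convert' is
hypothetical):

```c
/* GCC generic vector types: 4 x float (16 bytes) and 4 x double (32 bytes). */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v4df __attribute__((vector_size(32)));

v4df convert(v4sf x)
{
    /* Element-wise float -> double widening.  Because generic vector
       lowering lacks widening support, this decomposes into four
       BIT_FIELD_REF extracts plus scalar (double) casts, instead of a
       vec_unpack_lo/hi pair as the vectorizer would emit. */
    return (v4df){ x[0], x[1], x[2], x[3] };
}
```

With vec_unpack support, the same conversion would need only a pair of
widening unpack operations on the whole vector.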
For the original testcase it's a duplicate of PR65832 as we get
<bb 2>:
_1 = *x_5(D);
_7 = BIT_FIELD_REF <_1, 128, 0>;
_9 = _7 + _7;
_10 = BIT_FIELD_REF <_1, 128, 128>;
_12 = _10 + _10;
_14 = _7 + _9;
_16 = _10 + _12;
_3 = {_14, _16};
*x_5(D) = _3;
Without fixing PR65832, this can still be improved by "combining" the loads with
the extracts and the CONSTRUCTOR with the store.
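A hedged reconstruction of source that could yield the second GIMPLE dump:
a 4 x double vector processed as two 2 x double halves, so the 256-bit load is
split by BIT_FIELD_REF extracts and reassembled by a CONSTRUCTOR before the
store. The union-based splitting and the function name 'f' are assumptions for
illustration, not the original testcase:

```c
/* GCC generic vector types: whole vector and its two 128-bit halves. */
typedef double v4df __attribute__((vector_size(32)));
typedef double v2df __attribute__((vector_size(16)));

union halves { v4df v; v2df h[2]; };

void f(v4df *x)
{
    union halves t = { *x };
    /* Each half computes h + (h + h), i.e. 3 * h, matching the
       _9 = _7 + _7; _14 = _7 + _9 pattern in the dump above. */
    t.h[0] = t.h[0] + (t.h[0] + t.h[0]);
    t.h[1] = t.h[1] + (t.h[1] + t.h[1]);
    *x = t.v;
}
```

Combining the extracts with the load (and the CONSTRUCTOR with the store) would
let this become two independent v2df load/add/store sequences with no
cross-lane shuffling.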
I have done something similar for COMPLEX_EXPR in tree-ssa-forwprop.c ... (not
that I am very proud of that - heh).
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65832
[Bug 65832] Inefficient vector construction