------- Comment #5 from rguenth at gcc dot gnu dot org 2010-01-17 21:05 -------

Well, indeed - it would be far more useful to have a less convoluted testcase; in this one, unrelated functions and source make analysis hard.
Please provide a testcase with a _single_ computation kernel applying it in a single way (I'm trying to follow op_and ...).

From an inlining perspective it doesn't look so bad - early inlining turns the innermost loop into

<bb 7>:
  D.15257_17 = thisD.13894_4(D)->m_DataD.13729;
  D.15256_19 = iD.15246_18 * 16;
  D.15255_20 = D.15257_17 + D.15256_19;
  D.15254_21 = v2D.13892_5(D)->m_DataD.13729;
  D.15256_22 = iD.15246_18 * 16;
  D.15253_23 = D.15254_21 + D.15256_22;
  D.15252_24 = *D.15253_23;
  D.15251_25 = *D.15236_2;
  D.15256_26 = iD.15246_18 * 16;
  D.15250_27 = D.15251_25 + D.15256_26;
  D.15249_28 = *D.15250_27;
  D.15247_29 = __builtin_ia32_pand128D.1150 (D.15249_28, D.15252_24);
  *D.15255_20 = D.15247_29;
  iD.15246_30 = iD.15246_18 + 1;

<bb 8>:
  # iD.15246_18 = PHI <0(6), iD.15246_30(7)>
  if (max_idxD.15245_16 != iD.15246_18)
    goto <bb 7>;

already (_ZN10bit_vector6op_andERKS_S1_).

Now the main issue why the redundant loads are not hoisted is that all data pointers are ref-all:

  chunk_typeD.13721 * {ref-all} D.15255;

thus you tell the compiler that the store

  *D.15255_20 = D.15247_29;

might possibly clobber all loads in that loop. Of course chunk_type is just __m128i and I always complained that this is ref-all which makes optimization of pointers to __m128i practically useless. This is really a target header problem.

-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu dot org,
                   |                            |rguenth at gcc dot gnu dot
                   |                            |org, hjl at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
          Component|tree-optimization           |target
     Ever Confirmed|0                           |1
           Keywords|                            |alias, missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2010-01-17 21:05:55
               date|                            |
            Summary|[C++0x] Variadic templates +|[C++0x] Variadic templates +
                   |lambdas = extremely poor    |lambdas = extremely poor
                   |code quality                |code quality, __m128i and
                   |                            |aliasing sucks

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42779
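
To illustrate the ref-all point in the comment above, here is a minimal sketch of a kernel with the same shape as the dumped loop. It is not taken from the reporter's testcase. Everything except the names m_Data, chunk_type, op_and and max_idx is an assumption made for illustration: bit_vector_sketch is a hypothetical, simplified container, and chunk_type is a 64-bit integer given the GCC/Clang may_alias attribute as a stand-in for __m128i, which carries the same attribute in the target header. The second function shows one possible source-level workaround, hoisting the m_Data loads by hand. That rewrite is only valid when the stores cannot really modify the m_Data members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __m128i. Like the typedef in the target header,
// it carries may_alias (a GCC/Clang extension), so accesses through
// chunk_type* are assumed to be able to alias any other object.
typedef std::uint64_t chunk_type __attribute__((__may_alias__));

// Hypothetical, simplified container; only the m_Data name comes from the dump.
struct bit_vector_sketch {
    chunk_type *m_Data;
    std::size_t m_Size;  // number of chunks
};

// Same shape as the dumped loop: each iteration reads the m_Data members and
// stores through a chunk_type*. Because that store may alias anything, the
// compiler may have to assume it clobbers the m_Data members themselves and
// reload them on every iteration.
void op_and_naive(bit_vector_sketch &self, const bit_vector_sketch &v1,
                  const bit_vector_sketch &v2) {
    const std::size_t max_idx = self.m_Size;
    for (std::size_t i = 0; i != max_idx; ++i)
        self.m_Data[i] = v1.m_Data[i] & v2.m_Data[i];
}

// Manual hoisting: the data pointers are read once into locals whose address
// is never taken, so no store in the loop can change them.
void op_and_hoisted(bit_vector_sketch &self, const bit_vector_sketch &v1,
                    const bit_vector_sketch &v2) {
    chunk_type *dst = self.m_Data;
    const chunk_type *a = v1.m_Data;
    const chunk_type *b = v2.m_Data;
    const std::size_t max_idx = self.m_Size;
    for (std::size_t i = 0; i != max_idx; ++i)
        dst[i] = a[i] & b[i];
}

// Usage check: on non-overlapping buffers both variants give the same result.
void check_equivalence() {
    chunk_type a[3] = {0xF0F0u, 0xFFFFu, 0x1234u};
    chunk_type b[3] = {0x0FF0u, 0x00FFu, 0xFFFFu};
    chunk_type r1[3] = {0, 0, 0};
    chunk_type r2[3] = {0, 0, 0};
    bit_vector_sketch va = {a, 3}, vb = {b, 3};
    bit_vector_sketch d1 = {r1, 3}, d2 = {r2, 3};
    op_and_naive(d1, va, vb);
    op_and_hoisted(d2, va, vb);
    for (std::size_t i = 0; i != 3; ++i)
        assert(r1[i] == r2[i]);
}
```

Comparing the optimized dumps of the two functions (for example with -O2 -fdump-tree-optimized) should show whether the loads of m_Data stay inside the loop in the first variant; the exact result depends on the compiler version and is not claimed here.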