https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104106
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
            Summary|Fail to remove some useless |Fail to remove stores to
                   |loop                        |VLA inside loops
   Last reconfirmed|                            |2022-01-18

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
If we change the VLA to a normal array, GCC is able to optimize f and g. h is
almost done:

  <bb 2> [local count: 118111600]:
  if (n_8(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _18 = (unsigned int) n_8(D);
  _6 = (sizetype) _18;
  __builtin_memcpy (&tmp_a, a_11(D), _6);

  <bb 4> [local count: 118111600]:
  _4 = tmp_a[0];

GCC could use PRE on the load of tmp_a[0]: on the path through <bb 3> it can
be replaced by a load of a_11(D)[0], and on the other path (n <= 0, where
tmp_a is never written) its value is unspecified. This is true even with the
VLA.

As for i, that requires loop fission, which GCC does not implement yet (there
is even another bug about that).
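The C sketch below illustrates the two transformations named in the comment.
It is hypothetical: the bug's actual f, g, h and i are not reproduced here, so
h_before, h_after, fused and fissioned are made-up names, N is an assumed fixed
size, and byte elements are assumed because the memcpy length in the dump is n
itself. h_before is a guess at an h-like function, and h_after is what the
suggested PRE would effectively produce (0 stands in for "any value"). The
second pair shows loop fission in general; the comment does not say why i
needs it.

```c
#define N 16 /* hypothetical size of the "normal array" variant */

/* Hypothetical h-like function: copy n bytes into a local array and
   read back only element 0. GCC turns this loop into a memcpy. */
char h_before(const char *a, int n)
{
    char tmp_a[N];
    for (int i = 0; i < n; i++)
        tmp_a[i] = a[i];
    return tmp_a[0]; /* indeterminate when n <= 0 */
}

/* Result of the suggested PRE: forward a[0] when the copy happened;
   on the n <= 0 path any value is acceptable, 0 is used here. */
char h_after(const char *a, int n)
{
    return n > 0 ? a[0] : 0;
}

/* Loop fission: one loop doing two independent jobs... */
void fused(int *x, int *y, const int *a, int n)
{
    for (int i = 0; i < n; i++) {
        x[i] = a[i] + 1;
        y[i] = a[i] * 2;
    }
}

/* ...split into two loops over the same iteration space, so each can
   be optimized (or deleted as dead) on its own. Only valid when x, y
   and a do not overlap. */
void fissioned(int *x, int *y, const int *a, int n)
{
    for (int i = 0; i < n; i++)
        x[i] = a[i] + 1;
    for (int i = 0; i < n; i++)
        y[i] = a[i] * 2;
}
```

For every n in 1..N, h_before and h_after return the same value; they only
differ when n <= 0, where the original reads an unwritten element.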