https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91019
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-06-27
                 CC|                            |law at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

If you adjust the testcase to store only 7 bytes in encode_v2, I would have
expected DSE to trim the memcpy upon seeing the second store. With 8 (and
also 2) the memcpys are folded to plain assignments early, so we have

encode_v1 (unsigned char * buf, long unsigned int a1, short unsigned int a2)
{
  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (buf_2(D), &a1, 6);
  MEM[(char * {ref-all})buf_2(D) + 6B] = a2_5(D);
  return;
}

encode_v2 (unsigned char * buf, long unsigned int a1, short unsigned int a2)
{
  <bb 2> [local count: 1073741824]:
  MEM[(char * {ref-all})buf_2(D)] = a1_5(D);
  MEM[(char * {ref-all})buf_2(D) + 6B] = a2_6(D);
  return;
}

I'm not sure where it would best fit to _extend_ the earlier memcpy. We have
to be careful not to introduce store data races. Eventually store-merging can
come to the rescue... OTOH how realistic is this testcase?
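For reference, a source-level testcase producing the GIMPLE above would look
roughly like the sketch below. It is reconstructed from the dumps rather than
taken from the reporter's attachment, so the exact types and copy sizes are
assumptions inferred from the function signatures and the folded stores:

#include <string.h>

/* The 6-byte copy is not a power-of-two size, so it stays as a
   __builtin_memcpy call; the 2-byte store at offset 6 does not
   overlap it.  */
void
encode_v1 (unsigned char *buf, unsigned long a1, unsigned short a2)
{
  memcpy (buf, &a1, 6);
  memcpy (buf + 6, &a2, 2);
}

/* The 8-byte copy is folded early into a plain 8-byte store; the
   following 2-byte store at offset 6 overwrites its last two bytes,
   so trimming the first store (or extending it and dropping the
   second) would be needed to avoid the overlap.  */
void
encode_v2 (unsigned char *buf, unsigned long a1, unsigned short a2)
{
  memcpy (buf, &a1, 8);
  memcpy (buf + 6, &a2, 2);
}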