https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85757
Bug ID: 85757
Summary: tree optimizers fail to fully clean up fixed-size memcpy
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

This is minimized from one of the suboptimal stack consumption issues in gcc_qsort. gcc_qsort uses code similar to this to move potentially-unaligned data:

void f(int n, char *p0, char *p1, char *p2, char *o)
{
  int t0, t1;
  __builtin_memcpy(&t0, p0, 1);
  __builtin_memcpy(&t1, p1, 1);
  if (n==3) __builtin_memcpy(o+2, p2, 1);
  __builtin_memcpy(o+0, &t0, 1);
  __builtin_memcpy(o+1, &t1, 1);
}

Note the mismatch between the memcpy size (1) and the temporaries' size (4). If the sizes match, there's no problem. If not, the tree optimizers fail to fully clean up the copies (and, unlike in this minimal testcase, in the full gcc_qsort the RTL optimizers cannot clean them up either, so we get dead stack stores).

The .optimized dump reads (note the dead writes to t0 and t1 in BB 2):

f (int n, char * p0, char * p1, char * p2, char * o)
{
  int t1;
  int t0;
  unsigned char _4;
  unsigned char _7;
  unsigned char _12;

  <bb 2> [local count: 1073741825]:
  _4 = MEM[(char * {ref-all})p0_3(D)];
  MEM[(char * {ref-all})&t0] = _4;
  _7 = MEM[(char * {ref-all})p1_6(D)];
  MEM[(char * {ref-all})&t1] = _7;
  if (n_9(D) == 3)
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072220]:
  _12 = MEM[(char * {ref-all})p2_11(D)];
  MEM[(char * {ref-all})o_10(D) + 2B] = _12;

  <bb 4> [local count: 1073741825]:
  MEM[(char * {ref-all})o_10(D)] = _4;
  MEM[(char * {ref-all})o_10(D) + 1B] = _7;
  t0 ={v} {CLOBBER};
  t1 ={v} {CLOBBER};
  return;

}
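The report says the problem goes away when the temporaries' size matches the memcpy size. Below is a minimal sketch of that contrast, assuming a hosted C99 compiler. It uses plain memcpy from <string.h> instead of __builtin_memcpy for portability; GCC normally expands it as the same builtin unless -fno-builtin is given. The names f_mismatched, f_matched and check_equivalent are hypothetical and do not come from gcc_qsort. Both functions have the same observable behaviour, and only the temporaries' types differ. Whether f_matched actually avoids the dead stack stores on a particular GCC version should be confirmed from its own .optimized dump (for example, -O2 -fdump-tree-optimized). This sketch does not prove it.

```c
#include <assert.h>
#include <string.h>

/* Pattern from the report: 4-byte int temporaries, 1-byte copies. */
void f_mismatched(int n, char *p0, char *p1, char *p2, char *o)
{
    int t0, t1;
    memcpy(&t0, p0, 1);          /* only the first byte of t0 is written */
    memcpy(&t1, p1, 1);
    if (n == 3)
        memcpy(o + 2, p2, 1);
    memcpy(o + 0, &t0, 1);       /* copy that same byte back out */
    memcpy(o + 1, &t1, 1);
}

/* Hypothetical variant: each temporary is exactly as large as its copy. */
void f_matched(int n, char *p0, char *p1, char *p2, char *o)
{
    char t0, t1;
    memcpy(&t0, p0, sizeof t0);  /* copy length tied to the temporary's type */
    memcpy(&t1, p1, sizeof t1);
    if (n == 3)
        memcpy(o + 2, p2, 1);
    memcpy(o + 0, &t0, sizeof t0);
    memcpy(o + 1, &t1, sizeof t1);
}

/* Hypothetical helper: run both variants on the same inputs and assert
   that they write identical bytes to the output buffer. */
void check_equivalent(int n)
{
    char a = 'x', b = 'y', c = 'z';
    char out_mis[3] = {0, 0, 0};
    char out_mat[3] = {0, 0, 0};
    f_mismatched(n, &a, &b, &c, out_mis);
    f_matched(n, &a, &b, &c, out_mat);
    assert(memcmp(out_mis, out_mat, sizeof out_mis) == 0);
}
```

Sizing the copies with sizeof on the temporary keeps the copy length and the temporary's size from drifting apart. According to the report, this mismatch is the condition under which the tree optimizers leave the stores to t0 and t1 behind.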