https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85757

            Bug ID: 85757
           Summary: tree optimizers fail to fully clean up fixed-size
                    memcpy
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

This is minimized from one of the suboptimal stack consumption issues in gcc_qsort.

gcc_qsort uses code similar to this to move potentially-unaligned data:

void f(int n, char *p0, char *p1, char *p2, char *o)
{
    int t0, t1;
    __builtin_memcpy(&t0, p0, 1);
    __builtin_memcpy(&t1, p1, 1);
    if (n==3) __builtin_memcpy(o+2, p2, 1);
    __builtin_memcpy(o+0, &t0, 1);
    __builtin_memcpy(o+1, &t1, 1);
}

Note the mismatch between memcpy size (1) and temporaries' size (4).
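
For comparison, here is a sketch of the same function with temporaries whose
size matches the memcpy size. The name f_matched and the choice of char
temporaries are assumptions made for illustration; this variant is not taken
from gcc_qsort.

```c
/* Hypothetical matched-size variant of f: each temporary is exactly
   1 byte, the same size as every memcpy that touches it. */
void f_matched(int n, char *p0, char *p1, char *p2, char *o)
{
    char t0, t1;
    __builtin_memcpy(&t0, p0, 1);   /* load 1 byte into a 1-byte temporary */
    __builtin_memcpy(&t1, p1, 1);
    if (n == 3) __builtin_memcpy(o + 2, p2, 1);
    __builtin_memcpy(o + 0, &t0, 1);
    __builtin_memcpy(o + 1, &t1, 1);
}
```

Both variants copy the same bytes; they differ only in whether each copy
covers the whole temporary.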

If the sizes match, there's no problem. If they do not, the tree optimizers
fail to fully clean up the copies. In this minimal testcase the RTL optimizers
can still remove them, but in the full gcc_qsort they cannot, and we get dead
stack stores. The .optimized dump reads (note the dead writes to t0 and t1 in
BB 2):

f (int n, char * p0, char * p1, char * p2, char * o)
{
  int t1;
  int t0;
  unsigned char _4;
  unsigned char _7;
  unsigned char _12;

  <bb 2> [local count: 1073741825]:
  _4 = MEM[(char * {ref-all})p0_3(D)];
  MEM[(char * {ref-all})&t0] = _4;
  _7 = MEM[(char * {ref-all})p1_6(D)];
  MEM[(char * {ref-all})&t1] = _7;
  if (n_9(D) == 3)
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072220]:
  _12 = MEM[(char * {ref-all})p2_11(D)];
  MEM[(char * {ref-all})o_10(D) + 2B] = _12;

  <bb 4> [local count: 1073741825]:
  MEM[(char * {ref-all})o_10(D)] = _4;
  MEM[(char * {ref-all})o_10(D) + 1B] = _7;
  t0 ={v} {CLOBBER};
  t1 ={v} {CLOBBER};
  return;

}
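
As a rough illustration of the result one would hope for, the dump above with
the two dead stores to t0 and t1 removed corresponds to the hand-written C
below. This is a sketch, not compiler output, and the name f_ideal is made up
for this example.

```c
/* Hand-written equivalent of the dump with the dead stores removed:
   the loaded bytes are forwarded straight to the output, and no
   addressable stack temporaries remain. */
void f_ideal(int n, char *p0, char *p1, char *p2, char *o)
{
    char v0 = *p0;              /* corresponds to _4 */
    char v1 = *p1;              /* corresponds to _7 */
    if (n == 3)
        o[2] = *p2;             /* corresponds to _12, stored at o + 2 */
    o[0] = v0;
    o[1] = v1;
}
```

The loads from p0 and p1 stay ahead of the conditional store to o + 2, as in
the dump, so the behavior is preserved even if the pointers alias.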
