https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93558

            Bug ID: 93558
           Summary: missing mempcpy folding defeats strlen optimization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

The two functions below create the same string but f1() is micro-optimized to
avoid copying the terminating nul in "1234" before immediately appending "5678"
(whether or not such micro-optimization ever makes sense is a separate issue). 
Yet GCC ultimately optimizes f0() better because in f1() it doesn't exploit the
basic property of mempcpy(d, ..., 4): that it returns d + 4.

As far as I can see, GCC unconditionally transforms stpcpy(D, S) to
strcpy(D, S) or memcpy(D, S, N), with the result computed as D + N, when N is
the known length of S.  If that is profitable, it should likewise be profitable
to transform mempcpy(D, S, N) to memcpy(D, S, N) with the result computed as
D + N.  Either way, GCC should emit equivalently efficient code for both
functions below.
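
To make the proposed folding concrete, here is a minimal source-level sketch of
the equivalence.  It is not GCC's implementation.  The helper names
mempcpy_folded and build_with_folded are hypothetical and exist only for this
illustration, and the code assumes a C library that provides the GNU mempcpy
extension.

```c
#define _GNU_SOURCE          /* mempcpy is a GNU extension */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the form mempcpy (d, s, n) would take after the
   proposed folding, i.e. a plain memcpy plus pointer arithmetic.  */
static void *mempcpy_folded (void *d, const void *s, size_t n)
{
  memcpy (d, s, n);
  return (char *) d + n;
}

/* Hypothetical helper mirroring f1() from this report, but using the
   folded form.  Returns the length of the resulting string.  */
static size_t build_with_folded (char *d)
{
  char *t = mempcpy_folded (d, "1234", 4);
  assert (t == d + 4);       /* the property the strlen pass needs to see */
  strcpy (t, "5678");
  return strlen (d);
}
```

Once the returned pointer is expressed as d + 4, it has the same shape as t_5
in the f0() dump, which is the form the strlen optimization already handles.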

$ cat a.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout a.c
void f0 (char *d, const char *s)
{
  char *t = __builtin_stpcpy (d, "1234");
  __builtin_strcpy (t, "5678");
  if (__builtin_strlen (d) != 8)   // folded to false
    __builtin_abort ();
}

void f1 (char *d, const char *s)
{
  char *t = __builtin_mempcpy (d, "1234", 4);
  __builtin_strcpy (t, "5678");
  if (__builtin_strlen (d) != 8)   // not folded
    __builtin_abort ();
}

;; Function f0 (f0, funcdef_no=0, decl_uid=3479, cgraph_uid=1, symbol_order=0)

f0 (char * d, const char * s)
{
  char * t;

  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (d_3(D), "1234", 4);
  t_5 = d_3(D) + 4;
  __builtin_memcpy (t_5, "5678", 5); [tail call]
  return;

}



;; Function f1 (f1, funcdef_no=1, decl_uid=3484, cgraph_uid=2, symbol_order=1)

f1 (char * d, const char * s)
{
  char * t;
  long unsigned int _1;

  <bb 2> [local count: 1073741824]:
  t_5 = __builtin_mempcpy (d_3(D), "1234", 4);
  __builtin_memcpy (t_5, "5678", 5);
  _1 = __builtin_strlen (d_3(D));
  if (_1 != 8)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_abort ();

  <bb 4> [local count: 1073741824]:
  return;

}
