https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93558
Bug ID: 93558
Summary: missing mempcpy folding defeats strlen optimization
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: msebor at gcc dot gnu.org
Target Milestone: ---

The two functions below create the same string, but f1() is micro-optimized to avoid copying the terminating nul in "1234" before immediately appending "5678" (whether such a micro-optimization ever makes sense is a separate issue). Yet GCC ultimately optimizes f0() better, because in f1() it doesn't exploit the basic property of mempcpy(d, ..., 4): that it returns d + 4.

As far as I can see, GCC unconditionally transforms stpcpy(D, S) into strcpy(D, S), or into memcpy(D, S, N + 1) with the result D + N when N is the known length of S. If that is profitable, it should likewise be profitable to transform mempcpy(D, S, N) into memcpy(D, S, N) with the result D + N. Either way, GCC should emit equally efficient code for both functions below.

$ cat a.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout a.c
void f0 (char *d, const char *s)
{
  char *t = __builtin_stpcpy (d, "1234");
  __builtin_strcpy (t, "5678");
  if (__builtin_strlen (d) != 8)   // folded to false
    __builtin_abort ();
}

void f1 (char *d, const char *s)
{
  char *t = __builtin_mempcpy (d, "1234", 4);
  __builtin_strcpy (t, "5678");
  if (__builtin_strlen (d) != 8)   // not folded
    __builtin_abort ();
}

;; Function f0 (f0, funcdef_no=0, decl_uid=3479, cgraph_uid=1, symbol_order=0)

f0 (char * d, const char * s)
{
  char * t;

  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (d_3(D), "1234", 4);
  t_5 = d_3(D) + 4;
  __builtin_memcpy (t_5, "5678", 5); [tail call]
  return;

}

;; Function f1 (f1, funcdef_no=1, decl_uid=3484, cgraph_uid=2, symbol_order=1)

f1 (char * d, const char * s)
{
  char * t;
  long unsigned int _1;

  <bb 2> [local count: 1073741824]:
  t_5 = __builtin_mempcpy (d_3(D), "1234", 4);
  __builtin_memcpy (t_5, "5678", 5);
  _1 = __builtin_strlen (d_3(D));
  if (_1 != 8)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_abort ();

  <bb 4> [local count: 1073741824]:
  return;

}