Hi,

On 2020-04-19 15:07:22 -0700, Jeff Davis wrote:
> I brought up an issue where GCC in combination with FORTIFY_SOURCE[2]
> causes a perf regression for logical tapes after introducing
> LogicalTapeSetExtend()[3]. Unfortunately, FORTIFY_SOURCE is used by
> default on ubuntu. I have not observed the problem with clang.
>
> There is no reason why the change should trigger the regression, but it
> does. The slowdown is due to GCC switching to an inlined version of
> memcpy() for LogicalTapeWrite() at logtape.c:768. The change[3] seems
> to have little if anything to do with that.

FWIW, with gcc 10 and glibc 2.30 I don't see such a switch. Taking a
profile shows me:

       │       nthistime = TapeBlockPayloadSize - lt->pos;
       │       if (nthistime > size)
  3.01 │1  b0:   cmp    %rdx,%r12
  1.09 │         cmovbe %r12,%rdx
       │     memcpy():
       │
       │     __fortify_function void *
       │     __NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
       │                    size_t __len))
       │     {
       │       return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
  2.44 │         mov    %r13,%rsi
       │     LogicalTapeWrite():
       │       nthistime = size;
       │       Assert(nthistime > 0);
       │
       │       memcpy(lt->buffer + lt->pos, ptr, nthistime);
  2.49 │         add    0x28(%rbx),%rdi
  0.28 │         mov    %rdx,%r15
       │     memcpy():
  4.65 │       → callq  memcpy@plt
       │     LogicalTapeWrite():

I.e. normal memcpy is getting called.

That's with -D_FORTIFY_SOURCE=2

With which compiler / libc versions did you encounter this?

Greetings,

Andres Freund