https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809

            Bug ID: 88809
           Summary: do not use rep-scasb for inline strlen/memchr
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

The performance difference between libc strlen and x86 rep-scasb has grown too
large and seems unlikely to improve anytime soon.

On most x86 cores, the microcode for rep-scasb is not very sophisticated and
runs at 0.5 bytes per cycle (b/c) or worse (according to Agner Fog's research,
with SkylakeX managing 1 b/c), plus some overhead for entering and leaving the
microcode loop (I think on the order of 20 cycles, but I don't have exact info).

By contrast, libc strlen typically has small overhead for short strings and
uses register-wide operations on long strings, sustaining on the order of
4-8 b/c using only integer registers, or even in the ballpark of 16-64 b/c with
SSE/AVX (sorry, I don't have exact figures here).
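To illustrate what "register-wide operations" with integer registers can look
like, here is a minimal sketch in portable C. It is not the code of any
particular libc: the name swar_strnlen is hypothetical, and the function takes
a length bound (unlike strlen) so that the sketch never reads outside the
buffer. It tests eight bytes per iteration with the well-known "has a zero
byte" bit trick.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of w is zero (classic bit trick). */
static int word_has_zero_byte(uint64_t w)
{
    return ((w - UINT64_C(0x0101010101010101)) & ~w &
            UINT64_C(0x8080808080808080)) != 0;
}

/* Hypothetical helper: length of s, examining at most n bytes. */
size_t swar_strnlen(const char *s, size_t n)
{
    size_t i = 0;

    /* Test eight bytes per iteration in one integer register. */
    while (n - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* alignment-safe 8-byte load */
        if (word_has_zero_byte(w))
            break;                     /* the terminator is in this word */
        i += sizeof w;
    }

    /* Find the exact byte, or finish a tail shorter than one word. */
    while (i < n && s[i] != '\0')
        i++;
    return i;
}
```

For example, swar_strnlen("hello", 6) returns 5. An SSE/AVX version follows the
same pattern with wider registers and a vector compare in place of the bit
trick.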

A call to strlen is also shorter by itself (rep-scasb needs extra instructions
to set up %rax and fix up %rcx), although to be fair, a call to strlen prevents
use of the red zone and clobbers more registers.
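For reference, the setup and fixup around the string instruction can be
sketched with GNU C inline assembly as follows. This is an illustration and
not GCC's actual expansion: the name scasb_strlen is hypothetical, the code
assumes the direction flag is clear (as the ABI requires at function
boundaries), and it falls back to a plain byte loop on non-x86 targets.

```c
#include <stddef.h>

/* Hypothetical illustration of a strlen built on repne scasb. */
size_t scasb_strlen(const char *s)
{
#if defined(__x86_64__) || defined(__i386__)
    const char *p = s;
    size_t count = (size_t)-1;   /* setup: counter starts at "unlimited" */

    /* Setup: %al = 0 is the byte to search for, %rdi/%edi = string.
       repne scasb decrements the counter once per byte it examines,
       including the terminating NUL. */
    __asm__ volatile("repne scasb"
                     : "+D"(p), "+c"(count)
                     : "a"(0)
                     : "cc", "memory");

    /* Fixup: count is now -(len + 2), so ~count is len + 1. */
    return ~count - 1;
#else
    /* Fallback for targets without scasb: plain byte loop. */
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
#endif
}
```

The two constant loads before the instruction and the not/decrement after it
are the extra instructions mentioned above; a plain call needs none of them.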

Therefore I suggest that we no longer use rep-scasb for inline strlen by default
(we currently do at -Os). This is motivated in part by PR 88793 and the Red Hat
bug referenced from there.
