https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100769

            Bug ID: 100769
           Summary: [D] memcmp() == 0 for small constant strings not
                    folded
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: witold.baryluk+gcc at gmail dot com
  Target Milestone: ---

I expected gdc to compile this D code to near-optimal machine code, but it
does not.

```
extern(C) int memcmp(const void *s1, const void *s2, size_t n);

int recognize3(const char* s) {
    return memcmp(s, "stract class", 12) == 0;
}
```

https://godbolt.org/z/vx17WK9rs


gdc produces a call to memcmp, instead of inlining and specializing the
comparison for this constant 12-byte string.

```
int example.recognize3(const(char*)):
        sub     rsp, 8
        mov     edx, 12
        mov     esi, OFFSET FLAT:.LC0
        call    memcmp
        test    eax, eax
        sete    al
        add     rsp, 8
        movzx   eax, al
        ret
```



ldc2 1.24.0 (for D), clang 11.0.1-2 (for C and C++), and gcc 10.2.1 (for C
and C++) produce close to optimal code, as do ldc2 1.26.0 (for D) and gcc
11.1 (for C and C++):

```
int example.recognize3(const(char*)):
        movabs  rcx, 7142836979195081843
        xor     rcx, qword ptr [rdi]
        mov     edx, dword ptr [rdi + 8]
        xor     rdx, 1936941420
        xor     eax, eax
        or      rdx, rcx
        sete    al
        ret
```

and

```
recognize3:
        movabs  rax, 7142836979195081843
        cmp     QWORD PTR [rdi], rax
        je      .L6
.L2:
        mov     eax, 1
        xor     eax, 1
        ret
.L6:
        xor     eax, eax
        cmp     DWORD PTR [rdi+8], 1936941420
        jne     .L2
        xor     eax, 1
        ret
```


Notice how gcc, clang and ldc2 all compare the first 8 bytes of the input,
then the next 4 bytes. clang and ldc2 just xor each part against its constant,
or the two results together, and return, with no conditional jumps. gcc does a
bit worse, with two conditional jumps, but the code is still pretty good and
follows the same idea.
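
For reference, the 8-byte plus 4-byte comparison can be written out by hand. The
following is only a sketch in C of what the branch-free clang/ldc2 output
computes, not what any of the compilers does internally. The names
recognize3_folded and recognize3_ref are made up for this illustration. The
expected values are loaded from the literal itself, so the sketch does not
depend on byte order; on little-endian x86-64 they correspond to the two
constants visible in the assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-folded version of memcmp(s, "stract class", 12) == 0.
 * The caller must guarantee that s points to at least 12 readable bytes. */
static int recognize3_folded(const char *s)
{
    static const char lit[] = "stract class";
    uint64_t in8, want8;
    uint32_t in4, want4;

    /* memcpy with a constant size is the portable way to express an
     * unaligned load; compilers typically turn it into a single mov. */
    memcpy(&in8, s, 8);
    memcpy(&want8, lit, 8);
    memcpy(&in4, s + 8, 4);
    memcpy(&want4, lit + 8, 4);

    /* xor each part with its expected value, or the differences together,
     * and test for zero once: no conditional jumps are needed. */
    return ((in8 ^ want8) | (uint64_t)(in4 ^ want4)) == 0;
}

/* Reference version, equivalent to the D function in this report. */
static int recognize3_ref(const char *s)
{
    return memcmp(s, "stract class", 12) == 0;
}
```

Both functions should agree on every input that has at least 12 readable
bytes; only the generated code is expected to differ.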

gdc, however, calls the generic memcmp, which loops and has about 12 jumps
and/or 13 exits.
