https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91019

            Bug ID: 91019
           Summary: Missed optimization on sequential memcpy calls
           Product: gcc
           Version: 9.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mserdarsanli at gmail dot com
  Target Milestone: ---

#include <stdint.h>
#include <string.h>

void encode_v1(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 6);    /* first 6 bytes of a1 */
    memcpy(buf+6, &a2, 2);  /* 2 bytes of a2 */
}

void encode_v2(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 8);    /* all 8 bytes of a1 */
    memcpy(buf+6, &a2, 2);  /* overwrite the last 2 bytes with a2 */
}


The two functions above should be equivalent: each packs its arguments into the buffer.

`encode_v1` copies 6 bytes of a1, then 2 bytes of a2.
`encode_v2` copies all 8 bytes of a1, then overwrites the last two bytes with a2.

Functionally they are the same, yet v2 generates better assembly.
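
As a sanity check, here is a minimal sketch of a test driver (not part of the
original report; it assumes it is compiled in the same translation unit as the
two functions above, so no prototypes are needed):

#include <assert.h>
#include <stdio.h>

int main(void) {
    uint8_t b1[8], b2[8];
    encode_v1(b1, 0x1122334455667788ull, 0x99AA);
    encode_v2(b2, 0x1122334455667788ull, 0x99AA);
    /* Both variants end up writing the same 8 bytes. */
    assert(memcmp(b1, b2, 8) == 0);
    puts("buffers match");
    return 0;
}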


This is the assembly generated with -O3 (https://godbolt.org/z/i6TMiY):

encode_v1(unsigned char*, unsigned long, unsigned short):
        mov     eax, esi
        shr     rsi, 32
        mov     WORD PTR [rdi+6], dx
        mov     DWORD PTR [rdi], eax
        mov     WORD PTR [rdi+4], si
        ret
encode_v2(unsigned char*, unsigned long, unsigned short):
        mov     QWORD PTR [rdi], rsi
        mov     WORD PTR [rdi+6], dx
        ret
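
For reference, a hedged sketch (not from the report) of what the merged form
could look like if written by hand, assuming a little-endian target such as
the x86-64 shown above; encode_merged is a hypothetical name:

#include <stdint.h>
#include <string.h>

void encode_merged(uint8_t *buf, uint64_t a1, uint16_t a2) {
    /* Pack the low 48 bits of a1 and the 16 bits of a2 into one value,
       then store it with a single 8-byte copy (little-endian assumption). */
    uint64_t packed = (a1 & 0x0000FFFFFFFFFFFFull) | ((uint64_t)a2 << 48);
    memcpy(buf, &packed, 8);
}

With -O3 this should reduce to a few register operations and a single 8-byte
store; whether that beats v2's two stores is target-dependent, but it shows
the final byte layout that v1's two memcpy calls establish together.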
