https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91019
Bug ID: 91019
Summary: Missed optimization on sequential memcpy calls
Product: gcc
Version: 9.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mserdarsanli at gmail dot com
Target Milestone: ---

#include <stdint.h>
#include <string.h>

void encode_v1(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 6);
    memcpy(buf+6, &a2, 2);
}

void encode_v2(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 8);
    memcpy(buf+6, &a2, 2);
}

The two functions above should be equivalent: both pack the arguments into the buffer. `encode_v1` copies 6 bytes, then 2 bytes. `encode_v2` copies 8 bytes, then overwrites the last two. Functionally they are the same, yet v2 generates better assembly.

This is the assembly with -O3 (https://godbolt.org/z/i6TMiY):

encode_v1(unsigned char*, unsigned long, unsigned short):
        mov     eax, esi
        shr     rsi, 32
        mov     WORD PTR [rdi+6], dx
        mov     DWORD PTR [rdi], eax
        mov     WORD PTR [rdi+4], si
        ret
encode_v2(unsigned char*, unsigned long, unsigned short):
        mov     QWORD PTR [rdi], rsi
        mov     WORD PTR [rdi+6], dx
        ret