https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124624

            Bug ID: 124624
           Summary: Missed optimization on arrangement of code
                    blocks/branches (-Os mode)
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

This example code is from my personal project.

```c
/* <https://gitlab.com/Explorer09/utf_convert> */
#include <stdint.h>
uint32_t utf16_decode_code_unit(unsigned int *prev_surrogate,
                                unsigned int code_unit) {
    if (*prev_surrogate == 0) {
        if ((code_unit >> 10) == (0xD800 >> 10)) {
            *prev_surrogate = code_unit;
            return (uint32_t)-2;
        }
        if ((code_unit >> 10) == (0xDC00 >> 10)) {
            return (uint32_t)-1;
        }
        if ((code_unit >> 16) != 0) {
            return (uint32_t)-1;
        }
        return code_unit;
    }
    *prev_surrogate = 0;
    if ((code_unit >> 10) != (0xDC00 >> 10)) {
        return (uint32_t)-1;
    }
    return (((uint32_t)*prev_surrogate << 10) + code_unit -
        ((uint32_t)0xD800 << 10) + 0x10000 - 0xDC00);
}
```

Compiler Explorer link: https://godbolt.org/z/KMeTanerd

Note the two conditionals `(code_unit >> 10) == (0xDC00 >> 10)` and
`(code_unit >> 16) != 0`. Both return the same value, `(uint32_t)-1`.

x86-64 GCC 15.2 with the -Os option generates the following code for the two
conditionals:

```assembly
.L3:
        cmpl    $55, %edx
        jne     .L5
.L6:
        movl    $-1, %eax
        ret
.L5:
        movl    %esi, %ecx
        shrl    $16, %ecx
        je      .L4
        jmp     .L6
# ... skipped ...
.L4:
        ret
```

The code could be smaller (one fewer JMP instruction) if the .L5 and .L6 blocks
were swapped, so that .L5 falls through into .L6:

```assembly
.L3:
        cmpl    $55, %edx
        je      .L6
        movl    %esi, %ecx
        shrl    $16, %ecx
        je      .L4
.L6:
        movl    $-1, %eax
        ret
# ... skipped ...
.L4:
        ret
```

If I swap the two conditionals in the example code (as a workaround), GCC
generates code of the smaller size I expected.
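
For reference, here is a minimal sketch of what the workaround might look like. The function name `utf16_decode_code_unit_swapped` and the local variable `high` are hypothetical and only for illustration; the only intended change from the listing above is the order of the two conditionals, plus reading the stored high surrogate into `high` before it is cleared so that the sketch returns a meaningful code point for surrogate pairs. Since both conditionals return `(uint32_t)-1`, the swap should not change the function's results.

```c
#include <stdint.h>

/* Hypothetical variant of the reporter's function with the two
 * error checks swapped. Names are illustrative, not from the project. */
uint32_t utf16_decode_code_unit_swapped(unsigned int *prev_surrogate,
                                        unsigned int code_unit) {
    if (*prev_surrogate == 0) {
        /* High surrogate: remember it and ask for the next code unit. */
        if ((code_unit >> 10) == (0xD800 >> 10)) {
            *prev_surrogate = code_unit;
            return (uint32_t)-2;
        }
        /* Swapped: the out-of-range check now comes first... */
        if ((code_unit >> 16) != 0) {
            return (uint32_t)-1;
        }
        /* ...and the unpaired low surrogate check comes second. */
        if ((code_unit >> 10) == (0xDC00 >> 10)) {
            return (uint32_t)-1;
        }
        return code_unit;
    }
    /* Assumption: read the pending high surrogate before clearing it. */
    uint32_t high = *prev_surrogate;
    *prev_surrogate = 0;
    if ((code_unit >> 10) != (0xDC00 >> 10)) {
        return (uint32_t)-1;
    }
    return ((high << 10) + code_unit -
        ((uint32_t)0xD800 << 10) + 0x10000 - 0xDC00);
}
```

Whether this ordering yields the smaller block layout may depend on the GCC version and options; it is the reporter's observation for x86-64 GCC 15.2 with -Os.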
