https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110481

            Bug ID: 110481
           Summary: Possible improvements in dense switch statement
                    returning values
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Putting this provisionally into tree-optimization, although there may
be other aspects.

Consider

unsigned int foo(unsigned int a)
{
  switch ((a >> 10) & 3)
    {
    case 0:
      return 8;
    case 1:
      return 16;
    case 2:
      return 32;
    case 3:
      return 64;
    }
}

unsigned int bar(unsigned int a)
{
  return 1u << (((a >> 10) & 3) + 3);
}

unsigned int baz (unsigned int a)
{
  switch (a & (3 << 10))
    {
    case 0:
      return 8;
    case 1 << 10:
      return 16;
    case 2 << 10:
      return 32;
    case 3 << 10:
      return 64;
    }
}

which all do the same thing.

The code for bar is

bar:
.LFB1:
        .cfi_startproc
        shrl    $10, %edi
        movl    $1, %eax
        movl    %edi, %ecx
        andl    $3, %ecx
        addl    $3, %ecx
        sall    %cl, %eax
        ret

which is optimum except for the register move (submitted as PR110479).

The compiler does not recognize that foo or baz are equivalent to bar,
but that may be too much of a special case to really consider. 

The code for foo is

foo:
.LFB0:
        .cfi_startproc
        shrl    $10, %edi
        movl    $8, %eax
        andl    $3, %edi
        decl    %edi
        cmpl    $2, %edi
        ja      .L1
        movzbl  CSWTCH.1(%rdi), %eax
.L1:
        ret
        .cfi_endproc

[...]

CSWTCH.1:
        .byte   16
        .byte   32
        .byte   64

where it seems strange that there is a comparison and conditional
jump around the load.  A look at *.optimized shows

<bb 2> [local count: 1073741824]:
  _1 = a_4(D) >> 10;
  _2 = _1 & 3;
  _8 = _2 + 4294967295;
  if (_8 <= 2)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _6 = CSWTCH.1[_8];
  _5 = (unsigned int) _6;

  <bb 4> [local count: 1073741824]:
  # _3 = PHI <_5(3), 8(2)>
  return _3;

which assigns a probability of 50% to (a>>10)& 3 being zero.
Where this comes from is unclear to me.  A straightforward load
from a table which also includes the 8 seems more logical to me
(especially with -Os).

Finally, baz generates

baz:
.LFB2:
        .cfi_startproc
        andl    $3072, %edi
        movl    $32, %eax
        cmpl    $2048, %edi
        je      .L6
        ja      .L8
        movl    $8, %eax
        testl   %edi, %edi
        je      .L6
        movl    $16, %eax
        ret
.L8:
        movl    $64, %eax
.L6:
        ret

when transforming into something equivalent to foo (or even bar)
would seem advantageous.

Reply via email to