http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53669

             Bug #: 53669
           Summary: suboptimal small switch - 3-way jump with only 1
                    comparison
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vermaelen.wou...@gmail.com


Hi,

The program below generates sub-optimal code for small (+dense) switch
statements (small enough so that they are not implemented via jump-tables). The
function foo() is the smallest example I could come up with that shows the
problem. The function bar() shows the same problem in a slightly more realistic
context.

The function foo() also shows a missed opportunity for jump-threading(?),
should I file a separate bug report for this?

Tested on linux x86_64 with SVN revision thrunk@188550. Below I show the code
generated with -Os, but -O2 and -O3 show the same missed optimization.

Wouter


-------8<---------8<---------8<--------8<--------8<-------

void f0();
void f1();
void f2();
void f3();


// Toy example to demonstrate the problem
void foo(int x) {
    switch (x) {
        case 0:  f0(); break;
        case 1:  f1(); break;
        default: f2(); break;
    }
}

// This is the code generated by 'g++-thrunk@188550 -Os'
// _Z3fooi: testl   %edi, %edi  <-- 1
//          je      .L5
//          decl    %edi        <-- 2 comparisons
//          jne     .L7
//          jmp     .L6         <-- why not immediately jump to _Z2f1v?
// .L5:     jmp     _Z2f0v
// .L6:     jmp     _Z2f1v
// .L7:     jmp     _Z2f2v

// This generates optimal(?) code
void my_foo(int x) {
    asm goto (
        "cmpl $1,%[x];"  // only 1 comparison
        "je   %l[l1];"
        "jb   %l[l0];"
        :: [x] "r" (x)
        :: l0, l1
    );
l2:    f2(); return;
l0:    f0(); return;
l1:    f1(); return;
}


// Bigger example in a slightly more realistic context:
//  e.g. a main loop that handles 4 elements per iteration and a switch
//       like this after that loop to handle the remaining elements.
void bar(int x) {
    switch (x & 3) {
        case 0: f0(); break;
        case 1: f1(); break;
        case 2: f2(); break;
        case 3: f3(); break;
    }
}

// This is the code generated by 'g++-thrunk@188550 -Os'
// _Z3bari: andl    $3, %edi
//          cmpl    $2, %edi    <-- 1
//          je      .L11
//          cmpl    $3, %edi    <-- 2
//          je      .L12
//          decl    %edi        <-- 3 comparisons
//          je      .L10
//          jmp     _Z2f0v
// .L10:    jmp     _Z2f1v
// .L11:    jmp     _Z2f2v
// .L12:    jmp     _Z2f3v

// This generates optimal(?) code
void my_bar(int x) {
    asm goto (
        "andl $3, %[x];" // implicit comparison with 0
        "je   %l[l0];"
        "cmpl $2, %[x];" // 1 explicit comparison
        "je   %l[l2];"
        "jb   %l[l1];"
        :: [x] "r" (x)
        :: l0, l1, l2
    );
l3:    f3(); return;
l0:    f0(); return;
l1:    f1(); return;
l2:    f2(); return;
}

Reply via email to