http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #72 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The original test case in PR 59343 is an interesting one with regard to T bit
optimizations (or the lack thereof):

void validate_number (char **numbertext)
{
  char *ptr = *numbertext;
  int valid = (ptr != 0) && (*ptr);

  for ( ; valid && *ptr; ++ptr)
    valid = (*ptr >= '0');

  if (!valid)
    *numbertext = 0;
}

With -Os -m4 -mb it is compiled to:

_validate_number:
        mov.l   @r4,r2    // [bb 2]
        tst     r2,r2
        bt/s    .L2
        mov     #0,r1


        mov.b   @r2,r1    // [bb 3]
        tst     r1,r1
        mov     #-1,r1
        negc    r1,r1

.L2:                      // [bb 4]
        mov     #47,r3

.L3:                      // [bb 5]
        tst     r1,r1
        bt      .L4

        mov.b   @r2+,r1   // [bb 6]
        tst     r1,r1
        bt/s    .L8

        cmp/gt  r3,r1     // [bb 7]

        bra     .L3
        movt    r1

.L4:
        mov.l   r1,@r4    // [bb 8]
.L8:
        rts
        nop


The basic block starting at .L3 (bb 5) has three different r1 inputs, from [bb
2], [bb 3] and [bb 7].  When sh_treg_combine tries to trace r1 starting in [bb
5], it gives up after [bb 4]:

tracing (reg/v:SI 1 r1 [orig:185 valid ] [185])

[bb 5]
set of reg not found.  empty BB?

[bb 4]
set of reg not found (cstore)
set not found - aborting trace

Instead it should skip [bb 4], as it doesn't modify r1 or the T bit, and check
[bb 3] and [bb 2].  Because the setcc insns are not the same in [bb 2], [bb 3]
and [bb 7], it would try to eliminate the cstores.  However, in [bb 2] there is
no real cstore but a constant load, which can be replaced with a clrt or sett
insn, depending on the constant.  The resulting code could be something like:

        mov.l   @r4,r2
        mov     #0,r1
        tst     r2,r2
        bt/s    .L2     // (*)
        clrt

        mov.b   @r2,r1
        tst     r1,r1
        movt    r1
        tst     r1,r1    // T = !T
.L2:
        mov     #47,r3
.L3:
        bf      .L4

        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        bra     .L3
        cmp/gt  r3,r1
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop

(*) The clrt insn logically has to be inserted before the conditional branch,
which is impossible because it would modify the branch condition.  Putting it
into the delay slot is OK, however, but filling delay slots is usually done by
the DBR pass.  A special "branch and set/clear T" pseudo insn that produces the
sequence above would therefore be required (requires SH2+, since it relies on
the delayed branches bt/s and bf/s).  A more complicated way would be to create
new basic blocks.

Basic block reordering (or a similar RTL pass) and the clrt/sett optimization
pass should then be able to simplify the code further to:

        mov.l   @r4,r2
        tst     r2,r2
        bf/s    .L4
        mov     #0,r1

        mov.b   @r2,r1
        tst     r1,r1
        bt/s    .L4
        mov     #47,r3
.L3:
        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        cmp/gt  r3,r1
        bt      .L3
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop
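
The trace behavior proposed above (skip blocks that set neither r1 nor the T
bit and continue into their predecessors) can be illustrated with a small toy
model.  This is only a sketch and not the actual sh_treg_combine code: the
struct bb layout, the def_kind classification and the function names are all
made up for illustration, and the CFG is transcribed by hand from the first
listing.

```c
#include <assert.h>

#define MAX_BB    8
#define MAX_PREDS 2

/* Hypothetical classification of how a block defines the traced reg.  */
enum def_kind {
    DEF_NONE,    /* block sets neither the traced reg nor the T bit */
    DEF_CSTORE,  /* reg is set from the T bit, e.g. movt or negc */
    DEF_CONST    /* reg is set by a constant load, e.g. mov #0,r1 */
};

/* Hypothetical, heavily simplified basic block.  */
struct bb {
    enum def_kind def;
    int npreds;
    int preds[MAX_PREDS];
};

/* Depth-first walk over predecessors.  Blocks with DEF_NONE are skipped and
   their predecessors are visited instead.  Any other block ends that path
   and its index is recorded in found[].  Returns the updated count.  */
int trace_defs(const struct bb *cfg, int b, char *visited,
               int *found, int nfound)
{
    assert(b >= 0 && b < MAX_BB);
    if (visited[b])
        return nfound;
    visited[b] = 1;

    if (cfg[b].def != DEF_NONE) {
        found[nfound++] = b;
        return nfound;
    }
    for (int i = 0; i < cfg[b].npreds; i++)
        nfound = trace_defs(cfg, cfg[b].preds[i], visited, found, nfound);
    return nfound;
}

/* Trace r1 backwards from bb 5 of the first listing.  found[] must have
   room for MAX_BB entries.  bb 6 is left out because the walk already
   stops at bb 7.  */
int trace_r1_from_bb5(int *found)
{
    static const struct bb cfg[MAX_BB] = {
        [2] = { DEF_CONST,  0, { 0 } },     /* mov #0,r1 (delay slot) */
        [3] = { DEF_CSTORE, 1, { 2 } },     /* mov #-1,r1 ; negc r1,r1 */
        [4] = { DEF_NONE,   2, { 2, 3 } },  /* only mov #47,r3 */
        [5] = { DEF_NONE,   2, { 4, 7 } },  /* start of the trace */
        [7] = { DEF_CSTORE, 0, { 0 } },     /* movt r1 */
    };
    char visited[MAX_BB] = { 0 };

    return trace_defs(cfg, 5, visited, found, 0);
}
```

Calling trace_r1_from_bb5 with an int array of MAX_BB entries returns 3 and
fills the array with 2, 3, 7: the constant load in [bb 2] and the cstores in
[bb 3] and [bb 7], instead of aborting at [bb 4].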
