http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
--- Comment #72 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The original test case in PR 59343 is an interesting one with regard to T bit
optimizations (or the lack thereof):

void validate_number (char **numbertext)
{
  char *ptr = *numbertext;
  int valid = (ptr != 0) && (*ptr);

  for ( ; valid && *ptr; ++ptr)
    valid = (*ptr >= '0');

  if (!valid)
    *numbertext = 0;
}

With -Os -m4 -mb it is compiled to:

_validate_number:
        mov.l   @r4,r2     // [bb 2]
        tst     r2,r2
        bt/s    .L2
        mov     #0,r1
        mov.b   @r2,r1     // [bb 3]
        tst     r1,r1
        mov     #-1,r1
        negc    r1,r1
.L2:                       // [bb 4]
        mov     #47,r3
.L3:                       // [bb 5]
        tst     r1,r1
        bt      .L4
        mov.b   @r2+,r1    // [bb 6]
        tst     r1,r1
        bt/s    .L8
        cmp/gt  r3,r1      // [bb 7]
        bra     .L3
        movt    r1
.L4:
        mov.l   r1,@r4     // [bb 8]
.L8:
        rts
        nop

The basic block starting with L3 (bb 5) has three different r1 inputs from
[bb 2], [bb 3] and [bb 7].  When sh_treg_combine tries to trace r1 starting in
[bb 5]:

tracing (reg/v:SI 1 r1 [orig:185 valid ] [185])
[bb 5]
set of reg not found.  empty BB?
[bb 4]
set of reg not found
(cstore) set not found - aborting trace

Instead it should skip [bb 4] as it doesn't modify r1 or T bit and check
[bb 3] and [bb 2].  Because the setcc insns are not the same in [bb 2], [bb 3]
and [bb 7], it would try to eliminate the cstores.  However, in [bb 2] there is
no real cstore but a constant load, which can be replaced with a clrt or sett
insn respectively.  The resulting code could be something like:

        mov.l   @r4,r2
        mov     #0,r1
        tst     r2,r2
        bt/s    .L2        // (*)
        clrt
        mov.b   @r2,r1
        tst     r1,r1
        movt    r1
        tst     r1,r1      // T = !T
.L2:
        mov     #47,r3
.L3:
        bf      .L4
        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        bra     .L3
        cmp/gt  r3,r1
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop

(*) The clrt insn actually has to be inserted before the conditional branch,
which is impossible as it modifies the branch condition.  Putting it into the
delay slot however is OK, which is usually done by the DBR pass.  A special
"branch and set/clear T" pseudo insn would be required (requires SH2+) which
produces the sequence above.

A more complicated way would be to create new basic blocks.  The basic block
reordering or similar RTL pass and the clrt/sett optimization pass should then
be able to simplify the code further to:

        mov.l   @r4,r2
        tst     r2,r2
        bf/s    .L4
        mov     #0,r1
        mov.b   @r2,r1
        tst     r1,r1
        bt/s    .L4
        mov     #47,r3
.L3:
        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        cmp/gt  r3,r1
        bt      .L3
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop
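
For illustration, the tracing behaviour suggested above (skip a block that
modifies neither r1 nor the T bit and continue into its predecessors) could be
sketched on a toy CFG roughly as follows.  This is not the sh_treg_combine
code; struct bb, enum set_kind, build_example_cfg and trace_reg are made-up
names, and the CFG is a hand-written approximation of the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BB    16
#define MAX_PREDS 4

/* Hypothetical classification of what a block does to the traced reg.  */
enum set_kind { SET_NONE, SET_CONST, SET_CSTORE };

/* Toy basic block, indexed by bb number (an assumption, not GCC's type).  */
struct bb
{
  enum set_kind kind;     /* how the block sets r1, if at all */
  bool clobbers_t;        /* block changes T without setting r1 */
  int n_preds;
  int preds[MAX_PREDS];
};

/* Hand-written approximation of the CFG of the first listing.  [bb 6] is
   left zeroed; the walk stops at [bb 7] and never reaches it.  */
static void
build_example_cfg (struct bb cfg[MAX_BB])
{
  memset (cfg, 0, MAX_BB * sizeof cfg[0]);
  cfg[2] = (struct bb) { .kind = SET_CONST };                 /* mov #0,r1 */
  cfg[3] = (struct bb) { .kind = SET_CSTORE, .n_preds = 1, .preds = { 2 } };
  cfg[4] = (struct bb) { .kind = SET_NONE, .n_preds = 2, .preds = { 2, 3 } };
  cfg[5] = (struct bb) { .kind = SET_NONE, .n_preds = 2, .preds = { 4, 7 } };
  cfg[7] = (struct bb) { .kind = SET_CSTORE, .n_preds = 1, .preds = { 6 } };
}

/* Walk backwards from block IDX.  Stop at the first block that sets the reg;
   pass through blocks that touch neither the reg nor T.  Returns false if
   the trace has to be given up.  */
static bool
walk (const struct bb *cfg, int idx, bool *visited, int *found, int *n)
{
  if (visited[idx])
    return true;
  visited[idx] = true;

  const struct bb *b = &cfg[idx];
  if (b->kind != SET_NONE)
    {
      assert (*n < MAX_BB);
      found[(*n)++] = idx;
      return true;
    }

  /* No set here.  Give up if T is clobbered or there is nowhere to go.  */
  if (b->clobbers_t || b->n_preds == 0)
    return false;

  for (int i = 0; i < b->n_preds; i++)
    if (!walk (cfg, b->preds[i], visited, found, n))
      return false;
  return true;
}

/* Collect in FOUND the blocks that set the reg on some path into START.
   Returns the number of blocks found, or -1 if the trace is aborted.  */
static int
trace_reg (const struct bb *cfg, int start, int *found)
{
  bool visited[MAX_BB] = { false };
  int n = 0;

  visited[start] = true;
  for (int i = 0; i < cfg[start].n_preds; i++)
    if (!walk (cfg, cfg[start].preds[i], visited, found, &n))
      return -1;
  return n;
}
```

On this toy CFG, trace_reg (cfg, 5, found) collects [bb 2], [bb 3] and [bb 7],
i.e. the three r1 inputs mentioned above.  If [bb 4] is marked as clobbering
T, the trace is aborted and -1 is returned.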