5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

gcc-bugzilla at mkarcher dot dialup.fu-berlin.de Sun, 16 Nov 2014 02:02:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783


--- Comment #7 from Michael Karcher <gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de> ---
(In reply to Oleg Endo from comment #6)
> > For the transformation to be valid, you would need a logical not instruction
> > instead of the bitwise not instruction that sets the desination register to
> > zero if the source register contains any non-zero value, not just if the
> > source register contains -1.
> 
> There is a 2 insn sequence to do that on SH:
>    tst  rm,rm
>    movt rn
> 
> As far as I can see it, emitting this sequence instead of the bitwise not
> insn would fix the problem.  This can be done by changing the function
> sh_treg_combine::make_not_reg_insn.  It should be safe to emit that sequence
> (which clobbers the T reg) instead of the bitwise not.

This seems definitely true. But I wonder whether this is acutally needed, or
already caught perfectly in try_eliminate_cstores. Let's have a look:

The buggy branch in try_combine_comparisons applies only if (first loop)
 - All comparisons are the same type
 - The second operands are either all the same or all registers
 - Removal of those instruction is safe
Furthermore, there need to be mixed cstores and
 - All minority comparisons need to be comparisons
 - All minority comparisons need to be against the integral constant zero

So if one combines what applies to minority stores only in the test for
applicability of the "make_not_reg_insn" idea with the general applicability
check of the try_combine_comparisons pass, the end result is:

The buggy branch only applies if
 - All comparisons are equality comparisons
 - The second operand needs to be the integral constant zero.

Now this is exactly what you correctly commented in "example 1":

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (reg:SI 167) (xor:SI (reg:SI 147 t) (const_int 1)))
 -> bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 177) (const_int 0)))
 (set (reg:SI 167) (reg:SI 147 t))
 -> bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 167) (const_int 0)))
 (set (pc) (if_then_else (ne (reg:SI 147 t) (const_int 0))
                        (label_ref:SI 50) (pc)))

You are going to transform this to, assuming bb 4 is considered minority. (The
first RTL instruction in BB4 gets transformed to "test rm, rm", the second RTL
instruction gets transformed to "movt rn".)

 [bb 3]
 (set (reg:SI 167) (reg:SI 173))
 -> bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (reg:SI 167) (xor:SI (reg:SI 147 t) (const_int 1)))
 -> bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 167) (const_int 0)))
 (set (pc) (if_then_else (ne (reg:SI 147 t) (const_int 0)))
                         (label_ref:SI 50) (pc)))

It is afterwards quite likely that the set instruction in bb3 can be elminated
by renaming reg 167 to reg 173.

If this transformation is removed, though, the transformation of example 4
applies instead, which will result in

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 -> bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 147 t) (xor:SI (reg:SI 147 t) (const_int 1)))
 -> bb 5

 [bb 5]
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
                         (label_ref:SI 50) (pc)))

and this gets (except SH2A with nott) transformed to (by define_insn_and_split
"nott" in the machine definition)

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 -> bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 200) (xor:SI (reg:SI 147 t) (const_int 1)))
 (set (reg:SI 147 t) (eq:SI (reg:SI 200) (const_int 0)))
 -> bb 5

 [bb 5]
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
                         (label_ref:SI 50) (pc)))

which, at least in my example, seems to be transformed by register renaming and
common code elimination (combining the to treg setting instructions at the end
of bb3 and bb4) to something like

 [bb 3]
 -> bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 173) (xor:SI (reg:SI 147 t) (const_int 1)))
 -> bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
                         (label_ref:SI 50) (pc)))

which is exactly the result you also get with special-casing the
compare-to-zero case.

> OK, thanks for the confirmation.  I had a closer look at the current code
> and your patch.  I think that the bitwise-not code paths should not be
> removed entirely, as they handle the "mixed normal and inverting cstores"
> cases.  I will prepare a patch in a couple of days.

If I am not mistaken in my analysis above, the "mixed cstores" case handling in
try_eliminate_cstores does already what you need, while pushing the desicision
whether those are "comparisons to zero" and can be further optimized on common
subexpression elimination, instead of hand-written code checking for a special
"compare to zero" pattern.

Disclaimer: I performed no further test than looking at machine code generated
for the example program given in the bug report.

[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

Reply via email to