Re: peephole2: dead regs not marked as dead

Georg Lay Wed, 10 Nov 2010 08:03:54 -0800

Paolo Bonzini schrieb:
> On 11/10/2010 11:58 AM, Georg Lay wrote:
>> In the old 3.4.x (private port) I introduced a target hook in combine,
>> just prior to where recog_for_combine gets called. The hook did some
>> canonicalization of rtx and thereby considerably reduced the number of
>> patterns that would have been necessary without that hook. It
>> transformed some unspecs.
> 
> How is that related to delegitimize_address?


It's nothing to do with delegitimize address.

Ok, let me tell the storyan go into the mess. Some users wanted to
have a bit data type in C and to write down variables that represent a
bit in memory or in auto or as parameter. (I strongly recommended not
to do such hacking in a compiler because it is not within C and for
reasons you all know. For reasons you also know, the project was
agreed ...).

There was already some hacking in gcc when I started and users
complained about superfluous zero-extract here and there. No bloat,
just a trade-off compiler vs. asm. tricore is capable of loading a
byte from absolute address space (LD.BU) and has instructions to
operate on any bit (and, or, xor, jmp, compare-accumulate-bitop,
extract, insert, etc.). New type _bit was mapped to BImode. To see
where the inconveniences are, let's have a look at a comment

;;   The TriCore instruction set has many instructions that deal with
;;   bit operands. These bits are then given in two operands: one operand
;;   is the register that contains the bit, the other operand specifies
;;   the bit's position within that register as an immediate, i.e. the
;;   bit's position must be a runtime constant.
;;
;;   A new relocation type `bpos' for the bit's position within a byte had
;;   been introduced, enabling to pack bits in memory without the need to
;;   pack them in bitfields.
;;
;;   Such an instruction sequence may look like
;;
;;         ld.bu   %d0, the_bit                     # insn A
;;         jz.t    %d0, bpos:the_bit, .some_label   # insn A
;;
;;   The compiler could, of course, emit such instruction seqences.
;;   However, emitting such sequences "en bloc" has several drawbacks:
;;
;;   -- every time we need a bit the bit must be read
;;   -- emitting instruction en bloc knocks out the scheduler
;;   -- when writing all the cases as combiner patterns the number of
;;      insn patterns would explode
;;
;;   Therefore, a bit could have been described as BImode, which would
;;   lead to the following instruction sequence
;;
;;         ld.bu   %d0, the_bit                    # insn A
;;         extr.u  %d0, %d0,bpos:the_bit, 1        # insn A
;;         jz.t    %d0, 0, .some_label             # insn B
;;
;;   I.e. a bit is represented as BImode and therefore always resides
;;   in the LSB. The advantage is that the bit now lives in a register
;;   and can be reused. The disadvantage is that
;;
;;   -- often, there is a superfluous instruction (extr.u)
;;   -- the scheduler cannot schedule ld.bu away from extr.u
;;   -- for combiner patterns the same as above applies
;;
;;   BPOS representation
;;   ===================
;;
;;   Thus, we need a way to tag a register with the bit's position.
;;   I use the new LOADBI and BPOS rtx_code, here in an assembler dump
from gcc
;;   which shows the 1-to-1 correspondence between insn and machine
instruction:
;;
;;     # (set (reg:SI 0)
;;     #      (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit")))
;;     #                 (symbol_ref:SI ("?the_bit"))))
;;     ld.bu   %d0, the_bit                        # insn A
;;
;;     # (set (pc)
;;     #      (if_then_else (eq (bpos:BI (reg:SI 0)
;;     #                                 (symbol_ref:SI ("?the_bit")))
;;     #                        (const_int 0))
;;     #                    (label_ref 42)
;;     #                    (pc)))
;;     jz.t    %d0, bpos:the_bit, .some_label      # insn B
;;
;;   The sole SYMBOL_REFs are the tags.
;;   The reason to use own RTX code and not UNSPEC is because the combiner
;;   does not like UNSPEC and the code would not be as dense as could be.
;;
;;   Using ZERO_EXTRACT for LOADBI does not work either because a
ZERO_EXTRACT
;;   may not be used as lvalue which is needed when loading a bit.
;;   The ZERO_EXTRACT on the left side would assume that the register that
;;   is to be loaded already lives, because the ZERO_EXTRACT affects
;;   just some bits, not all.
;;
;;   Note that a LOADBI in this context is a load together with a
shift left
;;   and a BPOS is a shift right i.e. equivalent to some ZERO_EXTRACT.
;;
;;   The tag in a BPOS may also be a CONST_INT
;;   in which case the bit's position is a compile time constant.
;;
;;   The output mode of a BPOS (M1) needs not to be BI, it may also be
;;   some wider mode like QI or SI. A wider mode represents a BPOS that is
;;   zero extended to that mode. The input mode's bit size must be at most
;;   one plus the bit's position (8 for a SYMBOL_REF).
;;
;;   BPOS Operands
;;   =============
;;
;;   Next, I introduced BPOS operands, i.e. a bpos_operand predicate and
;;   BPOS constraints. Thereby, I can use `bpos_operand' in insn
;;   predicates and there is no need to write down the very BPOS
;;   representation. That technique hides the internal BPOS representation
;;   from the machine description.
;;
;;   A BPOS operand is one of these:
;;      (REG:BI)
;;      (SUBREG ((REG:BI) 0))
;;      (BPOS ((REG) (CONST_INT)))
;;      (BPOS ((REG) (SYMBOL_REF)))
;;      (SUBREG (BPOS ((REG) (CONST_INT)) 0))
;;      (SUBREG (BPOS ((REG) (SYMBOL_REF)) 0))
;;   with the bit's position in REG and SUBREG is 0.
;;
;;   RELOAD pass
;;   ===========
;;
;;   Introducing bpos_operand implies the need to reload them.
;;   Reloading a BPOS should not reload the BPOS as a whole but only the
;;   BPOS's register. The patch is done by MAP_RELOADS in `find_reloads'
;;   which ends up in `tric_map_reloads'.
;;
;;   BPOS pass
;;   =========
;;
;;   A complete new pass had been introduced. The load of a bit is
;;   emitted a usual
;;
;;   # (set (reg:BI 100)
;;   #      (mem:BI (symbol_ref:SI ("?the_bit"))))
;;
;;   These insns get splitted into the loadbi and bpos insns mentioned
above.
;;   The bpos pass had been placed before combine, i.e. somewhere
before life
;;   analysis but after the loop optimizer.
;;
;;   After all loadbi insns are split, the insns are traveled again to
insert
;;   the newly generated BPOSs into insns that support bpos_operand.
;;   Amongst them are jumps, some shifts, extracts, bit operations, etc.
;;
;;   In order to reduce the number of patterns, operands that encode
;;   operations that are equivalent to some BPOS are canonicalized to
;;   bpos_operands, e.g. shifting a register right by 4 and then and-ing
;;   with 1 and finally zero extending to SI will become
;;   (bpos:SI (reg) (4)). Canonicalizing includes ZERO_EXTRACT rtx which
;;   are mapped to BPOS, too.
;;
;;   The hook is placed in toplev.c, the worker function is
`tric_insert_bpos'.
;;
;;   COMBINE pass
;;   ============
;;
;;   The combiner puts insns together. The result is stuff like
;;   (bpos:SI (not:SI (reg)) (pos)) which must be canonicalized to
;;   (subreg:SI (not:BI (bpos:BI (reg) (pos))) 0) so that the resulting
;;   rtx will fit into bpos_operand insns. The hook is in combine.c
;;   and the worker function is `tric_replace_for_combine'.
;;
;;   Own enum rtx_code for BPOS/LOADBI
;;   =================================
;;
;;   Introducing new rtx_code requires minor and straight forward
changes in
;;   ./gcc
;;      rtl.def: DEF_RTL_EXPR for BPOS and LOADBI
;;         DEF_RTL_EXPR(BPOS,   "bpos",   "ee", '2')
;;         DEF_RTL_EXPR(LOADBI, "loadbi", "ee", '2')
;;      simplify-rtx.c::simplify_binary_operation()
;;         return 0 for BPOS and LOADBI
;; ...

and one of the insn is

(define_insn "*iorsi3.bit0.nbinopn"
  [(set (match_operand:SI 0 "data_register_operand" "=d")
        (ior:SI (and:SI (not:SI (subreg:SI (match_operator:BI 1
"accumulate_operator"
    [(not:BI (match_operand:BI 2 "bpos_operand" "xd"))
     (match_operand:BI 3 "bpos_operand" "xd")]) 0))
                        (const_int 1))
                (match_operand:SI 4 "data_register_operand" "0")))]
  "XOR != GET_CODE (operands[1])"
  "or.%D1.t\t%0, %B2, %B3"
  [(set_attr "pipe" "ip")])

The second not might come from the combine hook that transformed
things like (bpos:BI (xor (reg 4) 2) to (not:BI (bpos (reg 2))).
(bpos (reg 2)) is matched by "bpos_operand" and reload reloads just
the reg if it must spill and not the bpos as a whole.


>> A second use case of such a hook could look as follows: Imagine a
>> combined pattern that does not match but would be legal if the backend
>> knew that some register contains some specific value like, e.g.,
>> non-negative, zero-or-storeflagvalue, combiner has proved that some
>> bits will always be set or always be cleared;
> 
> You can use nonzero_bits or num_signbit_copies in define_splits.  In
> _this_ case, a define_insn_or_split doesn't really help, I agree with
> Joern on this. :)

Yes. It's all inside of combine. May the second or first insn
generated by split be no-op insn? What, if the split is in need of a
new pseudo? There may be left-over regs from combine as is tries to
find split points, but that means when combine inserts an SI and you
need a DI you are stuck. I hat that case and couldn't find a solution.
Note that this was back in 3.4 and I don't intend to break out of the
backend's sandbox if not absolutely necessary (in the case above I
tried to keep the changes minimal, like a hook in reload).

>> II.
>> Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2.
>>
>> Combine combines A+B and sees costs f(A+B) = 5. This makes combine
>> reject the pattern, not looking deeper.
> 
> Does it really do that?  I see nothing in combine_instructions which
> would prevent going deeper.

The costs. I see that some passes treat cost=0 special. But gccint
doesn't say a word about it so from the internals you get
interpretation like: "0 will cost nothing". Again, I don't like to use
features that are not documented on gccint level as they may change
over time like "there is this a global, non-gccint-documented variable
that in a specific pass contains a value that I can make use of". As
backend programmer I must have an interface (description) to rely on
and prefer stability over hacking.

Re: peephole2: dead regs not marked as dead

Reply via email to