Paolo Bonzini schrieb:
> On 11/10/2010 11:58 AM, Georg Lay wrote:
>> In the old 3.4.x (private port) I introduced a target hook in combine,
>> just prior to where recog_for_combine gets called. The hook did some
>> canonicalization of rtx and thereby considerably reduced the number of
>> patterns that would have been necessary without that hook. It
>> transformed some unspecs.
>
> How is that related to delegitimize_address?
It's nothing to do with delegitimize address.
Ok, let me tell the storyan go into the mess. Some users wanted to
have a bit data type in C and to write down variables that represent a
bit in memory or in auto or as parameter. (I strongly recommended not
to do such hacking in a compiler because it is not within C and for
reasons you all know. For reasons you also know, the project was
agreed ...).
There was already some hacking in gcc when I started and users
complained about superfluous zero-extract here and there. No bloat,
just a trade-off compiler vs. asm. tricore is capable of loading a
byte from absolute address space (LD.BU) and has instructions to
operate on any bit (and, or, xor, jmp, compare-accumulate-bitop,
extract, insert, etc.). New type _bit was mapped to BImode. To see
where the inconveniences are, let's have a look at a comment
;; The TriCore instruction set has many instructions that deal with
;; bit operands. These bits are then given in two operands: one operand
;; is the register that contains the bit, the other operand specifies
;; the bit's position within that register as an immediate, i.e. the
;; bit's position must be a runtime constant.
;;
;; A new relocation type `bpos' for the bit's position within a byte had
;; been introduced, enabling to pack bits in memory without the need to
;; pack them in bitfields.
;;
;; Such an instruction sequence may look like
;;
;; ld.bu %d0, the_bit # insn A
;; jz.t %d0, bpos:the_bit, .some_label # insn A
;;
;; The compiler could, of course, emit such instruction seqences.
;; However, emitting such sequences "en bloc" has several drawbacks:
;;
;; -- every time we need a bit the bit must be read
;; -- emitting instruction en bloc knocks out the scheduler
;; -- when writing all the cases as combiner patterns the number of
;; insn patterns would explode
;;
;; Therefore, a bit could have been described as BImode, which would
;; lead to the following instruction sequence
;;
;; ld.bu %d0, the_bit # insn A
;; extr.u %d0, %d0,bpos:the_bit, 1 # insn A
;; jz.t %d0, 0, .some_label # insn B
;;
;; I.e. a bit is represented as BImode and therefore always resides
;; in the LSB. The advantage is that the bit now lives in a register
;; and can be reused. The disadvantage is that
;;
;; -- often, there is a superfluous instruction (extr.u)
;; -- the scheduler cannot schedule ld.bu away from extr.u
;; -- for combiner patterns the same as above applies
;;
;; BPOS representation
;; ===================
;;
;; Thus, we need a way to tag a register with the bit's position.
;; I use the new LOADBI and BPOS rtx_code, here in an assembler dump
from gcc
;; which shows the 1-to-1 correspondence between insn and machine
instruction:
;;
;; # (set (reg:SI 0)
;; # (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit")))
;; # (symbol_ref:SI ("?the_bit"))))
;; ld.bu %d0, the_bit # insn A
;;
;; # (set (pc)
;; # (if_then_else (eq (bpos:BI (reg:SI 0)
;; # (symbol_ref:SI ("?the_bit")))
;; # (const_int 0))
;; # (label_ref 42)
;; # (pc)))
;; jz.t %d0, bpos:the_bit, .some_label # insn B
;;
;; The sole SYMBOL_REFs are the tags.
;; The reason to use own RTX code and not UNSPEC is because the combiner
;; does not like UNSPEC and the code would not be as dense as could be.
;;
;; Using ZERO_EXTRACT for LOADBI does not work either because a
ZERO_EXTRACT
;; may not be used as lvalue which is needed when loading a bit.
;; The ZERO_EXTRACT on the left side would assume that the register that
;; is to be loaded already lives, because the ZERO_EXTRACT affects
;; just some bits, not all.
;;
;; Note that a LOADBI in this context is a load together with a
shift left
;; and a BPOS is a shift right i.e. equivalent to some ZERO_EXTRACT.
;;
;; The tag in a BPOS may also be a CONST_INT
;; in which case the bit's position is a compile time constant.
;;
;; The output mode of a BPOS (M1) needs not to be BI, it may also be
;; some wider mode like QI or SI. A wider mode represents a BPOS that is
;; zero extended to that mode. The input mode's bit size must be at most
;; one plus the bit's position (8 for a SYMBOL_REF).
;;
;; BPOS Operands
;; =============
;;
;; Next, I introduced BPOS operands, i.e. a bpos_operand predicate and
;; BPOS constraints. Thereby, I can use `bpos_operand' in insn
;; predicates and there is no need to write down the very BPOS
;; representation. That technique hides the internal BPOS representation
;; from the machine description.
;;
;; A BPOS operand is one of these:
;; (REG:BI)
;; (SUBREG ((REG:BI) 0))
;; (BPOS ((REG) (CONST_INT)))
;; (BPOS ((REG) (SYMBOL_REF)))
;; (SUBREG (BPOS ((REG) (CONST_INT)) 0))
;; (SUBREG (BPOS ((REG) (SYMBOL_REF)) 0))
;; with the bit's position in REG and SUBREG is 0.
;;
;; RELOAD pass
;; ===========
;;
;; Introducing bpos_operand implies the need to reload them.
;; Reloading a BPOS should not reload the BPOS as a whole but only the
;; BPOS's register. The patch is done by MAP_RELOADS in `find_reloads'
;; which ends up in `tric_map_reloads'.
;;
;; BPOS pass
;; =========
;;
;; A complete new pass had been introduced. The load of a bit is
;; emitted a usual
;;
;; # (set (reg:BI 100)
;; # (mem:BI (symbol_ref:SI ("?the_bit"))))
;;
;; These insns get splitted into the loadbi and bpos insns mentioned
above.
;; The bpos pass had been placed before combine, i.e. somewhere
before life
;; analysis but after the loop optimizer.
;;
;; After all loadbi insns are split, the insns are traveled again to
insert
;; the newly generated BPOSs into insns that support bpos_operand.
;; Amongst them are jumps, some shifts, extracts, bit operations, etc.
;;
;; In order to reduce the number of patterns, operands that encode
;; operations that are equivalent to some BPOS are canonicalized to
;; bpos_operands, e.g. shifting a register right by 4 and then and-ing
;; with 1 and finally zero extending to SI will become
;; (bpos:SI (reg) (4)). Canonicalizing includes ZERO_EXTRACT rtx which
;; are mapped to BPOS, too.
;;
;; The hook is placed in toplev.c, the worker function is
`tric_insert_bpos'.
;;
;; COMBINE pass
;; ============
;;
;; The combiner puts insns together. The result is stuff like
;; (bpos:SI (not:SI (reg)) (pos)) which must be canonicalized to
;; (subreg:SI (not:BI (bpos:BI (reg) (pos))) 0) so that the resulting
;; rtx will fit into bpos_operand insns. The hook is in combine.c
;; and the worker function is `tric_replace_for_combine'.
;;
;; Own enum rtx_code for BPOS/LOADBI
;; =================================
;;
;; Introducing new rtx_code requires minor and straight forward
changes in
;; ./gcc
;; rtl.def: DEF_RTL_EXPR for BPOS and LOADBI
;; DEF_RTL_EXPR(BPOS, "bpos", "ee", '2')
;; DEF_RTL_EXPR(LOADBI, "loadbi", "ee", '2')
;; simplify-rtx.c::simplify_binary_operation()
;; return 0 for BPOS and LOADBI
;; ...
and one of the insn is
(define_insn "*iorsi3.bit0.nbinopn"
[(set (match_operand:SI 0 "data_register_operand" "=d")
(ior:SI (and:SI (not:SI (subreg:SI (match_operator:BI 1
"accumulate_operator"
[(not:BI (match_operand:BI 2 "bpos_operand" "xd"))
(match_operand:BI 3 "bpos_operand" "xd")]) 0))
(const_int 1))
(match_operand:SI 4 "data_register_operand" "0")))]
"XOR != GET_CODE (operands[1])"
"or.%D1.t\t%0, %B2, %B3"
[(set_attr "pipe" "ip")])
The second not might come from the combine hook that transformed
things like (bpos:BI (xor (reg 4) 2) to (not:BI (bpos (reg 2))).
(bpos (reg 2)) is matched by "bpos_operand" and reload reloads just
the reg if it must spill and not the bpos as a whole.
>> A second use case of such a hook could look as follows: Imagine a
>> combined pattern that does not match but would be legal if the backend
>> knew that some register contains some specific value like, e.g.,
>> non-negative, zero-or-storeflagvalue, combiner has proved that some
>> bits will always be set or always be cleared;
>
> You can use nonzero_bits or num_signbit_copies in define_splits. In
> _this_ case, a define_insn_or_split doesn't really help, I agree with
> Joern on this. :)
Yes. It's all inside of combine. May the second or first insn
generated by split be no-op insn? What, if the split is in need of a
new pseudo? There may be left-over regs from combine as is tries to
find split points, but that means when combine inserts an SI and you
need a DI you are stuck. I hat that case and couldn't find a solution.
Note that this was back in 3.4 and I don't intend to break out of the
backend's sandbox if not absolutely necessary (in the case above I
tried to keep the changes minimal, like a hook in reload).
>> II.
>> Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2.
>>
>> Combine combines A+B and sees costs f(A+B) = 5. This makes combine
>> reject the pattern, not looking deeper.
>
> Does it really do that? I see nothing in combine_instructions which
> would prevent going deeper.
The costs. I see that some passes treat cost=0 special. But gccint
doesn't say a word about it so from the internals you get
interpretation like: "0 will cost nothing". Again, I don't like to use
features that are not documented on gccint level as they may change
over time like "there is this a global, non-gccint-documented variable
that in a specific pass contains a value that I can make use of". As
backend programmer I must have an interface (description) to rely on
and prefer stability over hacking.