Paolo Bonzini schrieb: > On 11/10/2010 11:58 AM, Georg Lay wrote: >> In the old 3.4.x (private port) I introduced a target hook in combine, >> just prior to where recog_for_combine gets called. The hook did some >> canonicalization of rtx and thereby considerably reduced the number of >> patterns that would have been necessary without that hook. It >> transformed some unspecs. > > How is that related to delegitimize_address?
It's nothing to do with delegitimize address. Ok, let me tell the storyan go into the mess. Some users wanted to have a bit data type in C and to write down variables that represent a bit in memory or in auto or as parameter. (I strongly recommended not to do such hacking in a compiler because it is not within C and for reasons you all know. For reasons you also know, the project was agreed ...). There was already some hacking in gcc when I started and users complained about superfluous zero-extract here and there. No bloat, just a trade-off compiler vs. asm. tricore is capable of loading a byte from absolute address space (LD.BU) and has instructions to operate on any bit (and, or, xor, jmp, compare-accumulate-bitop, extract, insert, etc.). New type _bit was mapped to BImode. To see where the inconveniences are, let's have a look at a comment ;; The TriCore instruction set has many instructions that deal with ;; bit operands. These bits are then given in two operands: one operand ;; is the register that contains the bit, the other operand specifies ;; the bit's position within that register as an immediate, i.e. the ;; bit's position must be a runtime constant. ;; ;; A new relocation type `bpos' for the bit's position within a byte had ;; been introduced, enabling to pack bits in memory without the need to ;; pack them in bitfields. ;; ;; Such an instruction sequence may look like ;; ;; ld.bu %d0, the_bit # insn A ;; jz.t %d0, bpos:the_bit, .some_label # insn A ;; ;; The compiler could, of course, emit such instruction seqences. ;; However, emitting such sequences "en bloc" has several drawbacks: ;; ;; -- every time we need a bit the bit must be read ;; -- emitting instruction en bloc knocks out the scheduler ;; -- when writing all the cases as combiner patterns the number of ;; insn patterns would explode ;; ;; Therefore, a bit could have been described as BImode, which would ;; lead to the following instruction sequence ;; ;; ld.bu %d0, the_bit # insn A ;; extr.u %d0, %d0,bpos:the_bit, 1 # insn A ;; jz.t %d0, 0, .some_label # insn B ;; ;; I.e. a bit is represented as BImode and therefore always resides ;; in the LSB. The advantage is that the bit now lives in a register ;; and can be reused. The disadvantage is that ;; ;; -- often, there is a superfluous instruction (extr.u) ;; -- the scheduler cannot schedule ld.bu away from extr.u ;; -- for combiner patterns the same as above applies ;; ;; BPOS representation ;; =================== ;; ;; Thus, we need a way to tag a register with the bit's position. ;; I use the new LOADBI and BPOS rtx_code, here in an assembler dump from gcc ;; which shows the 1-to-1 correspondence between insn and machine instruction: ;; ;; # (set (reg:SI 0) ;; # (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit"))) ;; # (symbol_ref:SI ("?the_bit")))) ;; ld.bu %d0, the_bit # insn A ;; ;; # (set (pc) ;; # (if_then_else (eq (bpos:BI (reg:SI 0) ;; # (symbol_ref:SI ("?the_bit"))) ;; # (const_int 0)) ;; # (label_ref 42) ;; # (pc))) ;; jz.t %d0, bpos:the_bit, .some_label # insn B ;; ;; The sole SYMBOL_REFs are the tags. ;; The reason to use own RTX code and not UNSPEC is because the combiner ;; does not like UNSPEC and the code would not be as dense as could be. ;; ;; Using ZERO_EXTRACT for LOADBI does not work either because a ZERO_EXTRACT ;; may not be used as lvalue which is needed when loading a bit. ;; The ZERO_EXTRACT on the left side would assume that the register that ;; is to be loaded already lives, because the ZERO_EXTRACT affects ;; just some bits, not all. ;; ;; Note that a LOADBI in this context is a load together with a shift left ;; and a BPOS is a shift right i.e. equivalent to some ZERO_EXTRACT. ;; ;; The tag in a BPOS may also be a CONST_INT ;; in which case the bit's position is a compile time constant. ;; ;; The output mode of a BPOS (M1) needs not to be BI, it may also be ;; some wider mode like QI or SI. A wider mode represents a BPOS that is ;; zero extended to that mode. The input mode's bit size must be at most ;; one plus the bit's position (8 for a SYMBOL_REF). ;; ;; BPOS Operands ;; ============= ;; ;; Next, I introduced BPOS operands, i.e. a bpos_operand predicate and ;; BPOS constraints. Thereby, I can use `bpos_operand' in insn ;; predicates and there is no need to write down the very BPOS ;; representation. That technique hides the internal BPOS representation ;; from the machine description. ;; ;; A BPOS operand is one of these: ;; (REG:BI) ;; (SUBREG ((REG:BI) 0)) ;; (BPOS ((REG) (CONST_INT))) ;; (BPOS ((REG) (SYMBOL_REF))) ;; (SUBREG (BPOS ((REG) (CONST_INT)) 0)) ;; (SUBREG (BPOS ((REG) (SYMBOL_REF)) 0)) ;; with the bit's position in REG and SUBREG is 0. ;; ;; RELOAD pass ;; =========== ;; ;; Introducing bpos_operand implies the need to reload them. ;; Reloading a BPOS should not reload the BPOS as a whole but only the ;; BPOS's register. The patch is done by MAP_RELOADS in `find_reloads' ;; which ends up in `tric_map_reloads'. ;; ;; BPOS pass ;; ========= ;; ;; A complete new pass had been introduced. The load of a bit is ;; emitted a usual ;; ;; # (set (reg:BI 100) ;; # (mem:BI (symbol_ref:SI ("?the_bit")))) ;; ;; These insns get splitted into the loadbi and bpos insns mentioned above. ;; The bpos pass had been placed before combine, i.e. somewhere before life ;; analysis but after the loop optimizer. ;; ;; After all loadbi insns are split, the insns are traveled again to insert ;; the newly generated BPOSs into insns that support bpos_operand. ;; Amongst them are jumps, some shifts, extracts, bit operations, etc. ;; ;; In order to reduce the number of patterns, operands that encode ;; operations that are equivalent to some BPOS are canonicalized to ;; bpos_operands, e.g. shifting a register right by 4 and then and-ing ;; with 1 and finally zero extending to SI will become ;; (bpos:SI (reg) (4)). Canonicalizing includes ZERO_EXTRACT rtx which ;; are mapped to BPOS, too. ;; ;; The hook is placed in toplev.c, the worker function is `tric_insert_bpos'. ;; ;; COMBINE pass ;; ============ ;; ;; The combiner puts insns together. The result is stuff like ;; (bpos:SI (not:SI (reg)) (pos)) which must be canonicalized to ;; (subreg:SI (not:BI (bpos:BI (reg) (pos))) 0) so that the resulting ;; rtx will fit into bpos_operand insns. The hook is in combine.c ;; and the worker function is `tric_replace_for_combine'. ;; ;; Own enum rtx_code for BPOS/LOADBI ;; ================================= ;; ;; Introducing new rtx_code requires minor and straight forward changes in ;; ./gcc ;; rtl.def: DEF_RTL_EXPR for BPOS and LOADBI ;; DEF_RTL_EXPR(BPOS, "bpos", "ee", '2') ;; DEF_RTL_EXPR(LOADBI, "loadbi", "ee", '2') ;; simplify-rtx.c::simplify_binary_operation() ;; return 0 for BPOS and LOADBI ;; ... and one of the insn is (define_insn "*iorsi3.bit0.nbinopn" [(set (match_operand:SI 0 "data_register_operand" "=d") (ior:SI (and:SI (not:SI (subreg:SI (match_operator:BI 1 "accumulate_operator" [(not:BI (match_operand:BI 2 "bpos_operand" "xd")) (match_operand:BI 3 "bpos_operand" "xd")]) 0)) (const_int 1)) (match_operand:SI 4 "data_register_operand" "0")))] "XOR != GET_CODE (operands[1])" "or.%D1.t\t%0, %B2, %B3" [(set_attr "pipe" "ip")]) The second not might come from the combine hook that transformed things like (bpos:BI (xor (reg 4) 2) to (not:BI (bpos (reg 2))). (bpos (reg 2)) is matched by "bpos_operand" and reload reloads just the reg if it must spill and not the bpos as a whole. >> A second use case of such a hook could look as follows: Imagine a >> combined pattern that does not match but would be legal if the backend >> knew that some register contains some specific value like, e.g., >> non-negative, zero-or-storeflagvalue, combiner has proved that some >> bits will always be set or always be cleared; > > You can use nonzero_bits or num_signbit_copies in define_splits. In > _this_ case, a define_insn_or_split doesn't really help, I agree with > Joern on this. :) Yes. It's all inside of combine. May the second or first insn generated by split be no-op insn? What, if the split is in need of a new pseudo? There may be left-over regs from combine as is tries to find split points, but that means when combine inserts an SI and you need a DI you are stuck. I hat that case and couldn't find a solution. Note that this was back in 3.4 and I don't intend to break out of the backend's sandbox if not absolutely necessary (in the case above I tried to keep the changes minimal, like a hook in reload). >> II. >> Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2. >> >> Combine combines A+B and sees costs f(A+B) = 5. This makes combine >> reject the pattern, not looking deeper. > > Does it really do that? I see nothing in combine_instructions which > would prevent going deeper. The costs. I see that some passes treat cost=0 special. But gccint doesn't say a word about it so from the internals you get interpretation like: "0 will cost nothing". Again, I don't like to use features that are not documented on gccint level as they may change over time like "there is this a global, non-gccint-documented variable that in a specific pass contains a value that I can make use of". As backend programmer I must have an interface (description) to rely on and prefer stability over hacking.