Hi Vladimir,
on 2021/6/30 11:24 PM, Vladimir Makarov wrote:
>
> On 2021-06-28 2:26 a.m., Kewen.Lin wrote:
>> Hi!
>>
>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> PR100328 has some details about this issue; I will try to
>>> summarize it here. In the hottest function, LBM_performStreamCollideTRT,
>>> of the SPEC2017 benchmark 519.lbm_r, there are many FMA-style expressions
>>> (27 FMA, 19 FMS, 11 FNMA). On rs6000, this kind of FMA-style
>>> insn has two flavors: FLOAT_REG and VSX_REG. The VSX_REG register
>>> class has 64 registers, the first 32 of which make up the
>>> whole of FLOAT_REG. There are some differences between these two
>>> flavors, taking "*fma<mode>4_fpr" as an example:
>>>
>>> (define_insn "*fma<mode>4_fpr"
>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
>>>         (fma:SFDF
>>>           (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
>>>           (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
>>>           (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
>>>
>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
>>>
>>> So for VSX_REG we only have the destructive form: when a VSX_REG
>>> alternative is used, operand 2 or operand 3 is required
>>> to be the same as operand 0. Reload has to take care of this
>>> constraint and create some non-free register copies if required.
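
To make the cost of the destructive form concrete, here is a minimal sketch in C. It is not GCC code: the struct, the helper names and the register numbering (0..31 standing for FLOAT_REG, 32..63 for the rest of VSX_REG) are assumptions made for illustration only, and it simplifies reload's choices down to "one extra copy or none".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC code: hard register numbers assigned to the
   four operands of  op0 = FMA (op1, op2, op3).  */
struct fma_regs { int op0, op1, op2, op3; };

/* Assumption for this sketch: registers 0..31 stand for FLOAT_REG and
   32..63 for the remaining VSX_REG registers.  */
static bool in_float_reg (int r)
{
  return r >= 0 && r < 32;
}

/* Simplified count of extra register copies needed for one insn.
   The FLOAT_REG alternative has no tied operands; the VSX alternatives
   require op0 to share a register with op2 or op3.  */
static int extra_copies (struct fma_regs r)
{
  assert (r.op0 >= 0 && r.op0 < 64);
  if (in_float_reg (r.op0) && in_float_reg (r.op1)
      && in_float_reg (r.op2) && in_float_reg (r.op3))
    return 0;   /* the non-destructive alternative applies */
  if (r.op0 == r.op2 || r.op0 == r.op3)
    return 0;   /* the matching constraint is already satisfied */
  return 1;     /* one copy is needed to tie op0 to op2 or op3 */
}
```

In this model an assignment such as (40, 41, 42, 43) costs one copy, while (40, 41, 40, 43) costs none, which is the situation the allocator would ideally steer towards.
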
>>>
>>> Assuming one fma insn looks like:
>>> op0 = FMA (op1, op2, op3)
>>>
>>> The best regclass for all of them is VSX_REG. When op1, op2 and op3 are
>>> all dead, IRA simply creates three shuffle copies for them (here the
>>> operand order matters, since with the same freq, the one with the smaller
>>> number takes preference), but IMO both op2 and op3 should take higher
>>> priority in the copy queue due to the matching constraint.
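
The ordering described above can be sketched as follows. This is only a model of the stated rule (higher freq first, smaller operand number on ties), not IRA's actual copy handling; `struct copy_model` and `order_copies` are hypothetical names, and the frequency values a caller passes in are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a copy between op0 and one input operand.  */
struct copy_model
{
  int operand;  /* input operand number: 1, 2 or 3 */
  int freq;     /* priority weight; higher is processed first */
};

/* Higher freq first; with equal freq the smaller operand number wins.  */
static int compare_copies (const void *pa, const void *pb)
{
  const struct copy_model *a = pa;
  const struct copy_model *b = pb;
  if (a->freq != b->freq)
    return a->freq > b->freq ? -1 : 1;
  return a->operand - b->operand;
}

/* Sort COPIES[0..N-1] into processing order.  */
static void order_copies (struct copy_model *copies, size_t n)
{
  assert (copies != NULL || n == 0);
  qsort (copies, n, sizeof *copies, compare_copies);
}
```

With equal frequencies the order comes out as op1, op2, op3, so op1 is favoured even though it can never satisfy the matching constraint. Giving the op2 and op3 copies a larger freq moves them ahead of op1, which is the effect a constraint copy is meant to have.
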
>>>
>>> I noticed that there is a function, ira_get_dup_out_num, which is meant
>>> to create this kind of constraint copy, but the code below appears to
>>> refuse to create one if there is an alternative that has a valid regclass
>>> which is not likely to be spilled.
>>>
>>> default:
>>>   {
>>>     enum constraint_num cn = lookup_constraint (str);
>>>     enum reg_class cl = reg_class_for_constraint (cn);
>>>     if (cl != NO_REGS
>>>         && !targetm.class_likely_spilled_p (cl))
>>>       goto fail;
>>>
>>> ...
>>>
>>> I cooked up the attached patch to make IRA respect this kind of matching
>>> constraint, guarded by a parameter. As I stated in the PR, I was
>>> not sure this is on the right track. The RFC patch checks the
>>> matching constraint in all alternatives: if there is an alternative
>>> with a matching constraint that matches the current preferred regclass
>>> (or best of allocno?), it records the output operand number and
>>> then creates a constraint copy for it. Normally that copy takes
>>> priority over shuffle copies, so the matching constraint is more likely
>>> to be satisfied and reload does not have to create the extra copies it
>>> would otherwise need to meet the matching constraint or the desirable
>>> register class.
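
As a rough illustration of "check the matching constraint in all alternatives", the sketch below scans one operand's constraint string and reports which alternatives tie it to a given output operand. It is not the patch and not ira_get_dup_out_num: `matching_alternatives` is a hypothetical helper, it treats every digit as a matching constraint (which ignores multi-letter constraint names), and the "d" in the usage note is only a stand-in for the FLOAT_REG constraint.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return a bitmask of the alternatives in CONSTRAINTS (the comma-separated
   constraint string of one input operand) that contain the digit OUT_OP,
   i.e. a matching constraint tying the operand to output operand OUT_OP.
   Bit N is set when alternative N has the tie.  */
static unsigned matching_alternatives (const char *constraints, int out_op)
{
  assert (constraints != NULL && out_op >= 0 && out_op <= 9);
  unsigned mask = 0;
  int alt = 0;
  for (const char *p = constraints; *p != '\0'; p++)
    {
      if (*p == ',')
        alt++;                    /* next alternative starts here */
      else if (isdigit ((unsigned char) *p) && *p - '0' == out_op)
        mask |= 1u << alt;        /* this alternative ties to OUT_OP */
    }
  return mask;
}
```

For an operand written as "d,wa,0" the result for output operand 0 is the mask with only bit 2 set, and for "d,0,wa" only bit 1. A caller could then compare the register class of those alternatives against the preferred class before deciding to record a constraint copy.
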
>>>
>>> For FMA A,B,C,D, I think ideally the copies A/B, A/C, A/D would first
>>> stay as shuffle copies. Later, if any of A,B,C,D gets assigned a
>>> hardware register that is a VSX register (VSX_REG) but not an FP
>>> register (FLOAT_REG), we can no longer use the FLOAT_REG alternative
>>> and have to pay the cost of the VSX alternatives. At that point it
>>> becomes important to respect the matching constraint, so we can
>>> increase the freq of the remaining copies related to this insn
>>> (A/B, A/C, A/D). This idea requires some side
>>> tables to record some information and seems a bit complicated in the
>>> current framework, so the proposed patch aggressively emphasizes the
>>> matching constraint at the time of creating copies.
>>>
>> Compared with the original patch (v1), this patch (v3) also
>> considers the following (this should be v2 for this mailing list, but
>> I bumped it to be consistent with the PR's numbering):
>>
>> - Excluding the case where, for one preferred register class,
>> there are two or more alternatives, one of which has the
>> matching constraint while another does not. For
>> the given operand, even if it is assigned a hardware reg
>> that doesn't meet the matching constraint, it can simply
>> use the alternative without the matching constraint,
>> so no register move is needed. One typical case is
>> define_insn *mov<mode>_internal2 on rs6000. So we
>> shouldn't create a constraint copy for it.
>>
>> - Handling a possibly free register move within the same register
>> class: the constraint copy is disabled in that case, since the
>> register move needed to meet the constraint is considered free.
>>
>> - Making it on by default, as suggested by Segher & Vladimir; we
>> hope to get rid of the parameter if the benchmarking results
>> look good on major targets.
>>
>> - Tweaking the cost when either side of the matching constraint
>> is a hardware register. Before this patch, the