https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328
--- Comment #2 from Vladimir Makarov <vmakarov at gcc dot gnu.org> --- (In reply to Kewen Lin from comment #1) > Created attachment 50715 [details] > ira:consider matching cstr in all alternatives > > With little understanding on ira, I am not quite sure this patch is on the > reasonable direction. It aims to check the matching constraint in all > alternatives, if there is one alternative with matching constraint and > matches the current preferred regclass, it will record the output operand > number and further create one copy for it. Normally it can get the priority > against shuffle copies and the matching constraint will get satisfied with > higher possibility, reload doesn't create extra copies to meet the matching > constraint or the desirable register class when it has to. > > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as > shuffle copies, and later any of A,B,C,D gets assigned by one hardware > register which is a VSX register but not a FP register, which means it has > to pay costs once we can NOT go with VSX alternatives, so at that time we > can increase the freq for the remaining copies related to this, once the > matching constraint gets satisfied further, there aren't any extra costs to > pay. This idea seems a bit complicated in the current framework, so the > proposed patch aggressively emphasizes the matching constraint at the time > of creating copies. > > FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with > Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r > +2.51%, no remarkable degradation is observed. Thank you for working on this issue. The current implementation of ira_get_dup_out_num was specifically tuned for better register allocation for x86-64 div insns. Your patch definitely improves code for power9 and I would say significantly (congratulations!). The patch you proposed makes me think that it might work for major targets as well. I would prefer to avoid introducing new parameter because there are too many of them already and its description is cryptic. It would be nice if you benchmark the patch on x86-64 too, If there is no overall degradation with new behaviour we could remove the parameter and make the new behaviour as a default. If it is not, well we will keep the parameter. As for the patch itself, I don't like some variable names. Sorry. Could you use op_regno, out_regno, and present_alt instead of op_no, out_no, tot. Please, in general use longer variable names reflecting their purpose as GCC developers reads code in many times more than writing it.