On 8/11/23 17:04, Jeff Law wrote:

I'm wondering (naively) if there is some way to tune this for a given backend. In general it would make sense to do the replacement, but not if the cost changes (e.g. constants can be embedded in an x86 insn freely, but not on RISC-V where this is costly), and if something is split, it might have been intentional.
I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel question is whether we have any of the info we need to tune at the point where the transformation occurs. The two most obvious pieces here would be loop info and register pressure.

i.e., do we have enough loop structure to know whether the def is at a shallower loop nest than the use?  There's a reasonable chance we have this information, as my recollection is that this analysis is done fairly early in IRA.

But that means we likely don't have any sense of register pressure at the points between the def and use.   So the most useful metric for tuning isn't really available.
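
To make that concrete, the loop-side check could look something like the sketch below.  This is purely illustrative (the function name is made up and it's not existing IRA code); it just uses the loop info that should already be available at that point:

  /* Hypothetical sketch, not actual GCC code: would moving the def down
     to its single use drag it into a deeper loop nest?  Assumes loop
     info (bb_loop_depth) is still valid at this point in IRA.  */
  static bool
  move_would_deepen_loop_p (rtx_insn *def_insn, rtx_insn *use_insn)
  {
    basic_block def_bb = BLOCK_FOR_INSN (def_insn);
    basic_block use_bb = BLOCK_FOR_INSN (use_insn);

    /* If the use is in a deeper loop, moving the def re-executes it on
       every iteration.  The register pressure half of the cost question
       is the part we don't have data for here.  */
    return bb_loop_depth (use_bb) > bb_loop_depth (def_bb);
  }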

I'd argue that even if the register pressure were high, in some cases there's just no way around it and RA needs to honor what the backend did a priori (the split in this case); otherwise we end up with something which literally doesn't compute the right thing and leads to an ICE. I'm puzzled that in this case an intentional implementation choice is getting in the way. So while I don't care about the -0.0 case in itself, it seems that with the current framework we just can't achieve the result, other than via the roundabout peephole2 approach you alluded to.


The one thing that stands out is that we don't do this transformation at all when register pressure sensitive scheduling is enabled. And we really should be turning that on by default.  Our data shows register pressure sensitive scheduling gives about a 6-7% cycle improvement on x264, as it avoids spilling in those key satd loops.

  /* Don't move insns if live range shrinkage or register
     pressure-sensitive scheduling were done because it will not
     improve allocation but likely worsen insn scheduling.  */
  if (optimize
      && !flag_live_range_shrinkage
      && !(flag_sched_pressure && flag_schedule_insns))
    combine_and_move_insns ();


So you might want to look at register pressure sensitive scheduling first.  Go into x264_r from specint and look at x264_pixel_satd_8x4: first verify the loops are fully unrolled. If they are, then look for 32-bit loads/stores into the stack.  If you have them, then you're spilling and getting crappy performance.  Using register pressure sensitive scheduling should help significantly.

Is that -fira-loop-pressure ?

We've certainly seen that internally.  The plan was to submit a patch to make register pressure sensitive scheduling the default when the scheduler is enabled.  We just haven't pushed on it.  If you can verify that you're seeing spilling as well, then it'd certainly bolster the argument that register-pressure-sensitive scheduling is desirable.
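
(For reference, "register pressure sensitive scheduling" is -fsched-pressure, which only matters when the first scheduling pass, -fschedule-insns, is enabled; -fira-loop-pressure is a separate IRA knob.)  A patch along those lines might be little more than the sketch below in the target's option-override code -- to be clear, this is a hypothetical illustration, not the actual patch:

  /* Hypothetical sketch, not a submitted patch: enable register
     pressure sensitive scheduling by default whenever the first
     scheduling pass is on and the user didn't set it explicitly.  */
  if (flag_schedule_insns && !OPTION_SET_P (flag_sched_pressure))
    flag_sched_pressure = 1;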

I can confirm that the loop is fully unrolled and there's a zillion stack spills there for intermediate computes (-Ofast -march=rv64gc_zba_zbb_zbs, no V in that build).

Thx,
-Vineet
