On Fri, Nov 14, 2014 at 02:43:12AM +0000, Bin.Cheng wrote:
> On Fri, Nov 7, 2014 at 7:13 AM, Jeff Law <l...@redhat.com> wrote:
> > On 11/05/14 02:30, Bin.Cheng wrote:
> >> Thanks very much for reviewing.  I refined the patch according to your
> >> comments.  Also made two small changes: a) skip breaking dependency
> >> between memory access and the corresponding base-reg modifying
> >> instruction.  This feature doesn't help load/store pair that much and
> >> only increases compilation time.  b) a minor bug fix in arm backend
> >> hook when calculating priority for memory accesses with minus offset.
> >>
> >> I am running bootstrap/test against latest trunk, and will adapt
> >> ChangeLog once get approved generally.  So how about this one?
> >
> > OK for the trunk.  Thanks for your patience.
> >
> > Jeff
> >
>
> Thanks for reviewing.  For the record, attached patch is committed.
> The only update is I disabled the pass if peephole2 isn't in effect
> because it relies on peephole2 to do real fusion work.

Hi Bin,

The documentation for TARGET_SCHED_FUSION_PRIORITY doesn't look right to
me (see: https://gcc.gnu.org/onlinedocs/gccint/Scheduling.html ).

I think you'll need to wrap your examples in something like @smallexample
tags if you want to maintain their formatting.

Thanks,
James

> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def (revision 217474)
> +++ gcc/target.def (working copy)
> @@ -1526,6 +1526,79 @@ parallelism required in output calculations chain.
>  int, (unsigned int opc, machine_mode mode),
>  hook_int_uint_mode_1)
>
> +/* The following member value is a function that returns priority for
> +   fusion of each instruction via pointer parameters.  */
> +DEFHOOK
> +(fusion_priority,
> +"This hook is called by scheduling fusion pass.  It calculates fusion\n\
> +priorities for each instruction passed in by parameter.  The priorities\n\
> +are returned via pointer parameters.\n\
> +\n\
> +@var{insn} is the instruction whose priorities need to be calculated.\n\
> +@var{max_pri} is the maximum priority can be returned in any cases.\n\
> +@var{fusion_pri} is the pointer parameter through which @var{insn}'s\n\
> +fusion priority should be calculated and returned.\n\
> +@var{pri} is the pointer parameter through which @var{insn}'s priority\n\
> +should be calculated and returned.\n\
> +\n\
> +Same @var{fusion_pri} should be returned for instructions which should\n\
> +be scheduled together.  Different @var{pri} should be returned for\n\
> +instructions with same @var{fusion_pri}.  @var{fusion_pri} is the major\n\
> +sort key, @var{pri} is the minor sort key.  All instructions will be\n\
> +scheduled according to the two priorities.  All priorities calculated\n\
> +should be between 0 (exclusive) and @var{max_pri} (inclusive).  To avoid\n\
> +false dependencies, @var{fusion_pri} of instructions which need to be\n\
> +scheduled together should be smaller than @var{fusion_pri} of irrelevant\n\
> +instructions.\n\
> +\n\
> +Given below example:\n\
> +\n\
> +    ldr r10, [r1, 4]\n\
> +    add r4, r4, r10\n\
> +    ldr r15, [r2, 8]\n\
> +    sub r5, r5, r15\n\
> +    ldr r11, [r1, 0]\n\
> +    add r4, r4, r11\n\
> +    ldr r16, [r2, 12]\n\
> +    sub r5, r5, r16\n\
> +\n\
> +On targets like ARM/AArch64, the two pairs of consecutive loads should be\n\
> +merged.  Since peephole2 pass can't help in this case unless consecutive\n\
> +loads are actually next to each other in instruction flow.  That's where\n\
> +this scheduling fusion pass works.  This hook calculates priority for each\n\
> +instruction based on its fustion type, like:\n\
> +\n\
> +    ldr r10, [r1, 4]  ; fusion_pri=99,  pri=96 \n\
> +    add r4, r4, r10   ; fusion_pri=100, pri=100 \n\
> +    ldr r15, [r2, 8]  ; fusion_pri=98,  pri=92 \n\
> +    sub r5, r5, r15   ; fusion_pri=100, pri=100 \n\
> +    ldr r11, [r1, 0]  ; fusion_pri=99,  pri=100 \n\
> +    add r4, r4, r11   ; fusion_pri=100, pri=100 \n\
> +    ldr r16, [r2, 12] ; fusion_pri=98,  pri=88 \n\
> +    sub r5, r5, r16   ; fusion_pri=100, pri=100 \n\
> +\n\
> +Scheduling fusion pass then sorts all ready to issue instructions according\n\
> +to the priorities.  As a result, instructions of same fusion type will be\n\
> +pushed together in instruction flow, like:\n\
> +\n\
> +    ldr r11, [r1, 0]\n\
> +    ldr r10, [r1, 4]\n\
> +    ldr r15, [r2, 8]\n\
> +    ldr r16, [r2, 12]\n\
> +    add r4, r4, r10\n\
> +    sub r5, r5, r15\n\
> +    add r4, r4, r11\n\
> +    sub r5, r5, r16\n\
> +\n\
> +Now peephole2 pass can simply merge the two pairs of loads.\n\
> +\n\
> +Since scheduling fusion pass relies on peephole2 to do real fusion\n\
> +work, it is only enabled by default when peephole2 is in effect.\n\
> +\n\
> +This is firstly introduced on ARM/AArch64 targets, please refer to\n\
> +the hook implementation for how different fusion types are supported.",
> +void, (rtx_insn *insn, int max_pri, int *fusion_pri, int *pri), NULL)
> +
>  HOOK_VECTOR_END (sched)
>
>  /* Functions relating to OpenMP and Cilk Plus SIMD clones.  */
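
For anyone reading the quoted documentation without the scheduler source at hand, the two-key ordering it describes can be illustrated with a small standalone sketch. This is not the code from the patch: ToyInsn, in_range, sort_for_fusion and example_loads are hypothetical names made up for illustration. It only sorts the four loads from the example, it ignores data dependencies and the ready-list mechanics entirely, and it assumes that a larger value sorts first for both keys, which is what the example output suggests for the loads.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction; this is not GCC's rtx_insn.
struct ToyInsn {
  std::string text;
  int fusion_pri;  // major sort key
  int pri;         // minor sort key
};

// The hook documentation asks for priorities in (0, max_pri].
bool in_range(const ToyInsn &insn, int max_pri) {
  return insn.fusion_pri > 0 && insn.fusion_pri <= max_pri
         && insn.pri > 0 && insn.pri <= max_pri;
}

// Sort by fusion_pri first, then by pri.  Larger values sort first;
// that direction is an assumption read off the documented example.
std::vector<ToyInsn> sort_for_fusion(std::vector<ToyInsn> insns, int max_pri) {
  for (const ToyInsn &insn : insns)
    assert(in_range(insn, max_pri));
  std::stable_sort(insns.begin(), insns.end(),
                   [](const ToyInsn &a, const ToyInsn &b) {
                     if (a.fusion_pri != b.fusion_pri)
                       return a.fusion_pri > b.fusion_pri;
                     return a.pri > b.pri;
                   });
  return insns;
}

// The four loads from the documented example, in original program order,
// with the priorities given in the documentation.
std::vector<ToyInsn> example_loads() {
  return {
      {"ldr r10, [r1, 4]", 99, 96},
      {"ldr r15, [r2, 8]", 98, 92},
      {"ldr r11, [r1, 0]", 99, 100},
      {"ldr r16, [r2, 12]", 98, 88},
  };
}
```

Calling sort_for_fusion (example_loads (), 100) puts the two [r1] loads next to each other, followed by the two [r2] loads, which is the adjacency peephole2 needs. Where the add/sub instructions end up depends on dependency handling in the real pass, which this sketch deliberately leaves out.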