Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

James Greenhalgh Mon, 24 Nov 2014 06:29:32 -0800

On Fri, Nov 14, 2014 at 02:43:12AM +0000, Bin.Cheng wrote:
> On Fri, Nov 7, 2014 at 7:13 AM, Jeff Law <l...@redhat.com> wrote:
> > On 11/05/14 02:30, Bin.Cheng wrote:
> >> Thanks very much for reviewing.  I refined the patch according to your
> >> comments.  Also made two small changes: a)  skip breaking dependency
> >> between memory access and the corresponding base-reg modifying
> >> instruction.  This feature doesn't help load/store pair that much and
> >> only increases compilation time.  b) a minor bug fix in arm backend
> >> hook when calculating priority for memory accesses with minus offset.
> >>
> >> I am running bootstrap/test against latest trunk, and will adapt
> >> ChangeLog once get approved generally.  So how about this one?
> >
> > OK for the trunk.  Thanks for your patience.
> >
> > Jeff
> >
> 
> Thanks for reviewing.  For the record, attached patch is committed.
> The only update is I disabled the pass if peephole2 isn't in effect
> because it relies on peephole2 to do real fusion work.


Hi Bin,

The documentation for TARGET_SCHED_FUSION_PRIORITY doesn't look
right to me (see: https://gcc.gnu.org/onlinedocs/gccint/Scheduling.html ).

I think you'll need to wrap your examples in something like @smallexample
tags if you want to maintain their formatting.
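
Something along these lines might do it. This is an untested sketch that
shows only the first listing, and the exact placement within the string is an
assumption on my part:

```
Given below example:\n\
\n\
@smallexample\n\
    ldr r10, [r1, 4]\n\
    add r4, r4, r10\n\
    ldr r15, [r2, 8]\n\
    sub r5, r5, r15\n\
    ldr r11, [r1, 0]\n\
    add r4, r4, r11\n\
    ldr r16, [r2, 12]\n\
    sub r5, r5, r16\n\
@end smallexample\n\
```

The other two listings would presumably need the same treatment.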

Thanks,
James

> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def    (revision 217474)
> +++ gcc/target.def    (working copy)
> @@ -1526,6 +1526,79 @@ parallelism required in output calculations chain.
>  int, (unsigned int opc, machine_mode mode),
>  hook_int_uint_mode_1)
>  
> +/* The following member value is a function that returns priority for
> +   fusion of each instruction via pointer parameters.  */
> +DEFHOOK
> +(fusion_priority,
> +"This hook is called by scheduling fusion pass.  It calculates fusion\n\
> +priorities for each instruction passed in by parameter.  The priorities\n\
> +are returned via pointer parameters.\n\
> +\n\
> +@var{insn} is the instruction whose priorities need to be calculated.\n\
> +@var{max_pri} is the maximum priority that can be returned in any case.\n\
> +@var{fusion_pri} is the pointer parameter through which @var{insn}'s\n\
> +fusion priority should be calculated and returned.\n\
> +@var{pri} is the pointer parameter through which @var{insn}'s priority\n\
> +should be calculated and returned.\n\
> +\n\
> +Same @var{fusion_pri} should be returned for instructions which should\n\
> +be scheduled together.  Different @var{pri} should be returned for\n\
> +instructions with same @var{fusion_pri}.  @var{fusion_pri} is the major\n\
> +sort key, @var{pri} is the minor sort key.  All instructions will be\n\
> +scheduled according to the two priorities.  All priorities calculated\n\
> +should be between 0 (exclusive) and @var{max_pri} (inclusive).  To avoid\n\
> +false dependencies, @var{fusion_pri} of instructions which need to be\n\
> +scheduled together should be smaller than @var{fusion_pri} of irrelevant\n\
> +instructions.\n\
> +\n\
> +Given below example:\n\
> +\n\
> +    ldr r10, [r1, 4]\n\
> +    add r4, r4, r10\n\
> +    ldr r15, [r2, 8]\n\
> +    sub r5, r5, r15\n\
> +    ldr r11, [r1, 0]\n\
> +    add r4, r4, r11\n\
> +    ldr r16, [r2, 12]\n\
> +    sub r5, r5, r16\n\
> +\n\
> +On targets like ARM/AArch64, the two pairs of consecutive loads should be\n\
> +merged.  The peephole2 pass can't help in this case unless the consecutive\n\
> +loads are actually next to each other in instruction flow.  That's where\n\
> +this scheduling fusion pass works.  This hook calculates priority for each\n\
> +instruction based on its fusion type, like:\n\
> +\n\
> +    ldr r10, [r1, 4]  ; fusion_pri=99,  pri=96   \n\
> +    add r4, r4, r10   ; fusion_pri=100, pri=100  \n\
> +    ldr r15, [r2, 8]  ; fusion_pri=98,  pri=92   \n\
> +    sub r5, r5, r15   ; fusion_pri=100, pri=100  \n\
> +    ldr r11, [r1, 0]  ; fusion_pri=99,  pri=100  \n\
> +    add r4, r4, r11   ; fusion_pri=100, pri=100  \n\
> +    ldr r16, [r2, 12] ; fusion_pri=98,  pri=88   \n\
> +    sub r5, r5, r16   ; fusion_pri=100, pri=100  \n\
> +\n\
> +Scheduling fusion pass then sorts all ready to issue instructions according\n\
> +to the priorities.  As a result, instructions of same fusion type will be\n\
> +pushed together in instruction flow, like:\n\
> +\n\
> +    ldr r11, [r1, 0]\n\
> +    ldr r10, [r1, 4]\n\
> +    ldr r15, [r2, 8]\n\
> +    ldr r16, [r2, 12]\n\
> +    add r4, r4, r10\n\
> +    sub r5, r5, r15\n\
> +    add r4, r4, r11\n\
> +    sub r5, r5, r16\n\
> +\n\
> +Now peephole2 pass can simply merge the two pairs of loads.\n\
> +\n\
> +Since scheduling fusion pass relies on peephole2 to do real fusion\n\
> +work, it is only enabled by default when peephole2 is in effect.\n\
> +\n\
> +This was first introduced on ARM/AArch64 targets; please refer to\n\
> +the hook implementation for how different fusion types are supported.",
> +void, (rtx_insn *insn, int max_pri, int *fusion_pri, int *pri), NULL)
> +
>  HOOK_VECTOR_END (sched)
>  
>  /* Functions relating to OpenMP and Cilk Plus SIMD clones.  */
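
For anyone reading the numbers in the annotated listing, here is a toy model
that reproduces them. It is not the ARM/AArch64 hook: `struct toy_insn` and
`toy_fusion_priority` are hypothetical names made up for illustration, and the
real hook takes an `rtx_insn *` and inspects RTL. The sketch only assumes what
the documentation states: loads that should be paired share one fusion
priority, are told apart by the minor priority, and everything stays in the
range (0, max_pri].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an instruction.  The real hook
   receives an rtx_insn * and inspects RTL instead.  */
struct toy_insn
{
  bool is_load;    /* True for "ldr rX, [rBASE, OFFSET]".  */
  int base_regno;  /* Base register number, e.g. 1 for r1.  */
  int offset;      /* Byte offset from the base register.  */
};

/* Toy priority function with the same shape as the hook.  Assumes
   1 <= base_regno < max_pri and 0 <= offset < max_pri so that results
   stay in the documented range (0, max_pri].  */
static void
toy_fusion_priority (const struct toy_insn *insn, int max_pri,
                     int *fusion_pri, int *pri)
{
  if (!insn->is_load)
    {
      /* Irrelevant instructions get the largest fusion priority.  */
      *fusion_pri = max_pri;
      *pri = max_pri;
      return;
    }

  /* Loads off the same base register share one fusion priority...  */
  *fusion_pri = max_pri - insn->base_regno;
  /* ...and are told apart by offset: smaller offset, larger priority.  */
  *pri = max_pri - insn->offset;

  assert (*fusion_pri > 0 && *fusion_pri <= max_pri);
  assert (*pri > 0 && *pri <= max_pri);
}
```

With max_pri = 100 this gives fusion_pri=99, pri=96 for `ldr r10, [r1, 4]` and
fusion_pri=98, pri=88 for `ldr r16, [r2, 12]`, which matches the listing in
the documentation.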
