RE: [PATCH] MIPS: Add support for P6600

2018-06-13 Thread Robert Suchanek
Hi Matthew,

> With one more change to add another comment as below, this is OK to
> commit.
> 
> > @@ -18957,7 +19039,10 @@ mips_reorg_process_insns (void)
> >  sequence and replace it with the delay slot instruction
> >  then the jump to clear the forbidden slot hazard.  */
> 
> This bit does need the comment extending.  Add this:
> 
> For the P6600, this optimisation solves the performance penalty associated
> with BALC followed by a delay slot branch.  We do not set fs_delay as we
> do not want the full logic of a forbidden slot; the penalty exists only
> against branches not the full class of forbidden slot instructions.
> 
> >
> > - if (fs_delay)
> > + if (fs_delay || (TUNE_P6600
> > +  && TARGET_CB_MAYBE
> > +  && mips_classify_branch_p6600 (insn)
> > + == UC_BALC))
> > {
> >   /* Search onwards from the current position looking for
> >  a SEQUENCE.  We are looking for pipeline hazards
> here

Added and committed as r261570.

Regards,
Robert


RE: [PATCH] MIPS: Add support for P6600

2018-06-13 Thread Matthew Fortune
Robert Suchanek  writes:
> As already discussed, the link to the P6600 doesn't
> appear to be referenced on mips.com but reachable
> when searching for 'p6600':
> 
> https://www.mips.com/downloads/p6600-multiprocessing-programmers-guide/

Thanks, good spot.

> gcc/ChangeLog:
> 
> 2018-06-12  Matthew Fortune  
> Prachi Godbole  
> 
>   * config/mips/mips-cpus.def: Define P6600.
>   * config/mips/mips-tables.opt: Regenerate.
>   * config/mips/mips.c (mips_ucbranch_type): New enum.
>   (mips_rtx_cost_data): Add support for P6600.
>   (mips_issue_rate): Likewise.
>   (mips_multipass_dfa_lookahead): Likewise.
>   (mips_avoid_hazard): Likewise.
>   (mips_reorg_process_insns): Likewise.
>   (mips_classify_branch_p6600): New function.
>   * config/mips/mips.h (TUNE_P6600): New define.
>   (MIPS_ISA_LEVEL_SPEC): Infer mips64r6 from p6600.
>   (ENABLE_LD_ST_PAIRS): Enable load/store bonding for p6600.
>   * config/mips/mips.md: Include p6600.md.
>   (processor): Add p6600.
>   * config/mips/p6600.md: New file.
>   * doc/invoke.texi: Add p6600 to supported architectures.

With one more change to add another comment as below, this is OK to
commit.

> @@ -18957,7 +19039,10 @@ mips_reorg_process_insns (void)
>sequence and replace it with the delay slot instruction
>then the jump to clear the forbidden slot hazard.  */

This bit does need the comment extending.  Add this:

For the P6600, this optimisation solves the performance penalty
associated with BALC followed by a delay slot branch.  We do not set
fs_delay as we do not want the full logic of a forbidden slot; the
penalty exists only against branches not the full class of forbidden
slot instructions.

> 
> -   if (fs_delay)
> +   if (fs_delay || (TUNE_P6600
> +&& TARGET_CB_MAYBE
> +&& mips_classify_branch_p6600 (insn)
> +   == UC_BALC))
>   {
> /* Search onwards from the current position looking
> for
>a SEQUENCE.  We are looking for pipeline hazards
here

Thanks,
Matthew



RE: [PATCH] MIPS: Add support for P6600

2018-06-12 Thread Robert Suchanek
Hi Matthew,

As already discussed, the link to the P6600 doesn't
appear to be referenced on mips.com but reachable
when searching for 'p6600':

https://www.mips.com/downloads/p6600-multiprocessing-programmers-guide/

I'm resubmitting the whole patch again with updated
ChangeLog.

> >
> > +/* Classifies an unconditional branch of interest for the P6600.  */
> > +
> > +enum mips_ucbranch_type {
> 
> Newline for brace.

Done.

> > +/* Subroutine of mips_avoid_hazard.  We classify unconditional
> > branches
> > +   of interest for the P6600 for performance reasons.  We're
> > interested
> > +   in differentiating BALC from JIC, JIALC and BC.  */
> > +
> > +static enum mips_ucbranch_type
> > +mips_classify_branch_p6600 (rtx_insn *insn) {
> > +  if (!(insn
> > +  && USEFUL_INSN_P (insn)
> > +  && GET_CODE (PATTERN (insn)) != SEQUENCE))
> > +return UC_UNDEFINED;
> 
> Let's switch this around to the following with a comment to say ignore
> sequences because they represent a filled delay slot branch (which
> presumably is not affected by the uArch issue).
> 
>   if (insn
>   || !USEFUL_INSN_P (insn)
>   || GET_CODE (PATTERN (insn)) == SEQUENCE))
> return UC_UNDEFINED;

Switched around and added a comment.
> 
> > +
> > +  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
> > +  || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose
> > jump.  */
> > +return UC_OTHER;
> 
> I don't recall what a loose jump was supposed to refer to, presumably
> 'direct jump'.

Left it just as 'BC'.

> 
> >  /* Subroutine of mips_reorg_process_insns.  If there is a hazard
> > between
> > INSN and a previous instruction, avoid it by inserting nops after
> > instruction AFTER.
> > @@ -18699,14 +18744,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn
> > *insn, int *hilo_delay,
> >&& GET_CODE (pattern) != ASM_INPUT
> >&& asm_noperands (pattern) < 0)
> >  nops = 1;
> > +  /* The P6600's branch predictor does not handle certain static
> > + sequences of back-to-back jumps well.  Inserting a no-op only
> > + costs space as the dispatch unit will disregard the nop.
> > + Here we handle the cases of back to back unconditional branches
> > + that are inefficent.  */
> > +  else if (TUNE_P6600 && TARGET_CB_MAYBE && !optimize_size
> > +  && ((mips_classify_branch_p6600 (after) == UC_BALC
> > +   && mips_classify_branch_p6600 (insn) == UC_OTHER)
> > +  || (mips_classify_branch_p6600 (insn) == UC_BALC
> > +  && (mips_classify_branch_p6600 (after) == UC_OTHER
> 
> Unnecessary braces on the last clause here.

Removed the braces and change the comment.

> 
> > +nops = 1;
> 
> This appears to show that a BALC with a compact branch (other than BALC)
> is the problem case either before or after the BALC. This is what we need
> to see in a public document. Or at least (re)confirmed as the issue.
> 
> >else
> >  nops = 0;
> >
> >/* Insert the nops between this instruction and the previous one.
> >   Each new nop takes us further from the last hilo hazard.  */
> >*hilo_delay += nops;
> > +
> > +  /* If we're tuning for the P6600, we come across an annoying GCC
> > + assumption that debug information always follows a call.  Move
> > + past any debug information in that case.  */
> > +  rtx_insn *real_after = after;
> > +  if (real_after && nops && CALL_P (real_after))
> > +while (TUNE_P6600 && real_after
> > +  && (NOTE_P (NEXT_INSN (real_after))
> > +  || BARRIER_P (NEXT_INSN (real_after
> > +  real_after = NEXT_INSN (real_after);
> > +
> 
> This looks OK but it seems unnecessary to limit to the P6600.  It is just
> that we have not had hazards on branches before now.

Removed the restriction.

> 
> >while (nops-- > 0)
> > -emit_insn_after (gen_hazard_nop (), after);
> > +emit_insn_after (gen_hazard_nop (), real_after);
> >
> >/* Set up the state for the next instruction.  */
> >*hilo_delay += ninsns;
> > @@ -18716,6 +18783,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn
> > *insn, int *hilo_delay,
> >  switch (get_attr_hazard (insn))
> >{
> >case HAZARD_NONE:
> > +   /* For the P6600, flag some unconditional branches as having a
> > +  pseudo-forbidden slot.  This will cause additional nop
> > insertion
> > +  or SEQUENCE breaking as required.  */
> > +   if (TUNE_P6600
> > +   && !optimize_size
> > +   && TARGET_CB_MAYBE
> > +   && mips_classify_branch_p6600 (insn) == UC_OTHER)
> > + *fs_delay = true;
> 
> This appears to be saying that all unconditional branches (except
> BALC) will introduce a performance penalty if followed by any other non-
> delay/non-forbidden slot instruction.

Added a comment that this is for performance reasons.
> 
> > break;
> >
> >case HAZARD_FORBIDDEN_SLOT:
> > @@ -18957,7 +19032,10 @@ mips_reorg_process_insns (void)
> >  sequence and replace it 

RE: [PATCH] MIPS: Add support for P6600

2018-06-11 Thread Matthew Fortune
Robert Suchanek  writes:
> The below adds support for -march=p6600.  It includes
> a new scheduler plus performance tweaks.
> 
> gcc/ChangeLog:
> 
> 2018-06-01  Matthew Fortune  
> Prachi Godbole  
>   * config/mips/mips-cpus.def: Define P6600.
>   * config/mips/mips-tables.opt: Regenerate.
>   * config/mips/mips.c (mips_multipass_dfa_lookahead): Add P6600.
>   * config/mips/mips.h (TUNE_P6600): New define.
>   (MIPS_ISA_LEVEL_SPEC): Infer mips64r6 from p6600.
>   * doc/invoke.texi: Add p6600 to supported architectures.

It seems that mips.com has no longer got any architecture specific
documents available for either i6400/i6500 (which did exist until
recently) nor p6600. There isn't anything particularly unusual
about i6400/i6500 support but this patch for p6600 attempts to deal
with some undocumented micro-architecture details that are not
sufficiently described in the patch to maintain over time nor is
there reference material available.

Despite the fact that I am one of the original authors of the
work, I can't accept this upstream without further information.

Some notes on the patch are below:

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index bfe64bb..9c77c13 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -198,6 +198,15 @@ enum mips_address_type {
>ADDRESS_SYMBOLIC
>  };
> 
> +/* Classifies an unconditional branch of interest for the P6600.  */
> +
> +enum mips_ucbranch_type {

Newline for brace.

> +  /* May not even be a branch.  */
> +  UC_UNDEFINED,
> +  UC_BALC,
> +  UC_OTHER
> +};
> +
>  /* Macros to create an enumeration identifier for a function
> prototype.  */
>  #define MIPS_FTYPE_NAME1(A, B) MIPS_##A##_FTYPE_##B
>  #define MIPS_FTYPE_NAME2(A, B, C) MIPS_##A##_FTYPE_##B##_##C
> @@ -1127,6 +1136,19 @@ static const struct mips_rtx_cost_data
>  COSTS_N_INSNS (36),   /* int_div_di */
>   2,/* branch_cost */
>   4 /* memory_latency */
> +  },
> +  { /* P6600 */
> +COSTS_N_INSNS (4),/* fp_add */
> +COSTS_N_INSNS (5),/* fp_mult_sf */
> +COSTS_N_INSNS (5),/* fp_mult_df */
> +COSTS_N_INSNS (17),   /* fp_div_sf */
> +COSTS_N_INSNS (17),   /* fp_div_df */
> +COSTS_N_INSNS (5),/* int_mult_si */
> +COSTS_N_INSNS (5),/* int_mult_di */
> +COSTS_N_INSNS (8),/* int_div_si */
> +COSTS_N_INSNS (8),/* int_div_di */
> + 2,/* branch_cost */
> + 4 /* memory_latency */
>}
>  };
> 
> 
> @@ -14592,6 +14614,7 @@ mips_issue_rate (void)
>  case PROCESSOR_LOONGSON_2F:
>  case PROCESSOR_LOONGSON_3A:
>  case PROCESSOR_P5600:
> +case PROCESSOR_P6600:
>return 4;
> 
>  case PROCESSOR_XLP:
> @@ -14727,7 +14750,7 @@ mips_multipass_dfa_lookahead (void)
>if (TUNE_OCTEON)
>  return 2;
> 
> -  if (TUNE_P5600 || TUNE_I6400)
> +  if (TUNE_P5600 || TUNE_P6600 || TUNE_I6400)
>  return 4;
> 
>return 0;
> @@ -18647,6 +18670,28 @@ mips_orphaned_high_part_p (mips_offset_table
> *htab, rtx_insn *insn)
>return false;
>  }
> 
> +/* Subroutine of mips_avoid_hazard.  We classify unconditional
> branches
> +   of interest for the P6600 for performance reasons.  We're
> interested
> +   in differentiating BALC from JIC, JIALC and BC.  */
> +
> +static enum mips_ucbranch_type
> +mips_classify_branch_p6600 (rtx_insn *insn)
> +{
> +  if (!(insn
> +  && USEFUL_INSN_P (insn)
> +  && GET_CODE (PATTERN (insn)) != SEQUENCE))
> +return UC_UNDEFINED;

Let's switch this around to the following with a comment to say
ignore sequences because they represent a filled delay slot
branch (which presumably is not affected by the uArch issue).

  if (insn
  || !USEFUL_INSN_P (insn)
  || GET_CODE (PATTERN (insn)) == SEQUENCE))
return UC_UNDEFINED;

> +
> +  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
> +  || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose
> jump.  */
> +return UC_OTHER;

I don't recall what a loose jump was supposed to refer to, presumably
'direct jump'. 

>  /* Subroutine of mips_reorg_process_insns.  If there is a hazard
> between
> INSN and a previous instruction, avoid it by inserting nops after
> instruction AFTER.
> @@ -18699,14 +18744,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn
> *insn, int *hilo_delay,
>  && GET_CODE (pattern) != ASM_INPUT
>  && asm_noperands (pattern) < 0)
>  nops = 1;
> +  /* The P6600's branch predictor does not handle certain static
> + sequences of back-to-back jumps well.  Inserting a no-op only
> + costs space as the dispatch unit will disregard the nop.
> + Here we handle the cases of back to back unconditional branches
> + that are inefficent.  */
> +  else if (TUNE_P6600 && TARGET_CB_MAYBE && !optimize_size
> 

RE: [PATCH][MIPS] Add support for P6600

2016-05-24 Thread Matthew Fortune
Hi Robert,

A few comments inlined. I'd like to review again and/or let Catherine
comment before commit.

Thanks,
Matthew

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 06acd30..cbe1007 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -18496,6 +18519,28 @@ mips_orphaned_high_part_p (mips_offset_table *htab, 
> rtx_insn
> *insn)
>return false;
>  }
> 
> +/* Subroutine of mips_avoid_hazard.  We classify unconditional branches
> +   of interest for the P6600 for performance reasons.  We are interested
> +   in differentiating BALC from JIC, JIALC and BC.  */
> +
> +static enum mips_ucbranch_type
> +mips_classify_branch_p6600 (rtx_insn *insn)
> +{
> +  if (!(insn
> + && USEFUL_INSN_P (insn)
> + && GET_CODE (PATTERN (insn)) != SEQUENCE))
> +return UC_UNDEFINED;

This might be easier to read if you move the not inside.

> +  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
> +  || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose jump.  */
> +return UC_OTHER;

I think I understand what 'loose jump' means here. It is saying that
this is a TYPE_JUMP instruction which does not in a sequence which
means it must not have a delay slot. I would just say /* BC.  */ in the
comment though.

> +  if (CALL_P (insn) && get_attr_jal (insn) == JAL_DIRECT)
> +return UC_BALC;
> +
> +  return UC_UNDEFINED;
> +}
> +
>  /* Subroutine of mips_reorg_process_insns.  If there is a hazard between
> INSN and a previous instruction, avoid it by inserting nops after
> instruction AFTER.
> @@ -18548,14 +18593,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
> int
> *hilo_delay,
>  && GET_CODE (pattern) != ASM_INPUT
>  && asm_noperands (pattern) < 0)
>  nops = 1;
> +  /* The P6600's branch predictor does not handle certain static
> + sequences of back-to-back jumps well.  Inserting a no-op only
> + costs space as the dispatch unit will disregard the nop.
> + Here we handle the cases of back to back unconditional branches
> + that are inefficient.  */
> +  else if (TUNE_P6600 && TARGET_CB_MAYBE && !optimize_size
> +&& ((mips_classify_branch_p6600 (after) == UC_BALC
> + && mips_classify_branch_p6600 (insn) == UC_OTHER)
> +|| (mips_classify_branch_p6600 (insn) == UC_BALC
> +&& (mips_classify_branch_p6600 (after) == UC_OTHER
> +nops = 1;
>else
>  nops = 0;
> 
>/* Insert the nops between this instruction and the previous one.
>   Each new nop takes us further from the last hilo hazard.  */
>*hilo_delay += nops;
> +
> +  /* If we're tuning for the P6600, we come across an annoying GCC
> + assumption that debug information always follows a call.  Move
> + past any debug information in that case.  */
> +  rtx_insn *real_after = after;
> +  if (real_after && nops && CALL_P (real_after))
> +while (TUNE_P6600 && real_after
> +&& (NOTE_P (NEXT_INSN (real_after))
> +|| BARRIER_P (NEXT_INSN (real_after
> +  real_after = NEXT_INSN (real_after);
> +

As Bernhard pointed out this could be improved by hoisting the TUNE_P6600
up to the if.

The comment on this doesn't really say what is going on here.  We have
to move to the next real instruction because we are going to insert a
NOP and the current instruction is a call which will have debug
information.  We can't separate the call from the debug info so move
past it. As to whether this should only happen for the P6600 specific
hazard is subjective. It should be save all the time so TUNE_P6600 could
be deleted entirely.

>while (nops-- > 0)
> -emit_insn_after (gen_hazard_nop (), after);
> +emit_insn_after (gen_hazard_nop (), real_after);
> 
>/* Set up the state for the next instruction.  */
>*hilo_delay += ninsns;
> @@ -18565,6 +18632,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
> int
> *hilo_delay,
>  switch (get_attr_hazard (insn))
>{
>case HAZARD_NONE:
> + /* For the P6600, flag some unconditional branches as having
> +a pseudo-forbidden slot.  This will cause additional nop insertion
> +or SEQUENCE breaking as required.  */

This should explain that the addition of nops is for performance reasons
not correctness.

> + if (TUNE_P6600
> + && !optimize_size
> + && TARGET_CB_MAYBE
> + && mips_classify_branch_p6600 (insn) == UC_OTHER)
> +   *fs_delay = true;
>   break;
> 
>case HAZARD_FORBIDDEN_SLOT:
> @@ -18806,7 +18881,10 @@ mips_reorg_process_insns (void)
>sequence and replace it with the delay slot instruction
>then the jump to clear the forbidden slot hazard.  */
> 
> -   if (fs_delay)
> +   if (fs_delay || (TUNE_P6600
> +&& TARGET_CB_MAYBE
> +&& mips_classify_branch_p6600 (insn)
> +