Re: RFA: Add lock_lenth attribute to support the ARC port (Was: Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles)

2012-10-24 Thread Joseph S. Myers
On Tue, 23 Oct 2012, Joern Rennecke wrote:

> I'll be posting the ARC port shortly; it does not fit into a single 100 KB
> posting, so I'm thinking of splitting it in a configury patch and zx
> compressed files/tarballs for arc.c, arc.md, libgcc, and the rest of the port.

The size limit is 400 kB, not 100 kB.  Hopefully that means you don't need 
to compress so many bits and can attach more as plain patches for future 
revisions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFA: Add lock_lenth attribute to support the ARC port (Was: Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles)

2012-10-24 Thread Joern Rennecke

Quoting Richard Biener :


Just to add some extra information, can you quote your ports
ADJUST_INSN_LENGTH and one example instruction with length/lock_length
attribute the above applies to?


This is a bit besides the point, since lock_length does mostly the traditional
branch shortening (with alignment effects rolled in for branches - I'm not
really happy about this, but that was required for convergence), length
does alignment effects on top, and ADJUST_INSN_LENGTH does random
context-sensitive adjustments related to pipeline quirks and conditional
execution.

I'm afraid this gcc port's logic in this area is both more complicated than
what you would expect in an average gcc port, and much more simplistic than
what a competent assembler programmer uses to optimize code for these
microarchitectures.

from arc.h:

#define ADJUST_INSN_LENGTH(X, LENGTH)   \
  (LENGTH) += arc_adjust_insn_length ((X), (LENGTH))

from arc.c:

/* Return length adjustment for INSN.  */
int
arc_adjust_insn_length (rtx insn, int len)
{
  int adj = 0;

  if (!INSN_P (insn))
return 0;
  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
{
  int adj0, adj1, len1;
  rtx pat = PATTERN (insn);
  rtx i0 = XVECEXP (pat, 0, 0);
  rtx i1 = XVECEXP (pat, 0, 1);

  len1 = get_attr_lock_length (i1);
  gcc_assert (!len || len >= 4 || (len == 2 && get_attr_iscompact (i1)));
  if (!len1)
len1 = get_attr_iscompact (i1) != ISCOMPACT_FALSE ? 2 : 4;
  adj0 = arc_adjust_insn_length (i0, len - len1);
  adj1 = arc_adjust_insn_length (i1, len1);
  return adj0 + adj1;
}
  if (recog_memoized (insn) == CODE_FOR_doloop_end_i)
{
  rtx prev = prev_nonnote_insn (insn);

  return ((LABEL_P (prev)
   || (TARGET_ARC600
   && (JUMP_P (prev) || GET_CODE (PATTERN (prev)) ==  
SEQUENCE)))

  ? 4 : 0);
}

  /* Check for return with but one preceding insn since function
 start / call.  */
  if (TARGET_PAD_RETURN
  && JUMP_P (insn)
  && GET_CODE (PATTERN (insn)) != ADDR_VEC
  && GET_CODE (PATTERN (insn)) != ADDR_DIFF_VEC
  && get_attr_type (insn) == TYPE_RETURN)
{
  rtx prev = prev_active_insn (insn);

  if (!prev || !(prev = prev_active_insn (prev))
  || ((NONJUMP_INSN_P (prev)
   && GET_CODE (PATTERN (prev)) == SEQUENCE)
  ? CALL_ATTR (XVECEXP (PATTERN (prev), 0, 0), NON_SIBCALL)
  : CALL_ATTR (prev, NON_SIBCALL)))
return 4;
}
  /* Rtl changes too much before arc_reorg to keep ccfsm state.
 But we are not required to give exact answers then.  */
  if (cfun->machine->arc_reorg_started
  && (JUMP_P (insn) || (len & 2)))
{
  struct arc_ccfsm *statep = &cfun->machine->ccfsm_current;

  arc_ccfsm_advance_to (insn);
  switch (statep->state)
{
case 0:
  break;
case 1: case 2:
  /* Deleted branch.  */
  return -len;
case 3: case 4: case 5:
  /* Conditionalized insn.  */
  if ((!JUMP_P (insn)
   || (get_attr_type (insn) != TYPE_BRANCH
   && get_attr_type (insn) != TYPE_UNCOND_BRANCH
   && (get_attr_type (insn) != TYPE_RETURN
   || (statep->cc != ARC_CC_EQ && statep->cc != ARC_CC_NE)
   || NEXT_INSN (PREV_INSN (insn)) != insn)))
  && (len & 2))
adj = 2;
  break;
default:
  gcc_unreachable ();
}
}
  if (TARGET_ARC600)
{
  rtx succ = next_real_insn (insn);

  if (succ && INSN_P (succ))
adj += arc600_corereg_hazard (insn, succ);
}

  /* Restore extracted operands - otherwise splitters like the  
addsi3_mixed one

 can go awry.  */
  extract_constrain_insn_cached (insn);

  return adj;
}

From arc.md:

; Since the demise of REG_N_SETS, it is no longer possible to find out
; in the prologue / epilogue expanders how many times blink is set.
; Using df_regs_ever_live_p to decide if blink needs saving means that
; any explicit use of blink will cause it to be saved; hence we cannot
; represent the blink use in return / sibcall instructions themselves, and
; instead have to show it in EPILOGUE_USES and must explicitly
; forbid instructions that change blink in the return / sibcall delay slot.
(define_insn "return_i"
  [(return)]
  "reload_completed"
{
  rtx reg
= gen_rtx_REG (Pmode,
   arc_return_address_regs[arc_compute_function_type (cfun)]);

  if (TARGET_PAD_RETURN)
arc_pad_return ();
  output_asm_insn (\"j%!%* [%0]%&\", ®);
  return \"\";
}
  [(set_attr "type" "return")
   (set_attr "cond" "canuse")
   (set (attr "iscompact")
(cond [(eq (symbol_ref "arc_compute_function_type (cfun)")
   (symbol_ref "ARC_FUNCTION_NORMAL"))
   (const_string "maybe")]
  (const_string "false")))
   (set (attr "length")
(cond [(eq (symbol_ref "arc

Re: RFA: Add lock_lenth attribute to support the ARC port (Was: Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles)

2012-10-24 Thread Richard Biener
On Wed, Oct 24, 2012 at 3:42 AM, Joern Rennecke
 wrote:
> Quoting Richard Biener :
>
>> On Tue, Oct 16, 2012 at 9:35 PM, Joern Rennecke
>>  wrote:
>
> ..
>>>
>>> Well, we could split it anyway, and give ports without the need for
>>> multiple length attributes the benefit of the optimistic algorithm.
>>>
>>> I have attached a patch that implements this.
>>
>>
>> Looks reasonable to me, though I'm not familiar enough with the code
>> to approve it.
>
>
> Now that Richard Sandiford has reviewed that split-off part and it's in
> the source tree, we can return to the remaining functionality needed
> by for the ARC port.
>
>> I'd strongly suggest to try harder to make things work for you without
>> the new attribute even though I wasn't really able to follow your
>> reasoning
>> on why that wouldn't work.  It may be easier to motivate this change
>> once the port is in without that attribute so one can actually look at
>> the machine description and port details.
>
>
> Well, it doesn't simply drop in with the existing branch shortening -
> libgcc won't compile because of out-of-range branches.
> I tried to lump the length and lock_length atribute together, and that
> just gives genattrtab indigestion.  It sits there looping forever.
> I could start debugging this, but that would take an unknown amount of
> time, and then the review of the fix would take an unknown amount of time,
> and then the ARC port probably needs fixing up again because it just
> doesn't work right with these fudged lengths.  And even if we could get
> everything required in before the close of phase 1, the branch shortening
> would be substandard.
> It seems more productive to get the branch shortening working now.
> The lock_length atrtibute is completely optional, so no port maintainer
> would be forced to use the functionality if it's not desired.
>
> The issue is that the some instructions need to be aligned or unaligned
> for performance or in a few cases even for correctness.  Just inserting
> random nops would be a waste of code space and cycles, since most of the
> time, the desired (mis)alignment can be archived by selectively making
> a short instruction long.  If an instruction that is long once was forced
> to stay long henceforth, that would actually defeat the purpose of getting
> the desired alignment.  Then another short instruction - if it can be found
> -
> would need to be upsized.  So a size increase could ripple through as
> alignments are distorted.  The natural thing to do is really when the
> alignemnt changes is really to let the upsized instruction be short again.
> Only length-determined branch sizes have to be locked to avoid cycles.

Just to add some extra information, can you quote your ports
ADJUST_INSN_LENGTH and one example instruction with length/lock_length
attribute the above applies to?

> This is the documentation for the new role of the lock_length attribute
> (reduced from my previous attempt):
>
> @cindex lock_length
> Usually, when doing optimizing branch shortening, the instruction length
> is calculated by avaluating the @code{length} attribute, applying
> @code{ADJUST_INSN_LENGTH}, and taking the maximum of the resultant
> value and the length of the instruction in the previous iteration.

Which sounds straight-forward.  The docs of ADJUST_INSN_LENGTH
are not entirely clear, but it sounds like it may only increase length, correct?
I see that ADJUST_INSN_LENGTH is really not needed as everything
should be expressable in the length attribute of an instruction?

> If you define the @code{lock_length} attribute, the @code{lock_length}
> attribute will be evaluated, and then the maximum with of @code{lock_length}

with of?  I read it as 'of the'

> value from the previous iteration will be formed and saved.

So lock_length will only increase during iteration.

> Then the maximum of that value with the @code{length} attribute will
> be formed, and @code{ADJUST_INSN_LENGTH} will be applied.

ADJUST_INSN_LENGTH will be applied to the maximum?  What will
be the 'instruction length' equivalent to the simple non-lock_length case?
Is it the above, max (ADJUST_INSN_LENGTH (lock-length-max, length))?

> Thus, you can allow the length to vary downwards as well as upwards
> across iterations with suitable definitions of the @code{length} attribute
> and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does not
> lead to infinite loops.

I don't see that you can shrink length with just suitable lock_length and
length attributes.  What seems to be the cruical difference is that you
apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
But then you _decrease_ length with ADJUST_INSN_LENGHT ...

Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
applied afterwards?  Thus,

> Usually, when doing optimizing branch shortening, the instruction length
> is calculated by avaluating the @code{length} attribute, applying
> @code{ADJUST_INSN_LENGTH}, and taking the maximum of the re