Good catch on the FS side of things and thanks for looking into this. Once
we get the patch settled, this will be a useful optimization for all the O3
users.

-Korey


On Tue, Apr 2, 2013 at 1:59 PM, Mitch Hayenga
<[email protected]> wrote:

> Yeah, I'll see if I get time to do a more complete solution later this
> week. I also realized that the current patch would break FS mode, since the
> front end signals instruction-fetch page faults by creating a nop with the
> fault attached (and this patch would simply discard that nop). So decode
> would need to check for a fault before discarding an instruction.
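A minimal sketch of the check described above, assuming a simplified instruction record: decode drops a nop only when no fault is attached, so a fetch fault carried on a nop still reaches commit. `FetchedInst`, `FaultBase`, `Fault` and `canDiscardAtDecode` are hypothetical stand-ins, not gem5's actual DynInst interface.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a fault object; in this sketch a non-null
// pointer simply means "a fault is attached to this instruction".
struct FaultBase {
    std::string name;
};
using Fault = std::shared_ptr<FaultBase>;

// Hypothetical, simplified fetched instruction. Only the two properties
// the discard decision needs are modeled.
struct FetchedInst {
    bool nop = false;
    Fault fault;  // null when fetch completed without a fault

    bool isNop() const { return nop; }
};

// Decide whether decode may drop an instruction. A nop that carries a
// fault (e.g. an instruction-fetch page fault signalled by the front end)
// must keep flowing to commit so that the fault is raised there.
bool canDiscardAtDecode(const FetchedInst &inst)
{
    return inst.isNop() && !inst.fault;
}
```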
>
> Discarding unconditional jumps also works. I did a quick modification that
> discarded them whenever "isUncondCtrl() && isDirectCtrl() &&
> !inst->writesRegs()" was true, where writesRegs() returned true if the
> instruction wrote anything other than the PC or the zero register (on ARM).
> But using a flag in the ISA files would be a better approach than checking
> the destination registers explicitly.
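The same decode-time test can be sketched with an ISA-level flag instead of inspecting destination registers. `isUncondCtrl()` and `isDirectCtrl()` are the predicates quoted in the thread; `writesArchRegs()`, the `InstFlag` enum and `MockStaticInst` are hypothetical, standing in for a flag the ISA files would set when an instruction writes anything other than the PC or the zero register. A real implementation would also need the fault check Mitch describes for nops.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical static-instruction flags. FlagWritesArchRegs stands in for
// an ISA-file flag meaning "writes something other than PC / zero reg".
enum InstFlag : std::uint32_t {
    FlagUncondCtrl     = 1u << 0,
    FlagDirectCtrl     = 1u << 1,
    FlagWritesArchRegs = 1u << 2,
};

// Hypothetical mock of a static instruction carrying those flags.
struct MockStaticInst {
    std::uint32_t flags = 0;

    bool isUncondCtrl() const { return flags & FlagUncondCtrl; }
    bool isDirectCtrl() const { return flags & FlagDirectCtrl; }
    bool writesArchRegs() const { return flags & FlagWritesArchRegs; }
};

// An unconditional, direct jump that writes only the PC is fully resolved
// once decode redirects fetch to its target, so the back end has nothing
// left to do with it.
bool canDiscardResolvedJump(const MockStaticInst &inst)
{
    return inst.isUncondCtrl() && inst.isDirectCtrl() &&
           !inst.writesArchRegs();
}
```

For example, a plain `B` to a fixed target would be discardable under this rule, while `BL` (which also writes the link register) or a register-indirect `BX` would not.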
>
> On Tue, Apr 2, 2013 at 11:50 AM, Korey Sewell <[email protected]> wrote:
>
> > Hi Mitch,
> > I see what you are saying about the atomicity aspect of the IT block. Those
> > are fair points. Likewise, it's fair to optimize IT instructions away
> > after decode, as your patch does.
> >
> > I'm looking for something extra so that another CPU model (or other code)
> > will not look at that instruction and think it's just a "nop". For
> > instance, the prefetch instruction is marked with a "Prefetch" flag, which
> > allows a CPU model to check for prefetches and handle them differently if
> > it wishes to.
> >
> > To me, it looks like the converged solution is:
> > 1) Add a flag called "isPurePredicate" (or a better name!) in DynInst.
> > 2) Then, in your patch, give the IT instruction two flags: "isNop" and
> >    "isPurePredicate".
> > 3) Finally, when the instruction is removed from the CPU, check whether
> >    "isPurePredicate" is asserted and the instruction was not squashed. If
> >    so, increment a stat counting how many times we performed this
> >    optimization.
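Steps 1 to 3 might look roughly like this, assuming a simplified instruction record and a plain counter in place of a gem5 stat. `isPurePredicate` is the name proposed above; `MockDynInst`, `PurePredicateStats`, `elidedPurePredicates` and `onInstRemoved` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instruction state for the proposal: the IT instruction
// is marked as a nop (so the back end ignores it) and also carries a
// separate flag so other CPU models or stats code can tell it apart from
// an ISA-level nop.
struct MockDynInst {
    bool isNop = false;
    bool isPurePredicate = false;  // name proposed in the thread
    bool squashed = false;
};

// Stand-in for a scalar stat counting how often the optimization was
// applied to instructions on the correct path.
struct PurePredicateStats {
    std::uint64_t elidedPurePredicates = 0;

    // Call when an instruction leaves the CPU, committed or squashed.
    void onInstRemoved(const MockDynInst &inst)
    {
        if (inst.isPurePredicate && !inst.squashed)
            ++elidedPurePredicates;
    }
};
```

Keeping the counter separate from the committed-instruction count leaves existing IPC numbers unchanged while still making the optimization visible in the end-of-simulation stats.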
> >
> > I'm hoping this both eliminates the IT instruction from the back end
> > (isNop flag) and allows a fair accounting of the optimization in the
> > end-of-simulation stats (isPurePredicate flag).
> >
> > Would you agree with that?
> >
> >
> >
> >
> > On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga
> > <[email protected]> wrote:
> >
> > > "Lastly, this optimization could also be applied to any branch
> > > instructions that get resolved at decode, right?"
> > > That's a good one that I'm definitely going to implement.
> > >
> > > I think whoever wrote the current IPC counting mechanism was trying to
> > > measure back-end IPC rather than total IPC. That would be consistent
> > > with counting data prefetches, but not instruction prefetches, towards
> > > IPC.
> > >
> > > I'm still in favor of ignoring IT instructions, though. The IT
> > > instruction was created when ARM shrank its opcodes for the Thumb
> > > instruction set and didn't have enough bits left for the normal
> > > predication encoding. IT instructions just allow the decoder to save
> > > these bits and append them to recreate the full ARM opcode. ARM has
> > > also made IT blocks as atomic as possible (only the last instruction
> > > may be a branch, and jumps into IT blocks, other than exception
> > > returns, are not permitted). So, in my mind, an IT instruction is
> > > effectively part of the single "instruction" that the entire block
> > > comprises.
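To make the "save and append these bits" point concrete, here is a sketch of how a decoder could expand an IT instruction into per-instruction condition codes, following the ITSTATE rules in the ARMv7 Architecture Reference Manual: ITSTATE starts as firstcond:mask, the current condition is ITSTATE[7:4], the block is active while ITSTATE[3:0] is non-zero, and ITAdvance shifts ITSTATE[4:0] left by one until ITSTATE[2:0] is zero. The function name `itBlockConditions` is hypothetical; this is not gem5's decoder code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Return the 4-bit condition code applied to each instruction in the IT
// block described by firstcond (4 bits) and mask (4 bits).
std::vector<std::uint8_t> itBlockConditions(std::uint8_t firstcond,
                                            std::uint8_t mask)
{
    // ITSTATE<7:0> = firstcond:mask right after the IT instruction.
    std::uint8_t itstate =
        static_cast<std::uint8_t>(((firstcond & 0xF) << 4) | (mask & 0xF));
    std::vector<std::uint8_t> conds;

    // InITBlock(): ITSTATE<3:0> != 0.
    while ((itstate & 0xF) != 0) {
        // The current instruction is predicated on ITSTATE<7:4>.
        conds.push_back(static_cast<std::uint8_t>(itstate >> 4));

        // ITAdvance(): clear ITSTATE after the last instruction,
        // otherwise shift ITSTATE<4:0> left by one.
        if ((itstate & 0x7) == 0) {
            itstate = 0;
        } else {
            std::uint8_t low =
                static_cast<std::uint8_t>((itstate << 1) & 0x1F);
            itstate = static_cast<std::uint8_t>((itstate & 0xE0) | low);
        }
    }
    return conds;
}
```

For ITTE EQ (firstcond = 0b0000, mask = 0b0110) this yields condition codes 0x0, 0x0, 0x1, i.e. EQ, EQ, NE. Once those conditions are attached to the following instructions, the IT instruction itself has nothing left to do in the back end.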
> > >
> > >
> > > On Mon, Apr 1, 2013 at 11:16 AM, Korey Sewell <[email protected]>
> > > wrote:
> > >
> > > > Hi Mitch,
> > > > Thanks for the quick response. I pretty much agree that this is a
> > > > valid optimization, but I probably disagree a bit on going forward
> > > > with (3).
> > > >
> > > > I think you pose a valid question: "If it's already acceptable not to
> > > > count ISA-level nops towards IPC, why not IT instructions as well?"
> > > > My answer would be that whereas nops and prefetches can safely be
> > > > ignored without affecting instruction order, you can't literally
> > > > ignore an IT instruction without affecting instruction order.
> > > >
> > > > If I err in that reasoning, then I think I'd be OK with #3. But if
> > > > the output of the IT instruction is actually needed to alter control
> > > > flow, then I don't think it's OK to treat it as a nop and ignore it
> > > > in the stats.
> > > >
> > > > I'd be for #1, actually. Although it may sound "hackish", each ISA
> > > > does have its own quirks. At commit, I wouldn't be against checking
> > > > the ISA-specific state to figure out whether this was an optimized
> > > > instruction (marking a flag in the DynInst), and then, when it leaves
> > > > the O3 CPU (instDone()?), checking whether this flag is asserted but
> > > > the committed flag isn't. If not, count it as a committed op.
> > > >
> > > > Lastly, this optimization could also be applied to any branch
> > > > instructions that get resolved at decode, right?
> > > >
> > > > -Korey
> > > > On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga
> > > > <[email protected]> wrote:
> > > >
> > > >> Re-sending this so it gets sent to the list.
> > > >>
> > > >> Yes, right now this would not properly credit IPC for IT
> > > >> instructions, since nops don't count towards IPC. I overlooked that
> > > >> because I use execution time as my evaluation metric.
> > > >>
> > > >> Three quick thoughts on this...
> > > >> 1) A quick solution would be to look at the ITstate of committing ops
> > > >>    and infer a dropped IT instruction. This would be a bit hackish and
> > > >>    ARM-specific, though.
> > > >> 2) Maintaining the current method of sending nops through the pipeline
> > > >>    could be made to work, by modifying the code to ensure that nops do
> > > >>    not count against bandwidth or size restrictions. You'd also have to
> > > >>    make sure they don't affect stats such as ROB reads/writes, which
> > > >>    McPAT users feed to their power models. And at commit you'd still
> > > >>    have to special-case the IT instruction to make sure it got counted.
> > > >> 3) If it's already acceptable not to count ISA-level nops towards IPC,
> > > >>    why not IT instructions as well? They do feed some information to
> > > >>    the decoder, but overall their relative work isn't much more than a
> > > >>    nop's (being fetched and decoded). They also potentially do far less
> > > >>    work than a prefetch instruction (which is also not counted).
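Option 1 could be approximated at commit by watching the ITSTATE of committing ops and counting one dropped IT instruction whenever an op opens a new IT block. This sketch assumes every committed op carries the ITSTATE it was decoded with (as the thread says gem5 keeps IT state alongside the PC); `DroppedItCounter` and its methods are hypothetical. It also illustrates why the approach is hackish: an exception return into the middle of an IT block (the one permitted way in) would make the same block be counted twice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical commit-side heuristic: infer that an IT instruction was
// dropped at decode whenever a committing op is the first op of an IT
// block. Only committed ops are seen, so squashed paths are not counted.
class DroppedItCounter
{
  public:
    // Call once per committed op with that op's ITSTATE.
    void commit(std::uint8_t itstate)
    {
        // InITBlock(): ITSTATE<3:0> != 0.
        bool inBlock = (itstate & 0xF) != 0;

        // A new block starts if this op is predicated by ITSTATE but the
        // previous committed op was outside any block or was the last op
        // of its block (back-to-back IT blocks).
        if (inBlock && (!prevInBlock || prevWasLast))
            ++inferredIts;

        prevInBlock = inBlock;
        // LastInITBlock(): ITSTATE<3:0> == 0b1000.
        prevWasLast = (itstate & 0xF) == 0x8;
    }

    std::uint64_t inferred() const { return inferredIts; }

  private:
    bool prevInBlock = false;
    bool prevWasLast = false;
    std::uint64_t inferredIts = 0;
};
```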
> > > >>
> > > >> I personally like 3, since the current subset of instructions counted
> > > >> towards IPC already seems somewhat arbitrary, and it would require no
> > > >> changes.
> > > >>
> > > >> PS: I coded this up because I noticed a few cases where up to 1/5 of
> > > >> my instruction window was occupied by "useless" IT instructions.
> > > >>
> > > >>
> > > >>
> > > >> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi Mitch,
> > > >>> Another thing I wonder about with this patch is the impact on stats.
> > > >>>
> > > >>> If I recall correctly, O3 throws away nops. So when we talk about IPC
> > > >>> with this patch in, we aren't giving the CPU "credit" for doing
> > > >>> what's necessary for the ARM IT instruction, right?
> > > >>>
> > > >>> I'm thinking another patch may be needed alongside this one to count
> > > >>> the number of times this optimization happens. That way, we have all
> > > >>> the bases covered for instruction/IPC counting.
> > > >>>
> > > >>> Thoughts?
> > > >>>
> > > >>> -Korey
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <
> > > >>> [email protected]> wrote:
> > > >>>
> > > >>>>
> > > >>>>
> > > >>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
> > > >>>> > > While this seems harmless enough, I wonder if there is some
> > > >>>> > > interaction between faults/interrupts and the instruction that
> > > >>>> > > we should worry about. I haven't given it enough thought to say
> > > >>>> > > either way, but it seems like it could be a concern.
> > > >>>>
> > > >>>> I thought about it somewhat, since an IT block is required to handle
> > > >>>> faults and to resume execution properly within the block. The gem5
> > > >>>> solution seems to be similar to what a real processor implementation
> > > >>>> would likely use: appending the IT state to the PC. So an
> > > >>>> exception/interrupt within an IT block would just return, and the
> > > >>>> decoder would pick off the extra IT bits from the PC (which describe
> > > >>>> how to predicate up to the next 3 ops). If the exception/interrupt
> > > >>>> occurred just before the IT instruction, the IT instruction would
> > > >>>> simply be sent to the decoder as normal.
> > > >>>>
> > > >>>> I was thinking more about the "discarding nops at decode" part. The
> > > >>>> only case I think could cause trouble is self-modifying code, since
> > > >>>> you'd want to track instruction addresses to know whether a snooped
> > > >>>> write changed a currently executing instruction. But gem5 doesn't
> > > >>>> really provide that now anyway, and you could use cheaper structures
> > > >>>> to perform that check (since false positives would be OK).
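One example of such a cheaper structure is a counting Bloom filter over the cache-line addresses of in-flight instructions: a snooped write that hits the filter triggers a conservative flush, and a false positive only costs a spurious flush. This is a sketch under assumptions (64-byte lines, 256 counters, two multiplicative hashes); `InflightLineFilter` is hypothetical and nothing like it is claimed to exist in gem5.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counting filter over the cache lines of in-flight
// instructions. It can report false positives (unrelated lines sharing
// counters) but no false negatives, provided every insert() is matched by
// exactly one remove() when the instruction commits or is squashed.
class InflightLineFilter
{
  public:
    static constexpr std::size_t NumCounters = 256;
    static constexpr unsigned LineShift = 6;  // assumes 64-byte lines

    void insert(std::uint64_t pc) { update(pc, +1); }
    void remove(std::uint64_t pc) { update(pc, -1); }

    // True if a snooped write to addr may hit an in-flight instruction.
    bool mayContain(std::uint64_t addr) const
    {
        std::uint64_t line = addr >> LineShift;
        return counters[index(line, 0)] > 0 && counters[index(line, 1)] > 0;
    }

  private:
    std::array<std::uint16_t, NumCounters> counters{};

    // Two cheap hash functions taken from different bits of one mix.
    static std::size_t index(std::uint64_t line, unsigned which)
    {
        std::uint64_t h = line * 0x9E3779B97F4A7C15ull;
        return which == 0 ? (h >> 56) % NumCounters
                          : (h >> 8) % NumCounters;
    }

    void update(std::uint64_t pc, int delta)
    {
        std::uint64_t line = pc >> LineShift;
        std::size_t a = index(line, 0);
        std::size_t b = index(line, 1);
        counters[a] = static_cast<std::uint16_t>(counters[a] + delta);
        counters[b] = static_cast<std::uint16_t>(counters[b] + delta);
    }
};
```

Because the filter only has to be conservative, it can track lines of instructions that were discarded at decode just as well as those still in the window.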
> > > >>>>
> > > >>>>
> > > >>>> - Mitch
> > > >>>>
> > > >>>>
> > > >>>> -----------------------------------------------------------
> > > >>>>
> > > >>>> This is an automatically generated e-mail. To reply, visit:
> > > >>>> http://reviews.gem5.org/r/1805/#review4177
> > > >>>> -----------------------------------------------------------
> > > >>>>
> > > >>>>
> > > >>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
> > > >>>> >
> > > >>>> > -----------------------------------------------------------
> > > >>>> >
> > > >>>> > This is an automatically generated e-mail. To reply, visit:
> > > >>>> > http://reviews.gem5.org/r/1805/
> > > >>>> > -----------------------------------------------------------
> > > >>>> >
> > > >>>> > (Updated March 29, 2013, 7:47 p.m.)
> > > >>>> >
> > > >>>> >
> > > >>>> > Review request for Default.
> > > >>>> >
> > > >>>> >
> > > >>>> > Description
> > > >>>> > -------
> > > >>>> >
> > > >>>> > Mark ARM IT (if-then) instructions as nops.
> > > >>>> >
> > > >>>> > ARM's IT instructions predicate up to the next 4 instructions on
> > > >>>> > various condition codes. IT instructions really just send control
> > > >>>> > signals to the decoder; after decode, they do not read or write any
> > > >>>> > registers. Marking them as nops (along with the other patch that
> > > >>>> > drops nops at decode) saves execution resources and bandwidth.
> > > >>>> >
> > > >>>> >
> > > >>>> > Diffs
> > > >>>> > -----
> > > >>>> >
> > > >>>> >   src/arch/arm/isa/insts/misc.isa 47591444a7c5
> > > >>>> >
> > > >>>> > Diff: http://reviews.gem5.org/r/1805/diff/
> > > >>>> >
> > > >>>> >
> > > >>>> > Testing
> > > >>>> > -------
> > > >>>> >
> > > >>>> > A fast libquantum run.
> > > >>>> >
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> >
> > > >>>> > Mitch Hayenga
> > > >>>> >
> > > >>>> >
> > > >>>>
> > > >>>> _______________________________________________
> > > >>>> gem5-dev mailing list
> > > >>>> [email protected]
> > > >>>> http://m5sim.org/mailman/listinfo/gem5-dev
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> - Korey
> > > >>>
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > - Korey
> > > >
> > >
> >
> >
> >
> > --
> > - Korey
> >
>



-- 
- Korey
