Good catch on the FS side of things and thanks for looking into this. Once we get the patch settled, this will be a useful optimization for all the O3 users.
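
For reference, the decode-time discard rule from the thread quoted below, including the fault check that keeps FS mode working, might look roughly like this. It is only a minimal sketch: `DecodedInst`, its fields, and `canDiscardAtDecode` are hypothetical stand-ins for illustration, not gem5's actual DynInst interface.

```cpp
#include <cassert>

// Hypothetical, simplified view of a decoded instruction. These fields are
// illustrative stand-ins only, not gem5's real DynInst members.
struct DecodedInst {
    bool isNop = false;         // ISA-level nop (or an IT marked as a nop)
    bool isUncondCtrl = false;  // unconditional control transfer
    bool isDirectCtrl = false;  // target is known at decode
    bool writesRegs = false;    // writes something other than the PC/zero reg
    bool hasFault = false;      // e.g. an instruction-fetch page fault
};

// Returns true if the instruction could be dropped at decode.
inline bool canDiscardAtDecode(const DecodedInst &inst)
{
    // The frontend signals fetch page faults with a fault-carrying nop,
    // so anything with a fault attached must stay in the pipeline.
    if (inst.hasFault)
        return false;

    // Plain nops carry no work for the back-end.
    if (inst.isNop)
        return true;

    // Unconditional direct jumps with no register side effects are fully
    // resolved at decode.
    return inst.isUncondCtrl && inst.isDirectCtrl && !inst.writesRegs;
}
```

With this shape, a nop carrying a fetch fault is kept, while a fault-free nop or a side-effect-free direct jump is dropped. A flag set in the isa files could replace the `writesRegs` check, as suggested in the thread.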
-Korey

On Tue, Apr 2, 2013 at 1:59 PM, Mitch Hayenga <[email protected]> wrote:

> Yeah, I'll see if I get time to do a fuller solution later this week. I also realized this current patch would break FS mode, since the frontend signals instruction fetch page faults by creating a nop with a fault attached (this patch would just discard that nop). So, checking for a fault would be required before discarding.
>
> Discarding unconditional jumps also works. I did a quick mod where I discarded them if the following was true: "if (isUncondCtrl() && isDirectCtrl() && !inst->writesRegs())", where writesRegs returned true if the instruction wrote something other than the pc or zero reg (on ARM). But using a flag in the isa files would be a better way than checking the destination registers explicitly.
>
> On Tue, Apr 2, 2013 at 11:50 AM, Korey Sewell <[email protected]> wrote:
>
> > Hi Mitch,
> > I see what you are saying about the atomicity aspect of the IT block. Those are fair points. Likewise, it's fair to optimize them out past decode, as your patch does.
> >
> > I'm looking for something extra such that another CPU model (or code) will not look at that instruction and think it's just a "nop". For instance, the prefetch instruction is marked with a "Prefetch" flag, which allows a CPU model to check for prefetches and handle them differently if it wishes to.
> >
> > To me, it looks like the converged solution is:
> > 1) Add a flag called "isPurePredicate" (or a better name!) in DynInst.
> > 2) Then, in your patch you can give the instruction two flags: "isNop" and "isPurePredicate".
> > 3) Finally, when the instruction is removed from the CPU, you check to see if "isPurePredicate" is asserted and if the instruction is not squashed. If that condition is true, increment a stat counting how many times we performed this optimization.
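
The three steps above can be sketched as follows. This is a hedged illustration, not the actual patch: `RetiringInst`, `CommitStats`, and `recordRemoval` are hypothetical names, "isPurePredicate" is the tentative flag name from the list, and the assumption that ordinary nops are left out of the committed count follows the discussion in this thread rather than gem5's source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instruction flags; not gem5's real DynInst.
struct RetiringInst {
    bool isNop = false;            // dropped from the back-end like any nop
    bool isPurePredicate = false;  // proposed flag (name not final)
    bool squashed = false;         // wrong-path instruction
};

// Hypothetical counters for the end-of-simulation stats.
struct CommitStats {
    std::uint64_t committedInsts = 0;    // instructions credited towards IPC
    std::uint64_t elidedPredicates = 0;  // IT-style instructions optimized away
};

// Called when an instruction is removed from the CPU.
inline void recordRemoval(const RetiringInst &inst, CommitStats &stats)
{
    // Squashed instructions earn no credit of any kind.
    if (inst.squashed)
        return;

    // Step 3: count each time the optimization was applied on the
    // correct path.
    if (inst.isPurePredicate) {
        ++stats.elidedPredicates;
        return;
    }

    // Assumption from the thread: ordinary nops do not count towards IPC.
    if (!inst.isNop)
        ++stats.committedInsts;
}
```

Keeping the elided count separate from `committedInsts` leaves the existing IPC definition untouched while still letting anyone who wants "credit" for IT instructions add the two counters together.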
> >
> > I'm hoping this both eliminates the IT instruction from the back-end (isNop flag) and then allows for a fair accounting of that optimization in the end-of-simulation stats (isPurePredicate flag).
> >
> > Would you agree with that?
> >
> > On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga <[email protected]> wrote:
> >
> > > "Lastly, this optimization could also be applied to any branch instructions that get resolved at decode, right?"
> > > That's a good one that I'm definitely going to implement.
> > >
> > > I think whoever wrote the current IPC counting mechanism was trying to measure backend IPC and not total IPC. This makes sense by counting data prefetches but not instruction prefetches towards IPC.
> > >
> > > I'm still with ignoring IT instructions though, since the IT instruction was originally created when ARM shrank their opcodes for the THUMB instruction set and didn't have enough bits to do their normal predication encoding. IT instructions just allow the decoder to save and append these bits to recreate the full ARM opcode. They've also made IT blocks be as atomic as possible (only the last instruction is allowed to be a branch, and jumps into IT blocks, other than exception returns, are not permitted). So, in my mind IT instructions are effectively part of the "instruction" that the entire block comprises.
> > >
> > > On Mon, Apr 1, 2013 at 11:16 AM, Korey Sewell <[email protected]> wrote:
> > >
> > > > Hi Mitch,
> > > > Thanks for the quick response. I pretty much agree with the sentiment that this is a valid optimization but probably disagree a bit on going forward with (3).
> > > >
> > > > I think you pose a valid question of "If it's already acceptable to not count ISA-level nops towards IPC, why not IT instructions as well?".
> > > > My answer to that would be that whereas nops/prefetches can safely be ignored and not affect instruction order, you can't literally ignore an IT instruction without affecting instruction order.
> > > >
> > > > If I err in that reasoning, then I think I'd be OK with #3, but if it's the case where the output of the IT instruction is actually needed to alter control flow then I don't think it's OK to treat it as a nop and ignore it in stats.
> > > >
> > > > I'd be for #1 actually. Although it may sound "hackish", each ISA does have its own quirks and at commit I wouldn't be against checking the ISA-specific state to figure out if this were an optimized instruction (mark a flag in the DynInst) and when it leaves the O3 cpu (instDone()?), check to see if this flag is asserted but the committed flag isn't. If not, count it as a committed op.
> > > >
> > > > Lastly, this optimization could also be applied to any branch instructions that get resolved at decode, right?
> > > >
> > > > -Korey
> > > >
> > > > On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <[email protected]> wrote:
> > > >
> > > >> Re-sending this so it gets sent to the list.
> > > >>
> > > >> Yes, right now this would not properly credit IPC for IT instructions, since nops don't count towards IPC. I overlooked that since I use execution time as my evaluation metric.
> > > >>
> > > >> Three quick thoughts on this...
> > > >> 1) A quick solution would be to look at the ITstate of committing ops and infer a dropped IT instruction. This would be a bit hackish and ARM specific though.
> > > >> 2) Maintaining the current method of sending nops through the pipeline could be made to work, by going through and modifying the code to be sure nops did not count against bandwidth or size restrictions.
> > > >> You'd also have to worry about not impacting stats like rob reads/writes that the McPAT users would feed to their power models. And at commit you'd still have to special case the IT instruction to make sure it got counted.
> > > >> 3) If it's already acceptable to not count ISA-level nops towards IPC, why not IT instructions as well? They do feed some information to the decoder, but overall their relative work isn't much more than a nop (being fetched + decoded). They also potentially do far less work than a prefetch instruction (which is also not counted).
> > > >>
> > > >> I personally like 3, since the current subset of instructions counted towards IPC already seems to have a bit of arbitrariness and it would require no changes.
> > > >>
> > > >> PS: I coded this up because I noticed a few times where up to 1/5 of my instruction window could be occupied by "useless" IT instructions.
> > > >>
> > > >> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]> wrote:
> > > >>
> > > >>> Hi Mitch,
> > > >>> Another thing I wonder about with this patch is the impact on stats.
> > > >>>
> > > >>> If I recall right, O3 throws away nops. So when we talk about IPC with this patch in, we aren't giving the CPU "credit" for doing what's necessary for the ARM IT instruction, right?
> > > >>>
> > > >>> I'm thinking there may need to be another patch supplemented to this that counts the # of times this optimization happens. That way, we have all the bases covered for instruction/IPC counting.
> > > >>>
> > > >>> Thoughts?
> > > >>>
> > > >>> -Korey
> > > >>>
> > > >>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <[email protected]> wrote:
> > > >>>
> > > >>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
> > > >>>> > > While this seems harmless enough, I wonder if there is some interaction between faults/interrupts and the instruction that we should worry about. I haven't given it enough thought to say either way, but it seems like it could be a concern.
> > > >>>>
> > > >>>> I thought about it somewhat, since IT blocks are required to be able to handle faults and return to execution properly within an IT block. It seems the gem5 solution is probably similar to what a real processor implementation would use, appending the IT state to the PC. So an exception/interrupt within an IT block would just return and the decoder would pick off the extra IT bits from the PC (that detail how to predicate up to the next 3 ops). If the exception/interrupt was just prior to the IT instruction, it would just get sent to the decoder like normal.
> > > >>>>
> > > >>>> I was thinking more on the "discarding nops at decode" part. The only case I think that could give that trouble is self-modifying code, since you'd want to track instruction addresses to know if a snooped write changed a currently executing instruction. But gem5 doesn't really provide that now anyway and you could use cheaper structures to perform that operation (since false positives would be ok).
> > > >>>>
> > > >>>> - Mitch
> > > >>>>
> > > >>>> -----------------------------------------------------------
> > > >>>>
> > > >>>> This is an automatically generated e-mail.
> > > >>>> To reply, visit:
> > > >>>> http://reviews.gem5.org/r/1805/#review4177
> > > >>>> -----------------------------------------------------------
> > > >>>>
> > > >>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
> > > >>>> >
> > > >>>> > -----------------------------------------------------------
> > > >>>> > This is an automatically generated e-mail. To reply, visit:
> > > >>>> > http://reviews.gem5.org/r/1805/
> > > >>>> > -----------------------------------------------------------
> > > >>>> >
> > > >>>> > (Updated March 29, 2013, 7:47 p.m.)
> > > >>>> >
> > > >>>> > Review request for Default.
> > > >>>> >
> > > >>>> > Description
> > > >>>> > -------
> > > >>>> >
> > > >>>> > Mark ARM IT (if-then) instructions as nops.
> > > >>>> >
> > > >>>> > ARM's IT instructions predicate up to the next 4 instructions on various condition codes. IT instructions really just send control signals to the decoder, after decode they do not read or write any registers. Marking them as nops (along with the other patch that drops nops at decode) saves execution resources and bandwidth.
> > > >>>> >
> > > >>>> > Diffs
> > > >>>> > -----
> > > >>>> >
> > > >>>> > src/arch/arm/isa/insts/misc.isa 47591444a7c5
> > > >>>> >
> > > >>>> > Diff: http://reviews.gem5.org/r/1805/diff/
> > > >>>> >
> > > >>>> > Testing
> > > >>>> > -------
> > > >>>> >
> > > >>>> > A fast libquantum run.
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> >
> > > >>>> > Mitch Hayenga
> > > >>>> >
> > > >>>>
> > > >>>> _______________________________________________
> > > >>>> gem5-dev mailing list
> > > >>>> [email protected]
> > > >>>> http://m5sim.org/mailman/listinfo/gem5-dev
