Yeah, I'll see if I get time to do a fuller solution later this week. I also realized the current patch would break FS mode, since the frontend signals instruction fetch page faults by creating a nop with a fault attached (this patch would just discard that nop). So, checking for a fault would be required before discarding.
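A rough sketch of the check I have in mind, folding in the unconditional-jump case mentioned just below. The DecodedInst struct, its fields, isNop(), hasFault() and canDiscardAtDecode() are all made up for illustration and are not the real gem5 DynInst interface; only the isUncondCtrl()/isDirectCtrl()/writesRegs() names come from my quick mod.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a decoded instruction. This is not gem5's
// DynInst/StaticInst API; only the predicate names isUncondCtrl(),
// isDirectCtrl() and writesRegs() are taken from the message.
struct DecodedInst {
    bool nop = false;            // instruction is flagged as a nop
    bool uncondCtrl = false;     // unconditional control transfer
    bool directCtrl = false;     // target is encoded in the instruction
    bool writesNonPcReg = false; // writes something other than PC / zero reg
    std::optional<std::string> fault; // set when fetch attached a fault

    bool isNop() const { return nop; }
    bool isUncondCtrl() const { return uncondCtrl; }
    bool isDirectCtrl() const { return directCtrl; }
    bool writesRegs() const { return writesNonPcReg; }
    bool hasFault() const { return fault.has_value(); }
};

// Returns true if decode may drop the instruction instead of sending it
// down the pipeline.
bool canDiscardAtDecode(const DecodedInst &inst)
{
    // A nop carrying a fault (e.g. an instruction fetch page fault in FS
    // mode) must stay in the pipeline so the fault is still delivered.
    if (inst.hasFault())
        return false;

    if (inst.isNop())
        return true;

    // Unconditional direct jumps that write no register besides the PC
    // (or the zero reg) are candidates for discarding as well.
    return inst.isUncondCtrl() && inst.isDirectCtrl() && !inst.writesRegs();
}
```

With this shape, a branch-and-link style instruction would have writesNonPcReg set and so would stay in the pipeline, while a plain direct jump or a fault-free nop would be dropped.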
Discarding unconditional jumps also works. I did a quick mod where I discarded them if the following was true: "if (isUncondCtrl() && isDirectCtrl() && !inst->writesRegs())", where writesRegs returned true if the instruction wrote something other than the PC or zero reg (on ARM). But using a flag in the isa files would be a better way than checking the destination registers explicitly.

On Tue, Apr 2, 2013 at 11:50 AM, Korey Sewell <[email protected]> wrote:

> Hi Mitch,
> I see what you are saying about the atomicity aspect of the IT block. Those
> are fair points. Likewise, it's fair to optimize them out past decode
> like what your patch does.
>
> I'm looking for something extra such that another CPU model (or code) will
> not look at that instruction and think it's just a "nop". For instance, the
> prefetch instruction is marked with a "Prefetch" flag which allows a CPU
> model to check for prefetch and handle them differently if it wishes to.
>
> To me, it looks like the converged solution is:
> 1) add a flag called "isPurePredicate" (or a better name!) in DynInst.
> 2) Then, in your patch you can give the instruction two flags: "isNop" and
> "isPurePredicate".
> 3) Finally, when the instruction is removed from the CPU, you check to see
> if the "isPurePredicate" is asserted and if the instruction is not
> squashed. If that condition is true, increment a stat counting how many
> times we performed this optimization.
>
> I'm hoping this both eliminates the IT instruction from the back-end (isNop
> flag) and then allows for a fair accounting of that optimization in the end
> of simulation stats (isPurePredicate flag).
>
> Would you agree with that?
>
> On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga <[email protected]> wrote:
>
> > "Lastly, this optimization could also be applied to any branch instructions
> > that get resolved at decode, right?"
> > That's a good one that I'm definitely going to implement.
> >
> > I think whoever wrote the current IPC counting mechanism was trying to
> > measure backend IPC and not total IPC. This makes sense by counting data
> > prefetches but not instruction prefetches towards IPC.
> >
> > I'm still with ignoring IT instructions though, since it was originally
> > created when ARM shrank their opcodes for the THUMB instruction set and
> > didn't have enough bits to do their normal predication encoding. IT
> > instructions just allow the decoder to save and append these bits to
> > recreate the full ARM opcode. They've also made IT blocks be as atomic as
> > possible (only the last instruction is allowed to be a branch and jumps,
> > other than exception returns, into IT blocks are not permitted). So, in my
> > mind IT instructions are effectively part of the "instruction" that the
> > entire block comprises.
> >
> > On Mon, Apr 1, 2013 at 11:16 AM, Korey Sewell <[email protected]> wrote:
> >
> > > Hi Mitch,
> > > Thanks for the quick response. I pretty much agree with the sentiment that
> > > this is a valid optimization but probably disagree a bit on going forward
> > > with (3).
> > >
> > > I think you pose a valid question of "If it's already acceptable to not
> > > count ISA-level nops towards IPC, why not IT instructions as well?". My
> > > answer to that would be that whereas nops/prefetches can safely be ignored
> > > and not affect instruction order, you can't literally ignore an IT
> > > instruction without affecting instruction order.
> > >
> > > If I err in that reasoning, then I think I'd be OK with #3, but if it's
> > > the case where the output of the IT instruction is actually needed to alter
> > > control flow then I don't think it's OK to treat it as a nop and ignore it
> > > in stats.
> > >
> > > I'd be for #1 actually.
> > > Although it may sound "hackish", each ISA does
> > > have its own quirks and at commit I wouldn't be against checking the
> > > ISA-specific state to figure out if this were an optimized instruction (mark
> > > a flag in the DynInst) and when it leaves the O3 cpu (instDone()?), check
> > > to see if this flag is asserted but the committed flag isn't. If not,
> > > count it as a committed op.
> > >
> > > Lastly, this optimization could also be applied to any branch instructions
> > > that get resolved at decode, right?
> > >
> > > -Korey
> > >
> > > On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <[email protected]> wrote:
> > >
> > >> Re-sending this so it gets sent to the list.
> > >>
> > >> Yes, right now this would not properly credit IPC for IT instructions,
> > >> since nops don't count towards IPC. I overlooked that since I use
> > >> execution time as my evaluation metric.
> > >>
> > >> Three quick thoughts on this...
> > >> 1) A quick solution would be to look at the ITstate of committing ops
> > >> and infer a dropped IT instruction. This would be a bit hackish and ARM
> > >> specific though.
> > >> 2) Maintaining the current method of sending nops through the pipeline
> > >> could be made to work, by going through and modifying the code to be sure
> > >> nops did not count against bandwidth or size restrictions. You'd also have
> > >> to worry about not impacting stats like rob reads/writes that the McPAT
> > >> users would feed to their power models. And at commit you'd still have to
> > >> special case the IT instruction to make sure it got counted.
> > >> 3) If it's already acceptable to not count ISA-level nops towards IPC,
> > >> why not IT instructions as well? They do feed some information to the
> > >> decoder, but overall their relative work isn't much more than a nop (being
> > >> fetched + decoded). They also potentially do far less work than a prefetch
> > >> instruction (which is also not counted).
> > >>
> > >> I personally like 3, since the current subset of instructions counted
> > >> towards IPC already seems to have a bit of arbitrariness and would require
> > >> no changes.
> > >>
> > >> PS: I coded this up because I noticed a few times where up to 1/5 of my
> > >> instruction window could be occupied by "useless" IT instructions
> > >>
> > >> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]> wrote:
> > >>
> > >>> Hi Mitch,
> > >>> Another thing I wonder about with this patch is the impact on stats.
> > >>>
> > >>> If I recall right, O3 throws away nops. So when we talk about IPC with
> > >>> this patch in, we aren't giving the CPU "credit" for doing what's necessary
> > >>> for the ARM IT instruction, right?
> > >>>
> > >>> I'm thinking there may need to be another patch supplemented to this
> > >>> that counts the # of times this optimization happens. That way, we have all
> > >>> the bases covered for instruction/IPC counting.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> -Korey
> > >>>
> > >>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <[email protected]> wrote:
> > >>>
> > >>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
> > >>>> > > While this seems harmless enough, I wonder if there is some interaction between faults/interrupts and the instruction that we should worry about. I haven't given it enough thought to say either way, but it seems like it could be a concern.
> > >>>>
> > >>>> I thought about it somewhat, since IT blocks are required to be able to
> > >>>> handle faults and return to execution properly within an IT block. It
> > >>>> seems the gem5 solution is probably similar to what a real processor
> > >>>> implementation would use, appending the IT state to the PC.
> > >>>> So an
> > >>>> exception/interrupt within an IT block would just return and the decoder
> > >>>> would pick off the extra IT bits from the PC (that detail how to predicate
> > >>>> up to the next 3 ops). If the exception/interrupt was just prior to the IT
> > >>>> instruction, it would just get sent to the decoder like normal.
> > >>>>
> > >>>> I was thinking more on the "discarding nops at decode" part. The only
> > >>>> case I think that could give that trouble is self-modifying code, since
> > >>>> you'd want to track instruction addresses to know if a snooped write
> > >>>> changed a currently executing instruction. But gem5 doesn't really provide
> > >>>> that now anyway and you could use cheaper structures to perform that
> > >>>> operation (since false positives would be ok).
> > >>>>
> > >>>> - Mitch
> > >>>>
> > >>>> -----------------------------------------------------------
> > >>>> This is an automatically generated e-mail. To reply, visit:
> > >>>> http://reviews.gem5.org/r/1805/#review4177
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
> > >>>> >
> > >>>> > -----------------------------------------------------------
> > >>>> > This is an automatically generated e-mail. To reply, visit:
> > >>>> > http://reviews.gem5.org/r/1805/
> > >>>> > -----------------------------------------------------------
> > >>>> >
> > >>>> > (Updated March 29, 2013, 7:47 p.m.)
> > >>>> >
> > >>>> > Review request for Default.
> > >>>> >
> > >>>> > Description
> > >>>> > -------
> > >>>> >
> > >>>> > Mark ARM IT (if-then) instructions as nops.
> > >>>> >
> > >>>> > ARM's IT instructions predicate up to the next 4 instructions on various condition codes.
> > >>>> > IT instructions really just send control signals to the decoder; after decode they do not read or write any registers. Marking them as nops (along with the other patch that drops nops at decode) saves execution resources and bandwidth.
> > >>>> >
> > >>>> > Diffs
> > >>>> > -----
> > >>>> >
> > >>>> > src/arch/arm/isa/insts/misc.isa 47591444a7c5
> > >>>> >
> > >>>> > Diff: http://reviews.gem5.org/r/1805/diff/
> > >>>> >
> > >>>> > Testing
> > >>>> > -------
> > >>>> >
> > >>>> > A fast libquantum run.
> > >>>> >
> > >>>> > Thanks,
> > >>>> >
> > >>>> > Mitch Hayenga
> > >>>>
> > >>>> _______________________________________________
> > >>>> gem5-dev mailing list
> > >>>> [email protected]
> > >>>> http://m5sim.org/mailman/listinfo/gem5-dev
> > >>>
> > >>> --
> > >>> - Korey
