Hi Mitch,

I see what you are saying about the atomicity aspect of the IT block. Those are fair points. Likewise, it's fair to optimize them away past decode, as your patch does.
I'm looking for something extra so that another CPU model (or other code) will not look at that instruction and think it's just a "nop". For instance, the prefetch instruction is marked with a "Prefetch" flag, which allows a CPU model to check for prefetches and handle them differently if it wishes to.

To me, it looks like the converged solution is:
1) Add a flag called "isPurePredicate" (or a better name!) in DynInst.
2) Then, in your patch, give the instruction two flags: "isNop" and "isPurePredicate".
3) Finally, when the instruction is removed from the CPU, check whether "isPurePredicate" is asserted and the instruction is not squashed. If so, increment a stat counting how many times we performed this optimization.

I'm hoping this both eliminates the IT instruction from the back-end (isNop flag) and allows for a fair accounting of that optimization in the end-of-simulation stats (isPurePredicate flag).

Would you agree with that?

On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga <[email protected]> wrote:

> "Lastly, this optimization could also be applied to any branch instructions
> that get resolved at decode, right?"
> That's a good one that I'm definitely going to implement.
>
> I think whoever wrote the current IPC counting mechanism was trying to
> measure backend IPC and not total IPC. This makes sense by counting data
> prefetches but not instruction prefetches towards IPC.
>
> I'm still with ignoring IT instructions though, since it was originally
> created when ARM shrank their opcodes for the THUMB instruction set and
> didn't have enough bits to do their normal predication encoding. IT
> instructions just allow the decoder to save and append these bits to
> recreate the full ARM opcode. They've also made IT blocks be as atomic as
> possible (only the last instruction is allowed to be a branch and jumps,
> other than exception returns, into IT blocks are not permitted).
> So, in my mind IT instructions are effectively part of the "instruction"
> that the entire block comprises.
>
>
> On Mon, Apr 1, 2013 at 11:16 AM, Korey Sewell <[email protected]> wrote:
>
> > Hi Mitch,
> > Thanks for the quick response. I pretty much agree with the sentiment
> > that this is a valid optimization but probably disagree a bit on going
> > forward with (3).
> >
> > I think you pose a valid question of "If it's already acceptable to not
> > count ISA-level nops towards IPC, why not IT instructions as well?". My
> > answer to that would be that whereas nops/prefetches can safely be
> > ignored and not affect instruction order, you can't literally ignore an
> > IT instruction without affecting instruction order.
> >
> > If I err in that reasoning, then I think I'd be OK with #3, but if it's
> > the case where the output of the IT instruction is actually needed to
> > alter control flow then I don't think it's OK to treat it as a nop and
> > ignore it in stats.
> >
> > I'd be for #1 actually. Although it may sound "hackish", each ISA does
> > have its own quirks and at commit I wouldn't be against checking the
> > ISA-specific state to figure out if this were an optimized instruction
> > (mark a flag in the DynInst) and when it leaves the O3 cpu
> > (instDone()?), check to see if this flag is asserted but the committed
> > flag isn't. If not, count it as a committed op.
> >
> > Lastly, this optimization could also be applied to any branch
> > instructions that get resolved at decode, right?
> >
> > -Korey
> >
> > On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <[email protected]> wrote:
> >
> >> Re-sending this so it gets sent to the list.
> >>
> >> Yes, right now this would not properly credit IPC for IT instructions,
> >> since nops don't count towards IPC. I overlooked that since I use
> >> execution time as my evaluation metric.
> >>
> >> Three quick thoughts on this...
> >> 1) A quick solution would be to look at the ITstate of committing ops
> >> and infer a dropped IT instruction. This would be a bit hackish and
> >> ARM-specific though.
> >> 2) Maintaining the current method of sending nops through the pipeline
> >> could be made to work by going through and modifying the code to be
> >> sure nops did not count against bandwidth or size restrictions. You'd
> >> also have to worry about not impacting stats like rob reads/writes that
> >> the McPAT users would feed to their power models. And at commit you'd
> >> still have to special case the IT instruction to make sure it got
> >> counted.
> >> 3) If it's already acceptable to not count ISA-level nops towards IPC,
> >> why not IT instructions as well? They do feed some information to the
> >> decoder, but overall their relative work isn't much more than a nop
> >> (being fetched + decoded). They also potentially do far less work than
> >> a prefetch instruction (which is also not counted).
> >>
> >> I personally like 3, since the current subset of instructions counted
> >> towards IPC already seems to have a bit of arbitrariness and would
> >> require no changes.
> >>
> >> PS: I coded this up because I noticed a few times where up to 1/5 of my
> >> instruction window could be occupied by "useless" IT instructions
> >>
> >>
> >> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]> wrote:
> >>
> >>> Hi Mitch,
> >>> Another thing I wonder about with this patch is the impact on stats.
> >>>
> >>> If I recall right, O3 throws away nops. So when we talk about IPC with
> >>> this patch in, we aren't giving the CPU "credit" for doing what's
> >>> necessary for the ARM IT instruction, right?
> >>>
> >>> I'm thinking there may need to be another patch supplemented to this
> >>> that counts the # of times this optimization happens. That way, we
> >>> have all the bases covered for instruction/IPC counting.
> >>>
> >>> Thoughts?
> >>>
> >>> -Korey
> >>>
> >>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <[email protected]> wrote:
> >>>
> >>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
> >>>> > > While this seems harmless enough, I wonder if there is some
> >>>> > > interaction between faults/interrupts and the instruction that we
> >>>> > > should worry about. I haven't given it enough thought to say
> >>>> > > either way, but it seems like it could be a concern.
> >>>>
> >>>> I thought about it somewhat, since IT blocks are required to be able
> >>>> to handle faults and return to execution properly within an IT block.
> >>>> It seems the gem5 solution is probably similar to what a real
> >>>> processor implementation would use, appending the IT state to the PC.
> >>>> So an exception/interrupt within an IT block would just return and the
> >>>> decoder would pick off the extra IT bits from the PC (that detail how
> >>>> to predicate up to the next 3 ops). If the exception/interrupt was
> >>>> just prior to the IT instruction, it would just get sent to the
> >>>> decoder like normal.
> >>>>
> >>>> I was thinking more on the "discarding nops at decode" part. The only
> >>>> case I think that could give that trouble is self-modifying code,
> >>>> since you'd want to track instruction addresses to know if a snooped
> >>>> write changed a currently executing instruction. But gem5 doesn't
> >>>> really provide that now anyway and you could use cheaper structures to
> >>>> perform that operation (since false positives would be ok).
> >>>>
> >>>>
> >>>> - Mitch
> >>>>
> >>>>
> >>>> -----------------------------------------------------------
> >>>> This is an automatically generated e-mail.
> >>>> To reply, visit:
> >>>> http://reviews.gem5.org/r/1805/#review4177
> >>>> -----------------------------------------------------------
> >>>>
> >>>>
> >>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
> >>>> >
> >>>> > -----------------------------------------------------------
> >>>> > This is an automatically generated e-mail. To reply, visit:
> >>>> > http://reviews.gem5.org/r/1805/
> >>>> > -----------------------------------------------------------
> >>>> >
> >>>> > (Updated March 29, 2013, 7:47 p.m.)
> >>>> >
> >>>> >
> >>>> > Review request for Default.
> >>>> >
> >>>> >
> >>>> > Description
> >>>> > -------
> >>>> >
> >>>> > Mark ARM IT (if-then) instructions as nops.
> >>>> >
> >>>> > ARM's IT instructions predicate up to the next 4 instructions on
> >>>> > various condition codes. IT instructions really just send control
> >>>> > signals to the decoder, after decode they do not read or write any
> >>>> > registers. Marking them as nops (along with the other patch that
> >>>> > drops nops at decode) saves execution resources and bandwidth.
> >>>> >
> >>>> >
> >>>> > Diffs
> >>>> > -----
> >>>> >
> >>>> >   src/arch/arm/isa/insts/misc.isa 47591444a7c5
> >>>> >
> >>>> > Diff: http://reviews.gem5.org/r/1805/diff/
> >>>> >
> >>>> >
> >>>> > Testing
> >>>> > -------
> >>>> >
> >>>> > A fast libquantum run.
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > Mitch Hayenga
> >>>> >
> >>>> >
> >>>>
> >>>> _______________________________________________
> >>>> gem5-dev mailing list
> >>>> [email protected]
> >>>> http://m5sim.org/mailman/listinfo/gem5-dev
> >>>>
> >>>
> >>> --
> >>> - Korey
> >>>
> >>
> >
> > --
> > - Korey
>

--
- Korey
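
PS: To make the three-step proposal at the top of this message concrete, here is a rough, self-contained C++ sketch of the accounting. It is only an illustration under stated assumptions: DynInstSketch, RetireStatsSketch, recordRemoval() and creditedOps() are hypothetical names, not gem5's actual DynInst, stats, or instDone() code, and "isPurePredicate" is just the placeholder flag name suggested above.

```cpp
#include <cstdint>

// Hypothetical stand-in for a dynamic instruction record.
// This is NOT gem5's DynInst; it only carries the flags discussed above.
struct DynInstSketch {
    bool isNop = false;            // dropped from the back end after decode
    bool isPurePredicate = false;  // assumed flag: e.g. an ARM IT instruction
    bool squashed = false;         // instruction was on a squashed path
};

// Hypothetical counters standing in for end-of-simulation stats.
struct RetireStatsSketch {
    std::uint64_t committedOps = 0;        // ops that count towards IPC today
    std::uint64_t elidedPredicateOps = 0;  // times the optimization was applied
};

// Called when an instruction is removed from the CPU model.
inline void recordRemoval(const DynInstSketch &inst, RetireStatsSketch &stats)
{
    if (inst.squashed) {
        return;  // squashed instructions get no credit of any kind
    }
    if (inst.isPurePredicate) {
        ++stats.elidedPredicateOps;  // account for the elided instruction
        return;
    }
    if (!inst.isNop) {
        ++stats.committedOps;  // ordinary nops stay uncounted, as before
    }
}

// One possible way to fold the new counter back into an op count.
inline std::uint64_t creditedOps(const RetireStatsSketch &stats)
{
    return stats.committedOps + stats.elidedPredicateOps;
}
```

Keeping the new counter separate from the committed-op count is a deliberate choice in this sketch: existing IPC numbers stay comparable, and anyone who wants to credit the elided IT instructions can add the two counters together afterwards.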
