Hi Mitch,

Thanks for the quick response. I pretty much agree with the sentiment that this is a valid optimization, but I probably disagree a bit on going forward with #3.

I think you pose a valid question: "If it's already acceptable to not count ISA-level nops towards IPC, why not IT instructions as well?" My answer would be that whereas nops/prefetches can safely be ignored without affecting instruction order, you can't literally ignore an IT instruction without affecting instruction order. If I err in that reasoning, then I think I'd be OK with #3, but if the output of the IT instruction is actually needed to alter control flow, then I don't think it's OK to treat it as a nop and ignore it in stats.

I'd be for #1, actually. Although it may sound "hackish", each ISA does have its own quirks, and at commit I wouldn't be against checking the ISA-specific state to figure out whether this was an optimized instruction (mark a flag in the DynInst). Then, when it leaves the O3 CPU (instDone()?), check whether this flag is asserted but the committed flag isn't. If so, count it as a committed op.

Lastly, this optimization could also be applied to any branch instructions that get resolved at decode, right?

-Korey

On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <[email protected]> wrote:

> Re-sending this so it gets sent to the list.
>
> Yes, right now this would not properly credit IPC for IT instructions,
> since nops don't count towards IPC. I overlooked that since I use
> execution time as my evaluation metric.
>
> Three quick thoughts on this...
> 1) A quick solution would be to look at the ITstate of committing ops and
> infer a dropped IT instruction. This would be a bit hackish and ARM
> specific though.
> 2) Maintaining the current method of sending nops through the pipeline
> could be made to work, by going through and modifying the code to be sure
> nops did not count against bandwidth or size restrictions. You'd also have
> to worry about not impacting stats like rob reads/writes that the McPAT
> users would feed to their power models.
> And at commit you'd still have to special case the IT instruction to
> make sure it got counted.
> 3) If it's already acceptable to not count ISA-level nops towards IPC,
> why not IT instructions as well? They do feed some information to the
> decoder, but overall their relative work isn't much more than a nop (being
> fetched + decoded). They also potentially do far less work than a prefetch
> instruction (which is also not counted).
>
> I personally like 3, since the current subset of instructions counted
> towards IPC already seems to have a bit of arbitrariness, and this option
> would require no changes.
>
> PS: I coded this up because I noticed a few times where up to 1/5 of my
> instruction window could be occupied by "useless" IT instructions.
>
> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]> wrote:
>
>> Hi Mitch,
>> Another thing I wonder about with this patch is the impact on stats.
>>
>> If I recall right, O3 throws away nops. So when we talk about IPC with
>> this patch in, we aren't giving the CPU "credit" for doing what's necessary
>> for the ARM IT instruction, right?
>>
>> I'm thinking there may need to be another patch supplementing this one
>> that counts the # of times this optimization happens. That way, we have
>> all the bases covered for instruction/IPC counting.
>>
>> Thoughts?
>>
>> -Korey
>>
>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <[email protected]> wrote:
>>
>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
>>> > > While this seems harmless enough, I wonder if there is some
>>> > > interaction between faults/interrupts and the instruction that we
>>> > > should worry about. I haven't given it enough thought to say either
>>> > > way, but it seems like it could be a concern.
>>>
>>> I thought about it somewhat, since IT blocks are required to be able to
>>> handle faults and return to execution properly within an IT block.
>>> It seems the gem5 solution is probably similar to what a real processor
>>> implementation would use: appending the IT state to the PC. So an
>>> exception/interrupt within an IT block would just return and the decoder
>>> would pick off the extra IT bits from the PC (that detail how to predicate
>>> up to the next 3 ops). If the exception/interrupt was just prior to the IT
>>> instruction, it would just get sent to the decoder like normal.
>>>
>>> I was thinking more on the "discarding nops at decode" part. The only
>>> case I think that could give that trouble is self-modifying code, since
>>> you'd want to track instruction addresses to know if a snooped write
>>> changed a currently executing instruction. But gem5 doesn't really provide
>>> that now anyway and you could use cheaper structures to perform that
>>> operation (since false positives would be ok).
>>>
>>> - Mitch
>>>
>>> -----------------------------------------------------------
>>> This is an automatically generated e-mail. To reply, visit:
>>> http://reviews.gem5.org/r/1805/#review4177
>>> -----------------------------------------------------------
>>>
>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
>>> >
>>> > -----------------------------------------------------------
>>> > This is an automatically generated e-mail. To reply, visit:
>>> > http://reviews.gem5.org/r/1805/
>>> > -----------------------------------------------------------
>>> >
>>> > (Updated March 29, 2013, 7:47 p.m.)
>>> >
>>> > Review request for Default.
>>> >
>>> > Description
>>> > -------
>>> >
>>> > Mark ARM IT (if-then) instructions as nops.
>>> >
>>> > ARM's IT instructions predicate up to the next 4 instructions on
>>> > various condition codes. IT instructions really just send control signals
>>> > to the decoder; after decode they do not read or write any registers.
>>> > Marking them as nops (along with the other patch that drops nops at
>>> > decode) saves execution resources and bandwidth.
>>> >
>>> > Diffs
>>> > -----
>>> >
>>> > src/arch/arm/isa/insts/misc.isa 47591444a7c5
>>> >
>>> > Diff: http://reviews.gem5.org/r/1805/diff/
>>> >
>>> > Testing
>>> > -------
>>> >
>>> > A fast libquantum run.
>>> >
>>> > Thanks,
>>> >
>>> > Mitch Hayenga
>>>
>>> _______________________________________________
>>> gem5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/gem5-dev
>>
>> --
>> - Korey

--
- Korey
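
For illustration only, the counting scheme proposed as option #1 in this thread (flag an IT instruction elided at decode, then credit it when it leaves the CPU without having committed) might look roughly like the toy Python model below. This is a sketch under stated assumptions, not gem5 code: `ToyInst`, `dropped_at_decode`, `squashed`, `count_committed_ops`, and `ipc` are all hypothetical names, and the real O3 model is C++ with its own DynInst fields. The separate `elided_its` counter corresponds to the earlier suggestion of counting how often the optimization fires.

```python
from dataclasses import dataclass


@dataclass
class ToyInst:
    """Hypothetical stand-in for a dynamic instruction; not gem5's DynInst."""
    name: str
    dropped_at_decode: bool = False  # assumed flag: decode elided this IT
    committed: bool = False          # set only if the op reached commit
    squashed: bool = False           # wrong-path ops must never be counted


def count_committed_ops(retired):
    """Return (committed_ops, elided_its) for ops leaving the toy CPU."""
    committed_ops = 0
    elided_its = 0
    for inst in retired:
        if inst.squashed:
            continue  # never credit wrong-path work
        if inst.committed:
            committed_ops += 1
        elif inst.dropped_at_decode:
            # Flag asserted but committed flag not set: credit it anyway.
            committed_ops += 1
            elided_its += 1
    return committed_ops, elided_its


def ipc(committed_ops, cycles):
    """Committed ops per cycle for the toy model."""
    return committed_ops / cycles


if __name__ == "__main__":
    stream = [
        ToyInst("it", dropped_at_decode=True),
        ToyInst("addeq", committed=True),
        ToyInst("subne", committed=True),
        ToyInst("it", dropped_at_decode=True, squashed=True),
    ]
    ops, elided = count_committed_ops(stream)
    print(ops, elided, ipc(ops, cycles=2))
```

Keeping the elided count separate from the committed count lets power-model users subtract it again if they do not want dropped IT instructions credited as work.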
