Re: [gem5-dev] Review Request: Forward invalidations from Ruby to O3 CPU

Beckmann, Brad Wed, 09 Nov 2011 12:26:40 -0800

I see.  It sounds like you're still worried about how the RubyPort can support 
multiple M5 cpu ports and still adhere to a stronger consistency model.  Sorry 
for not directly responding to that question earlier, but to me that seems like 
an orthogonal issue that you've already solved.  If I recall correctly, the 
patch you sent out for review essentially attaches the multiple M5 cpu ports, 
representing simultaneous cpu requests, to the single RubyPort that represents 
the CPU's connection to the L1 caches.  That seems reasonable to me and I don't 
see any problem with it.  The key is that the cpu LSQ cannot blindly issue 
simultaneous requests to the memory system without expecting and acting upon 
probes that occur between issue and retirement.  Furthermore, the CPU needs to 
communicate to Ruby when the instructions associated with the memory operations 
retire (for loads) or reach the head of the store buffer (for stores).  Once 
Ruby receives that notification, it can stop monitoring that location and move 
the cache block to a base state.
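
A minimal sketch of the LSQ behavior described above, assuming a toy model 
rather than the actual O3 or Ruby code.  The names ToyLSQ, SpeculativeLoad, 
on_probe, and monitored_by_ruby are hypothetical and exist only for 
illustration.  It shows a load being marked for replay when a probe arrives 
between issue and retirement, and monitoring being dropped once the load 
retires.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeLoad:
    seq_num: int              # program-order sequence number (hypothetical)
    block_addr: int           # cache-block address the load read from
    needs_replay: bool = False


@dataclass
class ToyLSQ:
    """Toy load queue that reacts to probes between issue and retirement."""
    in_flight: dict = field(default_factory=dict)          # seq_num -> load
    monitored_by_ruby: set = field(default_factory=set)    # blocks still watched

    def issue_load(self, seq_num, block_addr):
        # Issuing (or reissuing) a load starts monitoring of its block.
        self.in_flight[seq_num] = SpeculativeLoad(seq_num, block_addr)
        self.monitored_by_ruby.add(block_addr)

    def on_probe(self, block_addr):
        # The memory system forwards an invalidation: every unretired load
        # to that block may have read a stale value and must be replayed.
        hit = [ld for ld in self.in_flight.values()
               if ld.block_addr == block_addr]
        for ld in hit:
            ld.needs_replay = True
        return [ld.seq_num for ld in hit]

    def retire_load(self, seq_num):
        ld = self.in_flight[seq_num]
        if ld.needs_replay:
            return False      # caller must squash and reissue the load
        del self.in_flight[seq_num]
        # Retirement notification: stop monitoring the block once no other
        # in-flight load still depends on it.
        if not any(o.block_addr == ld.block_addr
                   for o in self.in_flight.values()):
            self.monitored_by_ruby.discard(ld.block_addr)
        return True
```

In this sketch a load that was probed cannot retire; the caller reissues it 
with issue_load and retires it afterwards.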


Now to answer your specific question: We are definitely interested in a TSO 
model, and in my opinion that is the only consistency model that we have to 
implement.  Remember that TSO is a valid implementation of Alpha's or ARM's 
weaker models.  We can certainly implement additional models later, but that 
should not be a short-term goal.
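
One way to see why a stronger model is a valid implementation of a weaker one 
is to enumerate the outcomes of a small litmus test.  The sketch below is an 
illustration under stated assumptions: the "weak" model here is a hypothetical 
machine whose store buffer may drain in any order, not Alpha's or ARM's actual 
specification, and loads execute in program order in both models.  Thread 0 
stores x=1 then y=1; thread 1 loads y then x.

```python
from itertools import combinations, permutations

STORES = [("x", 1), ("y", 1)]   # thread 0, in program order
LOADS = ["y", "x"]              # thread 1, in program order


def outcomes(fifo_store_buffer):
    """Return every (r1, r2) that thread 1 can observe.

    fifo_store_buffer=True  models TSO-like draining (program order only).
    fifo_store_buffer=False models a hypothetical weaker machine in which
    buffered stores may become visible in any order.
    """
    if fifo_store_buffer:
        drain_orders = [tuple(STORES)]
    else:
        drain_orders = list(permutations(STORES))

    results = set()
    for drains in drain_orders:
        # Pick which 2 of the 4 global time slots hold the store drains;
        # the remaining 2 slots hold the loads, in program order.
        for drain_slots in combinations(range(4), 2):
            memory = {"x": 0, "y": 0}
            regs = []
            drain_iter = iter(drains)
            load_iter = iter(LOADS)
            for slot in range(4):
                if slot in drain_slots:
                    addr, val = next(drain_iter)
                    memory[addr] = val
                else:
                    regs.append(memory[next(load_iter)])
            results.add(tuple(regs))
    return results


tso = outcomes(fifo_store_buffer=True)
weak = outcomes(fifo_store_buffer=False)
```

Every outcome the FIFO model produces is also produced by the weaker model, 
while the weaker model additionally allows thread 1 to see the new y with the 
old x.  Software written for the weaker model must already tolerate that extra 
outcome, so running it on the stronger machine cannot break it.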

I know this can be a complicated subject so please send me questions if you 
disagree or are confused.  I certainly may be overlooking something and my 
thoughts are constantly evolving as well as I page more of this into my memory. 
 For instance, I realize that my previous mail was incorrect because I confused 
the LSQ, which contains pre-retirement memory instructions, with the store 
buffer, which contains post-retirement store instruction values.  If a probe 
hits in the store buffer, the CPU doesn't (it can't) reissue the store 
instruction.  The store buffer shields the CPU from that probe.  As long as the 
cache has write permission when the store reaches the head of the store buffer, 
stores have a global order and TSO is maintained.  Of course, probing loads in 
the LSQ also needs to occur, along with several other features for supporting 
locks, fences, etc.
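
A minimal sketch of the store buffer rule described above, assuming a toy 
model and not the actual O3 or Ruby interfaces.  ToyStoreBuffer, write_perm, 
on_probe, and drain_head are hypothetical names.  A probe only removes write 
permission; the retired store stays buffered, and only the head of the FIFO 
may drain, and only while the L1 holds write permission for its block.

```python
from collections import deque


class ToyStoreBuffer:
    """Toy post-retirement store buffer that drains strictly in FIFO order."""

    def __init__(self):
        self.fifo = deque()       # (block_addr, value), oldest store first
        self.write_perm = set()   # blocks the L1 holds with write permission
        self.global_order = []    # order in which stores became visible

    def retire_store(self, block_addr, value):
        # A retired store can no longer be squashed or reissued by the CPU.
        self.fifo.append((block_addr, value))

    def grant_write_permission(self, block_addr):
        self.write_perm.add(block_addr)

    def on_probe(self, block_addr):
        # The probe takes the permission away, but buffered stores stay put:
        # the store buffer shields the CPU from the probe.
        self.write_perm.discard(block_addr)

    def drain_head(self):
        # Only the oldest store may write the cache, and only with
        # write permission for its block at that moment.
        if not self.fifo:
            return False
        block_addr, value = self.fifo[0]
        if block_addr not in self.write_perm:
            return False          # permission must be (re)acquired first
        self.fifo.popleft()
        self.global_order.append((block_addr, value))
        return True
```

Because younger stores never bypass the head, the order recorded in 
global_order always matches retirement order, even when permission for a 
younger store's block arrives first.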

If you do have further questions, please be as specific as possible.  It is 
hard to talk about this subject using generalities.

Brad


> -----Original Message-----
> From: Nilay Vaish [mailto:[email protected]]
> Sent: Wednesday, November 09, 2011 10:15 AM
> To: Beckmann, Brad
> Cc: Default; Mark D. Hill
> Subject: RE: Review Request: Forward invalidations from Ruby to O3 CPU
> 
> Brad,
> 
> As long as we use multiple ports only to fetch coherence permissions and
> only one store is performed at a time, it is intuitively clear to me that
> SC and TSO can be implemented. But if we implement this, it might mean forgoing
> the Alpha-like memory model that we have in place right now. This goes back
> to my earlier question on what memory model(s) are we interested in? Do
> we prefer co-existence of multiple memory models?
> 
> --
> Nilay
> 
> On Wed, 9 Nov 2011, Beckmann, Brad wrote:
> 
> > Hi Nilay,
> >
> > With regards to your question about how to allow multiple simultaneous
> > stores, do you not believe my second and third proposals achieve that?
> >
> > As I stated before, I don't think we need to make any fundamental
> > changes to Ruby.  We just need to provide the correct information and
> > interfaces to the LSQ/Store Buffer.
> >
> > Brad
> >
> >
> >> -----Original Message-----
> >> From: Nilay Vaish [mailto:[email protected]]
> >> Sent: Tuesday, November 08, 2011 6:12 PM
> >> To: Beckmann, Brad
> >> Cc: Default; Mark D. Hill
> >> Subject: RE: Review Request: Forward invalidations from Ruby to O3
> >> CPU
> >>
> >> On Wed, 2 Nov 2011, Nilay Vaish wrote:
> >>
> >>> On Fri, 28 Oct 2011, Beckmann, Brad wrote:
> >>>
> >>>> Let’s move this conversation to just the email thread.
> >>>>
> >>>> I suspect we may be talking past each other, so let’s talk about
> >>>> the complete implementations, not just Ruby.  There are multiple
> >>>> ways one can implement the store portion of x86-TSO.  I’m not sure
> >>>> what the O3 model does, but here are a few possibilities:
> >>>>
> >>>> - Do not issue any part of the store to the memory system when the
> >>>> instruction is executed.  Instead, simply buffer it in the LSQ
> >>>> until the instruction retires, then buffer it in the store buffer
> >>>> after retirement.  Only when the store reaches the head of the
> >>>> store buffer, issue it to Ruby.  The next store is not issued to
> >>>> Ruby until the previous store head completes, maintaining correct
> >>>> store ordering.
> >>>>
> >>>> - Do not issue any part of the store to the memory system when the
> >>>> instruction is executed.  Instead, simply buffer it in the LSQ
> >>>> until the instruction retires.  Once it retires and enters the
> >>>> store buffer, we issue the address request to Ruby (no L1 data
> >>>> update).  Ruby forwards probes/replacements to the store buffer,
> >>>> and if the store buffer sees a probe/replacement to an address
> >>>> whose address request has already completed, the store buffer
> >>>> reissues the request.  Once the store reaches the head of the
> >>>> store buffer, double check with Ruby that write permissions still
> >>>> exist in the L1.
> >>>>
> >>>> - Issue the store address (no L1 data update) to Ruby when the
> >>>> instruction is executed.  When it retires, it enters the store
> >>>> buffer.  Ruby forwards probes/replacements to the LSQ+store buffer,
> >>>> and if either sees a probe/replacement to an address whose address
> >>>> request has already completed, the request reissues (several
> >>>> policies exist on when to reissue the request).  Once the store
> >>>> reaches the head of the store buffer, double check with Ruby that
> >>>> write permissions still exist in the L1.
> >>>>
> >>>> Do those scenarios make sense to you?  I believe we can implement
> >>>> any one of them without modifying Ruby’s core functionality.  If
> >>>> you are envisioning or if O3 implements something completely
> >>>> different, please let me know.
> >>>>
> >>>
> >>> 1. What's the current memory model that the O3 CPU implements? Do
> >>> we want multiple memory models to co-exist? We might want to have
> >>> both SC and TSO, though Alpha had a weaker model.
> >>>
> >>> 2. I think we should try to stick to what the O3 CPU implements
> >>> currently, meaning we should not change the stage when the store is
> >>> issued to the cache. I am more concerned about how multiple ports
> >>> get handled.
> >>>
> >>
> >> Looking at the trace generated by the toy application I use for
> >> testing the O3 CPU and Ruby combination, I have been able to confirm
> >> my suspicion that stores can become visible to the rest of the system
> >> in an order different from the program order.
> >>
> >> It might be that the classic memory system does not allow stores to
> >> go out of order. Or that the initial implementation of the O3 CPU was
> >> for a weaker memory model like that of the Alpha architecture (Prof.
> >> Hill suggested that this might be the case).
> >>
> >> Overall I am still not clear on how to make O3 and Ruby work together
> >> correctly for SC or TSO in the case where multiple stores can be
> >> issued to the memory system in parallel.
> >>
> >> --
> >> Nilay
> >
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
