It sounds like you guys are doing a good job of working this out, but I have a few comments. Sorry not to jump in sooner.
- It's absolutely true that O3 was written for Alpha and as such did not
  have to worry about the CPU model enforcing any memory orderings beyond
  the explicit barrier/fence ops. It's no surprise that O3 needs to be
  modified to support stronger consistency models.
- While TSO is a valid implementation of the Alpha memory model, we should
  not unnecessarily restrict performance by constraining memory order. Note
  that even though Alpha is not used much, we have other ISAs (most notably
  ARM but also Power) that have weak memory models. In the near term it's
  fine to just work on implementing TSO without considering Alpha, but for
  the final commit it would be good to find a minimal set of changes that
  enforce TSO and condition them on the ISA being x86 (or if we want to get
  fancy, we could introduce a "memory consistency model" flag and set it to
  TSO when the ISA is x86).
- Since we need to implement some consistency mechanism in O3 more or less
  from scratch, I suggest we do a reasonably aggressive mechanism that
  corresponds most closely with what modern processors do, without being
  overly complicated. O3 is not intended to be an extremely accurate model
  of any particular modern CPU, but we don't want to create unnecessary
  differences between its behavior and that of a typical modern CPU either.
- If we need to make some changes in the Port interface to make this work
  well, that's OK. Someday I would still like to see Port and RubyPort
  integrated so we don't have to do a translation between the two structs
  on every memory access. That probably doesn't affect this directly, but
  it's good to keep in mind as we evolve the code.

Steve

On Wed, Nov 9, 2011 at 3:25 PM, Nilay Vaish wrote:

> Brad, your reply clears some air.
>
> The current patch allows us to use the existing O3 CPU with Ruby. Since
> the O3 CPU already provides Alpha's memory model, we get that for free.
> Now that we would like to have TSO as well, we need to work out how the
> two models would co-exist. I'll think more about this, but we need a
> broader consensus on this.
>
> --
> Nilay
>
> On Wed, 9 Nov 2011, Beckmann, Brad wrote:
>
>> I see. It sounds like you're still worried about how the RubyPort can
>> support multiple M5 cpu ports and still adhere to a stronger consistency
>> model. Sorry for not directly responding to that question earlier, but
>> to me that seems like an orthogonal issue that you've already solved. If
>> I recall correctly, the patch you sent out for review essentially
>> attaches the multiple M5 cpu ports, representing simultaneous cpu
>> requests, to the single RubyPort that represents the CPUs connection to
>> the L1 caches. That seems reasonable to me and I don't see any problem
>> with it. The key is that the cpu LSQ cannot blindly issue simultaneous
>> requests to the memory system without expecting and acting upon probes
>> that occur between issue and retirement. Furthermore, the CPU needs to
>> communicate to Ruby when the instructions associated with the memory
>> operations retire (for loads) or reach the head of the store buffer (for
>> stores). Once Ruby receives that notification, it can stop monitoring
>> that location and move the cache block to a base state.
>>
>> Now to answer your specific question: We are definitely interested in a
>> TSO model and in my opinion that is the only consistency model that we
>> have to implement. Remember TSO is a valid implementation of Alpha's or
>> ARM's weaker models. We can certainly implement subsequent models, but
>> that should not be a short term goal.
>>
>> I know this can be a complicated subject so please send me questions if
>> you disagree or are confused. I certainly may be overlooking something
>> and my thoughts are constantly evolving as well as I page more of this
>> into my memory. For instance, I realize that my previous mail was
>> incorrect because I confused the LSQ, which contains pre-retirement
>> memory instructions, with the store buffer, which contains
>> post-retirement store instruction values. If a probe hits in the store
>> buffer, the CPU doesn't (it can't) reissue the store instruction. The
>> store buffer shields the CPU from that probe. As long as the cache has
>> write permission when the store reaches the head of the store buffer,
>> stores have a global order and TSO is maintained. Of course probing
>> loads in the LSQ also needs to occur, along with several other features
>> for supporting locks, fences, etc.
>>
>> If you do have further questions, please be specific as possible. It is
>> hard to talk about this subject using generalities.
>>
>> Brad

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
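
To make the LSQ versus store buffer distinction in Brad's quoted message a bit more concrete, here is a minimal toy sketch in Python. It is not gem5 code: the class `TsoCore` and all of its methods are hypothetical names invented for illustration, and it assumes that "acting upon" a probe means marking the in-flight load for replay, which is one common approach but is not something the thread specifies. Stores enter a FIFO at retirement and become globally visible only from the head, and only when the cache holds write permission for the head's address.

```python
from collections import deque


class TsoCore:
    """Toy model of the probe/retire rules; hypothetical, not gem5's API."""

    def __init__(self):
        self.store_buffer = deque()  # post-retirement stores, FIFO order
        self.inflight_loads = {}     # load id -> address, issued but not retired
        self.squashed = set()        # load ids hit by a probe (assumed: replay)
        self.memory_order = []       # stores in the order they became global

    def issue_load(self, load_id, addr):
        self.inflight_loads[load_id] = addr

    def retire_load(self, load_id):
        """Return True if the load may retire, False if it must be replayed."""
        self.inflight_loads.pop(load_id)
        if load_id in self.squashed:
            self.squashed.discard(load_id)
            return False
        return True

    def retire_store(self, addr, value):
        # The store leaves the LSQ and waits in the store buffer.
        self.store_buffer.append((addr, value))

    def probe(self, addr):
        # Only loads between issue and retirement react to a probe.
        for load_id, load_addr in self.inflight_loads.items():
            if load_addr == addr:
                self.squashed.add(load_id)
        # Stores already in the store buffer are never reissued.

    def drain(self, writable):
        """Drain from the head while the cache has write permission for it."""
        drained = 0
        while self.store_buffer and self.store_buffer[0][0] in writable:
            self.memory_order.append(self.store_buffer.popleft())
            drained += 1
        return drained
```

In this sketch a younger store cannot pass an older one: if the head store's block is not writable, `drain` stops even when later entries could be written, which is what gives stores a single global order. A probe that matches a buffered store changes nothing, while a probe that matches an issued load makes `retire_load` return False for that load.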
