Brad, your reply clears the air somewhat.
The current patch allows us to use the existing O3 CPU with Ruby. Since
the O3 CPU already provides Alpha's memory model, we get that for free.
Now that we would like to have TSO as well, we need to work out how the
two models would co-exist. I'll think more about this, but we also need
a broader consensus.
--
Nilay
On Wed, 9 Nov 2011, Beckmann, Brad wrote:
I see. It sounds like you're still worried about how the RubyPort can
support multiple M5 cpu ports and still adhere to a stronger consistency
model. Sorry for not directly responding to that question earlier, but
to me that seems like an orthogonal issue that you've already solved.
If I recall correctly, the patch you sent out for review essentially
attaches the multiple M5 cpu ports, representing simultaneous cpu
requests, to the single RubyPort that represents the CPU's connection to
the L1 caches. That seems reasonable to me and I don't see any problem
with it. The key is that the cpu LSQ cannot blindly issue simultaneous
requests to the memory system without expecting and acting upon probes
that occur between issue and retirement. Furthermore, the CPU needs to
communicate to Ruby when the instructions associated with the memory
operations retire (for loads) or reach the head of the store buffer (for
stores). Once Ruby receives that notification, it can stop monitoring
that location and move the cache block to a base state.
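
A minimal sketch of the issue/probe/retire handshake described above, as a toy Python model rather than actual gem5 or Ruby code. The class and method names (SpeculativeLoadMonitor, issue, probe, try_retire) are hypothetical and chosen only for illustration. It covers loads only; stores would send their notification when they reach the head of the store buffer.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeLoadMonitor:
    """Toy model of monitoring loads between issue and retirement.

    All names here are hypothetical; this is not a gem5/Ruby interface.
    """
    in_flight: dict = field(default_factory=dict)  # load id -> block address
    squashed: set = field(default_factory=set)     # load ids hit by a probe

    def issue(self, load_id, addr):
        # The LSQ issues the load; the block is monitored from now on.
        self.in_flight[load_id] = addr

    def probe(self, addr):
        # A probe arrives for addr. Every issued-but-unretired load to that
        # block is marked so the CPU replays it instead of retiring it.
        hits = [lid for lid, a in self.in_flight.items() if a == addr]
        self.squashed.update(hits)
        return hits

    def try_retire(self, load_id):
        # The CPU reports that the load reached retirement. Monitoring of
        # this load stops either way. Returns False if a probe hit the load
        # in the meantime, in which case the CPU must reissue it.
        del self.in_flight[load_id]
        if load_id in self.squashed:
            self.squashed.discard(load_id)
            return False
        return True

    def monitored_blocks(self):
        # Blocks that still have at least one unretired load.
        return set(self.in_flight.values())


mon = SpeculativeLoadMonitor()
mon.issue("ld1", 0x40)
mon.issue("ld2", 0x80)
probe_hits = mon.probe(0x40)            # another core invalidates block 0x40
ld1_retired = mon.try_retire("ld1")     # must be replayed
ld2_retired = mon.try_retire("ld2")     # retires normally
```

Once every load to a block has retired or been squashed, the block no longer appears in monitored_blocks(), which corresponds to the point where the cache block can return to a base state.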
Now to answer your specific question: We are definitely interested in a
TSO model and in my opinion that is the only consistency model that we
have to implement. Remember TSO is a valid implementation of Alpha's or
ARM's weaker models. We can certainly implement subsequent models, but
that should not be a short term goal.
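
To illustrate why TSO is a valid implementation of a weaker model, here is a small Python sketch that enumerates the outcomes of the classic message-passing litmus test under two reordering rules. The "weak" rule used here (any two accesses to different addresses may be reordered) is only a crude stand-in for Alpha or ARM, not their actual definitions, and modelling TSO as "a later load may pass an earlier store to a different address" ignores store forwarding. All names are hypothetical.

```python
from itertools import permutations

# Message-passing litmus test; memory starts with x = y = 0.
# Each op is ("st", addr, value) or ("ld", addr, register).
P0 = [("st", "x", 1), ("st", "y", 1)]
P1 = [("ld", "y", "r1"), ("ld", "x", "r2")]


def tso_may_pass(earlier, later):
    # Simplified TSO: only a later load may pass an earlier store,
    # and only when they touch different addresses.
    return earlier[0] == "st" and later[0] == "ld" and earlier[1] != later[1]


def weak_may_pass(earlier, later):
    # Crude weak model: any two accesses to different addresses may reorder.
    return earlier[1] != later[1]


def orders(prog, may_pass):
    # All execution orders of one thread permitted by the reordering rule.
    result = []
    for perm in permutations(range(len(prog))):
        ok = True
        for i, a in enumerate(perm):
            for b in perm[i + 1:]:
                # a executes before b; if a is later in program order,
                # it has passed b and the rule must allow that.
                if a > b and not may_pass(prog[b], prog[a]):
                    ok = False
        if ok:
            result.append([prog[k] for k in perm])
    return result


def interleavings(a, b):
    # All merges of two sequences that keep each sequence's own order.
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(may_pass):
    # Set of (r1, r2) results over every permitted order and interleaving.
    seen = set()
    for o0 in orders(P0, may_pass):
        for o1 in orders(P1, may_pass):
            for trace in interleavings(o0, o1):
                mem = {"x": 0, "y": 0}
                regs = {}
                for kind, addr, arg in trace:
                    if kind == "st":
                        mem[addr] = arg
                    else:
                        regs[arg] = mem[addr]
                seen.add((regs["r1"], regs["r2"]))
    return seen


tso = outcomes(tso_may_pass)
weak = outcomes(weak_may_pass)
```

In this toy model every outcome TSO can produce is also permitted by the weaker rule, while the weaker rule additionally permits (r1, r2) = (1, 0). Software written correctly for the weaker model therefore still sees only legal behaviour on a TSO implementation.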
I know this can be a complicated subject so please send me questions if
you disagree or are confused. I certainly may be overlooking something
and my thoughts keep evolving as I page more of this back
into my memory. For instance, I realize that my previous mail was
incorrect because I confused the LSQ, which contains pre-retirement
memory instructions, with the store buffer, which contains
post-retirement store instruction values. If a probe hits in the store
buffer, the CPU doesn't (it can't) reissue the store instruction. The
store buffer shields the CPU from that probe. As long as the cache has
write permission when the store reaches the head of the store buffer,
stores have a global order and TSO is maintained. Of course probing
loads in the LSQ also needs to occur, along with several other features
for supporting locks, fences, etc.
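
The store buffer behaviour described above can be sketched as a toy FIFO model in Python. This is an illustration under stated assumptions, not gem5 or Ruby code: the StoreBuffer class, its methods, and the per-block "writable" set are all hypothetical names, and probing of loads, locks, and fences are left out.

```python
from collections import deque


class StoreBuffer:
    """Toy FIFO store buffer for post-retirement stores (hypothetical model)."""

    def __init__(self):
        self.fifo = deque()      # (addr, value) pairs in retirement order
        self.writable = set()    # blocks the L1 currently holds with write permission
        self.global_order = []   # order in which stores became globally visible

    def retire_store(self, addr, value):
        # A retired store enters the buffer; the CPU is done with it.
        self.fifo.append((addr, value))

    def grant_write(self, addr):
        # The cache obtains write permission for the block.
        self.writable.add(addr)

    def probe(self, addr):
        # A probe takes write permission away. Buffered stores stay where
        # they are: a retired store is never reissued by the CPU.
        self.writable.discard(addr)

    def drain(self):
        # Only the head of the buffer may write, and only with write
        # permission. Younger stores wait behind it, which keeps stores
        # in a single global order.
        drained = 0
        while self.fifo and self.fifo[0][0] in self.writable:
            self.global_order.append(self.fifo.popleft())
            drained += 1
        return drained


sb = StoreBuffer()
sb.retire_store(0x40, 1)
sb.retire_store(0x80, 2)
sb.grant_write(0x80)
blocked = sb.drain()       # head (0x40) has no write permission yet
sb.grant_write(0x40)
sb.probe(0x40)             # probe hits a buffered store's block
after_probe = sb.drain()   # still blocked, but nothing was dropped or reissued
sb.grant_write(0x40)
drained = sb.drain()       # both stores drain, in retirement order
```

Note that the younger store to 0x80 cannot drain early even though its block is writable; draining strictly from the head is what preserves the store order in this sketch.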
If you do have further questions, please be as specific as possible. It is
hard to talk about this subject using generalities.
Brad