In sum, I think we all agree that Ruby is going to handle *only
non-speculative stores*. The M5 CPU model(s) handle all stores, speculative
and non-speculative, that are *yet to be revealed to the memory
sub-system*.
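As a rough illustration of that split, here is a minimal sketch in Python. It is a toy model, not gem5 or Ruby code: ToyStoreBuffer, ToyMemorySystem, and all of their methods are hypothetical names chosen for this example. It assumes stores commit in program order, so committed entries always sit at the head of the buffer.

```python
# Toy model only: ToyStoreBuffer and ToyMemorySystem are hypothetical names,
# not gem5/Ruby classes.
from collections import deque


class ToyMemorySystem:
    """Stands in for Ruby: it only ever sees committed stores."""

    def __init__(self):
        self.data = {}

    def write(self, addr, value):
        self.data[addr] = value


class ToyStoreBuffer:
    """CPU-side buffer holding stores in program order."""

    def __init__(self, memory):
        self.memory = memory
        self.entries = deque()  # each entry: [addr, value, committed]

    def insert(self, addr, value):
        # A newly executed store starts out speculative.
        self.entries.append([addr, value, False])

    def commit_oldest_speculative(self):
        # Mark the oldest speculative store as non-speculative.
        for entry in self.entries:
            if not entry[2]:
                entry[2] = True
                return

    def squash_speculative(self):
        # Drop every store that has not committed (e.g. on a mispredict).
        self.entries = deque(e for e in self.entries if e[2])

    def drain(self):
        """Send committed stores at the head to memory; return the count."""
        sent = 0
        while self.entries and self.entries[0][2]:
            addr, value, _ = self.entries.popleft()
            self.memory.write(addr, value)
            sent += 1
        return sent
```

In this sketch the memory system never observes a store until drain() releases it, which is the sense in which speculative stores are "yet to be revealed".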
To make it clearer, as I understand it, we now have the following:
1. All store buffering (speculative and non-speculative) is handled by
the CPU model in M5.
2. Ruby needs to forward interventions/invalidations received at the L1 cache
controller to the CPU model so that it can take appropriate action to
provide the required memory consistency guarantees (e.g., it may need to
flush the pipeline).
OR
CPU models need to check coherence permissions at the L1 cache at
commit time to know whether an intervening write has happened (this might be
required to implement a stricter model like SC).
I think we need to provide one of these two functionalities from the Ruby side to
satisfy the second condition above. Which one to provide depends on what
the M5 CPU models want to do to guarantee consistency.
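The two alternatives in item 2 can be sketched as follows. This is a toy Python model under stated assumptions, not the Ruby or O3 interface: ToyL1Cache, ToyCPU, on_invalidate, has_read_permission, and the 64-byte block size are all hypothetical. The flag forward_invalidations selects between Ruby pushing invalidations to the CPU (first alternative) and the CPU polling permissions at commit (second alternative).

```python
# Toy model only: ToyL1Cache, ToyCPU and every method name here are
# hypothetical, not gem5/Ruby interfaces.
BLOCK_BYTES = 64  # assumed block size for illustration


def block_of(addr):
    """Return the block-aligned address containing addr."""
    return addr - (addr % BLOCK_BYTES)


class ToyL1Cache:
    def __init__(self):
        self.readable = set()   # blocks currently held with read permission
        self.listeners = []     # callbacks used by the forwarding alternative

    def fill(self, addr):
        self.readable.add(block_of(addr))

    def invalidate(self, addr):
        """An invalidation/intervention arrives from the protocol."""
        blk = block_of(addr)
        self.readable.discard(blk)
        for notify in self.listeners:  # forward to any registered CPU
            notify(blk)

    def has_read_permission(self, addr):
        """Lets the CPU query coherence permission at commit time."""
        return block_of(addr) in self.readable


class ToyCPU:
    def __init__(self, cache, forward_invalidations):
        self.cache = cache
        self.forward = forward_invalidations
        self.speculative_loads = []  # addresses loaded but not yet committed
        self.squashes = 0
        if forward_invalidations:
            cache.listeners.append(self.on_invalidate)

    def load(self, addr):
        self.cache.fill(addr)
        self.speculative_loads.append(addr)

    def on_invalidate(self, blk):
        # Forwarding alternative: squash as soon as a matching block is lost.
        if any(block_of(a) == blk for a in self.speculative_loads):
            self.speculative_loads.clear()
            self.squashes += 1

    def commit_oldest(self):
        """Commit the oldest load; return True on success, False otherwise."""
        if not self.speculative_loads:
            return False
        addr = self.speculative_loads[0]
        # Polling alternative: check coherence permission at commit.
        if not self.forward and not self.cache.has_read_permission(addr):
            self.speculative_loads.clear()
            self.squashes += 1
            return False
        self.speculative_loads.pop(0)
        return True
```

In this sketch both alternatives end in the same squash; they differ only in when the CPU learns about the intervening write (at invalidation time versus at commit time).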
Please let me know if you disagree or if I am missing something.
Thanks
Arka
On 02/24/2011 05:22 PM, Beckmann, Brad wrote:
So I think Steve and I are in agreement here. We both agree that both
speculative and non-speculative store buffers should be on the CPU side of the
RubyPort interface. I believe that was the same line that existed when Ruby
was tied to Opal in GEMS. I believe the non-speculative store buffer was only a
feature used when Opal was not attached, and it was just the simple
SimicsProcessor driving Ruby.
The sequencer is a separate issue. Certain functionality of the sequencer can
probably be eliminated in gem5, but I think other functionality needs to remain
or at least be moved to some other part of Ruby. The sequencer performs a lot
of protocol-independent functionality, including: updating the actual data
block, performing synchronization with respect to the cache memory, translating
m5 packets to ruby requests, checking for per-cacheblock deadlock, and
coalescing requests to the same cache block. The coalescing functionality can
probably be eliminated, but I think the other functionality needs to remain.
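Two of the functions listed above, coalescing requests to the same cache block and checking for per-block deadlock, can be sketched roughly as below. This is a toy Python model, not the Ruby Sequencer: ToySequencer, its methods, the 64-byte block size, and the 1000-cycle threshold are all assumptions made for illustration.

```python
# Toy model only: ToySequencer and its methods are hypothetical names,
# not the Ruby Sequencer API.
BLOCK_BYTES = 64            # assumed block size
DEADLOCK_THRESHOLD = 1000   # cycles; an arbitrary value for illustration


class ToySequencer:
    def __init__(self):
        # block address -> (issue cycle, list of waiting request ids)
        self.outstanding = {}
        self.issued_to_protocol = 0

    @staticmethod
    def _block(addr):
        return addr - (addr % BLOCK_BYTES)

    def make_request(self, req_id, addr, now):
        """Return True if a new protocol request was issued,
        False if the request was coalesced with an outstanding one."""
        blk = self._block(addr)
        if blk in self.outstanding:
            self.outstanding[blk][1].append(req_id)
            return False
        self.outstanding[blk] = (now, [req_id])
        self.issued_to_protocol += 1
        return True

    def complete(self, addr):
        """The protocol finished the block; return all waiting request ids."""
        _, waiters = self.outstanding.pop(self._block(addr))
        return waiters

    def deadlocked_blocks(self, now):
        """Blocks that have been outstanding for at least the threshold."""
        return sorted(
            blk
            for blk, (issued, _) in self.outstanding.items()
            if now - issued >= DEADLOCK_THRESHOLD
        )
```

If coalescing were dropped, as suggested above, make_request would issue every request to the protocol and only the deadlock bookkeeping would remain.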
Brad
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of
Steve Reinhardt
Sent: Thursday, February 24, 2011 1:52 PM
To: M5 Developer List
Subject: Re: [m5-dev] Store Buffer
On Thu, Feb 24, 2011 at 1:32 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
On Thu, 24 Feb 2011, Beckmann, Brad wrote:
Steve, I think we are in agreement here and we may just be disagreeing with the
definition of speculative. From the Ruby perspective, I don't think it really
matters...I don't think there is a difference between a speculative store address
request and a prefetch-with-write-intent. Also we agree that probes will need
to be sent to the O3 LSQ to support the consistency model.
My point is that if we believe this functionality is required, what is the
extra overhead of adding a non-speculative store buffer to the O3 model as
well? I think that will be easier than trying to incorporate the current Ruby
non-speculative store buffer into each protocol.
I don't know the O3 LSQ model very well, but I assume it buffers both
speculative and non-speculative stores. Are there two different structures in
Ruby for that?
I think the general issue here is that the dividing line between "processor" and "memory
system" is different in M5 than it was in GEMS, with M5 assuming that write buffers, redundant request
filtering, etc. all happen in the "processor". For example, I know I've had you explain this to
me multiple times already, but I still don't understand why we still need Ruby sequencers either :-).
Brad, I raise the same point that Arka raised earlier. Other processor models
can also make use of a store buffer. So why should only O3 have a store buffer?
Nilay, I think that's a different issue... we're not saying that other CPU
models can't have store buffers, but in practice, the simple CPU models block
on memory accesses so they don't need one. If the inorder model wants to add a
store buffer (if it doesn't already have one), it would be an internal decision
for them whether they want to write one from scratch or try to reuse the O3
code. There are already some shared structures in src/cpu like branch
predictors that can be reused across CPU models.
So in other words we need to decide first where the store buffer should live
(CPU or memory system) and then we can worry about how to reuse that code if
that's useful.
Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev