What we really need for x86 is more general locking. We've added a LOCKED flag to the mem request, and the current x86 microcode uses this for locked operations. The semantics are that a load with the LOCKED flag (which should "lock" the cache block) is always followed by a store with the LOCKED flag that clears the lock. Locking out all other accesses between those two operations is certainly a reasonable way to implement that; the O3 model is the only one that even has the potential to insert other operations between them, but it probably won't, since I think (most? all?) locked operations are also serializing... and if that's not sufficient, I'm sure we can find a way to prevent it anyway.
The swap/cmp&swap support was added for SPARC (I don't believe we have a fetch&add op), but the long-term goal is to get rid of that and make SPARC use the same mechanism as x86, which sounds in line with what you're proposing. I think this might require making CAS a microcoded operation in SPARC, but I don't think that's a big deal... Gabe or Ali, let me know if I'm wrong.

Note that to support cmp&swap and the x86 CMPXCHG we'd need the microcode to unconditionally do a write to release the lock, even if it's just writing back the old value. I believe this is not unrealistic, but if we want to do otherwise we'll have to extend the protocol somehow.

Steve

On Tue, Jul 7, 2009 at 1:00 PM, Derek Hower <d...@cs.wisc.edu> wrote:
> We (Wisconsin) are working on implementing atomics in Ruby. For LL/SC, we
> are going to stick with the implementation from the original GEMS that
> closely mimics how LL/SC is/was handled in the M5 cache.
>
> For single-instruction atomics (fetch&add, comp&swap, etc.), we are
> thinking about an implementation that just gives the processor priority at
> the cache controller on the second half of an RMW, i.e., immediately after
> the read of an RMW completes, we will make sure that the next message the
> cache controller handles comes from the processor, which should be the RMW
> write. For this to work, we have to guarantee that the next request from
> the CPU model after an RMW read returns is the corresponding RMW write. For
> Bochs this is always true. Is this also true in M5 (prefetches excluded, as
> we can probably just deprioritize those)?
>
> The benefit of this approach is that it does not require any changes to the
> existing protocols. The drawback is that it may not closely mimic what is
> done in real hardware. Does anyone have a better approach that would more
> closely approximate a real implementation?
>
> -Derek
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev