What we really need for x86 is more general locking.  We've added a LOCKED
flag to the memory request, and the current x86 microcode uses this for
locked operations.  The semantics are that a load with the LOCKED flag
(which should "lock" the cache block) is always followed by a store with
the LOCKED flag that clears the lock.  Locking out all other accesses
between those two operations is certainly a reasonable way to implement
that; the O3 model is the only one that even has the potential to insert
other operations between them, but it probably won't, since I think (most?
all?) locked operations are also serializing... and if that's not
sufficient, I'm sure we can find a way to prevent it anyway.
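
To make the pairing concrete, here's a minimal sketch of the intended
semantics.  All names (CacheBlock, lockedLoad, etc.) are illustrative
placeholders, not M5's actual classes or API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a cache block honoring the LOCKED flag pairing.
struct CacheBlock {
    uint64_t data = 0;
    bool locked = false;

    // A load with the LOCKED flag acquires the lock on the block.
    uint64_t lockedLoad() {
        assert(!locked && "block already locked");
        locked = true;
        return data;
    }

    // The paired LOCKED store writes the block and releases the lock.
    void lockedStore(uint64_t val) {
        assert(locked && "store without a matching locked load");
        data = val;
        locked = false;
    }

    // All other accesses are held off while the block is locked.
    bool ordinaryAccessAllowed() const { return !locked; }
};
```

The invariant the microcode must maintain is that every lockedLoad() is
eventually followed by exactly one lockedStore() on the same block.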

The swap/cmp&swap support was added for SPARC (I don't believe we have a
fetch&add op), but the long-term goal is to get rid of that and make SPARC
use the same mechanism as x86, which sounds in line with what you're
proposing.  I think this might require making CAS a microcoded operation in
SPARC, but I don't think that's a big deal... Gabe or Ali, let me know if
I'm wrong.

Note that to support cmp&swap and the x86 CMPXCHG, we'd need the microcode
to unconditionally do a write to release the lock, even if it's just
writing back the old value.  I don't think that's unrealistic, but if we
want to do otherwise we'll have to extend the protocol somehow.
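
A sketch of what that microcode-level behavior would look like, with
hypothetical names (LockedBlock, cmpxchg) standing in for the real
microcode ops: the final store happens on both the match and mismatch
paths, writing back the old value on failure, so the lock taken by the
load is always released.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical locked cache block, as in the LOCKED-flag scheme above.
struct LockedBlock {
    uint64_t data = 0;
    bool locked = false;
};

// CMPXCHG-style sequence: returns the value observed by the locked load.
uint64_t cmpxchg(LockedBlock &blk, uint64_t expected, uint64_t desired) {
    blk.locked = true;                 // locked load acquires the block
    uint64_t old = blk.data;
    // Unconditional store: the new value on a match, the old value
    // otherwise, so the lock is released on both paths.
    blk.data = (old == expected) ? desired : old;
    blk.locked = false;                // locked store releases the block
    return old;
}
```

On the mismatch path the write-back of the old value is a real store as
far as the protocol is concerned, which is exactly what lets the existing
load/store lock pairing work unchanged.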

Steve

On Tue, Jul 7, 2009 at 1:00 PM, Derek Hower <d...@cs.wisc.edu> wrote:

> We (Wisconsin) are working on implementing atomics in Ruby.  For LL/SC, we
> are going to stick with the implementation from the original GEMS that
> closely mimics how LL/SC is/was handled in the M5 cache.
>
> For single-instruction atomics (fetch&add, cmp&swap, etc.), we are
> thinking about an implementation that just gives the processor priority at
> the cache controller on the second half of a RMW, i.e., immediately after
> the read of an RMW completes, we will make sure that the next message the
> cache controller handles comes from the processor, which should be the RMW
> write.  For this to work, we have to guarantee that the next request from
> the cpu model after an RMW read returns is the corresponding RMW write.  For
> Bochs this is always true.  Is this also true in M5 (prefetches excluded, as
> we can probably just deprioritize those)?
>
> The benefit of this approach is that it does not require any changes to the
> existing protocols.  The drawback is that it may not closely mimic what is
> done in real hardware.  Does anyone have a better approach that would more
> closely approximate a real implementation?
>
> -Derek
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev