On Fri, 28 Oct 2011, Beckmann, Brad wrote:
Let's move this conversation to just the email thread.
I suspect we may be talking past each other, so let's talk about the
complete implementations, not just Ruby. There are multiple ways one can
implement the store portion of x86-TSO. I'm not sure what the O3
model does, but here are a few possibilities:
- Do not issue any part of the store to the memory system when the
instruction is executed. Instead, simply buffer it in the LSQ until the
instruction retires, then move it to the store buffer after retirement.
Only when the store reaches the head of the store buffer, issue it to
Ruby. The next store is not issued to Ruby until the store at the head
completes, maintaining correct store ordering.
- Do not issue any part of the store to the memory system when the
instruction is executed. Instead, simply buffer it in the LSQ until the
instruction retires. Once it retires and enters the store buffer, we
issue the address request to Ruby (no L1 data update). Ruby forwards
probes/replacements to the store buffer, and if the store buffer sees a
probe/replacement to an address whose address request has already
completed, the store buffer reissues the request. Once the store
reaches the head of the store buffer, double check with Ruby that write
permissions still exist in the L1.
- Issue the store address (no L1 data update) to Ruby when the
instruction is executed. When it retires, it enters the store buffer.
Ruby forwards probes/replacements to the LSQ+store buffer, and if either
sees a probe/replacement to an address whose address request has
already completed, the request reissues (several policies exist on when
to reissue the request). Once the store reaches the head of the store
buffer, double check with Ruby that write permissions still exist in the
L1.
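To make the first two possibilities concrete, here is a rough Python
sketch. It is only a toy model under stated assumptions, not gem5 or Ruby
code: ToyL1, StoreBuffer, probe and drain_one are hypothetical names,
permission requests complete instantly, and the reissue policy (reissue
when the store reaches the head) is just one of the several possible
policies.

```python
from collections import deque


class ToyL1:
    """Hypothetical stand-in for Ruby's L1 (not the real Ruby interface)."""

    def __init__(self):
        self.writable = set()  # addresses we hold write permission for
        self.requests = []     # every address request issued, in order
        self.log = []          # order in which store data reached the L1

    def request_write_permission(self, addr):
        # Address-only request: gains permission, no data update.
        # Assumption: the request completes immediately.
        self.requests.append(addr)
        self.writable.add(addr)

    def has_write_permission(self, addr):
        return addr in self.writable

    def probe(self, addr):
        # A probe or replacement takes the write permission away.
        self.writable.discard(addr)

    def write(self, addr, value):
        assert addr in self.writable, "write without permission"
        self.log.append((addr, value))


class StoreBuffer:
    """FIFO of retired stores; only the head may update L1 data."""

    def __init__(self, l1, prefetch_permission):
        self.l1 = l1
        # False models the first possibility (nothing issued until head).
        # True models the second (address request issued at retirement).
        self.prefetch_permission = prefetch_permission
        self.entries = deque()

    def retire(self, addr, value):
        if self.prefetch_permission:
            self.l1.request_write_permission(addr)
        self.entries.append((addr, value))

    def drain_one(self):
        """Complete the store at the head; return False if empty."""
        if not self.entries:
            return False
        addr, value = self.entries[0]
        # Double check at the head: (re)issue if permission is missing.
        if not self.l1.has_write_permission(addr):
            self.l1.request_write_permission(addr)
        self.l1.write(addr, value)
        self.entries.popleft()
        return True


def run(prefetch_permission):
    """Retire two stores, probe the second one, then drain in order."""
    l1 = ToyL1()
    sb = StoreBuffer(l1, prefetch_permission)
    sb.retire(0x40, 1)
    sb.retire(0x80, 2)
    l1.probe(0x80)  # permission for 0x80 is lost before it reaches head
    while sb.drain_one():
        pass
    return l1.log, l1.requests
```

In both modes the data reaches the toy L1 in program order. The only
difference is when the address requests go out, and that the second mode
has to reissue the request for the probed address.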
Do those scenarios make sense to you? I believe we can implement any
one of them without modifying Ruby's core functionality. If you are
envisioning or if O3 implements something completely different, please
let me know.
1. What's the current memory model that the O3 CPU implements? Do we want
multiple memory models to co-exist? We might want to have both SC and TSO,
though Alpha had a weaker model.
2. I think we should try to stick to what the O3 CPU implements currently,
meaning we should not change the stage when the store is issued to the
cache. I am more concerned about how multiple ports get handled.
--
Nilay