It sounds like you guys are doing a good job of working this out, but I
have a few comments.  Sorry not to jump in sooner.

- It's absolutely true that O3 was written for Alpha and as such did not
have to worry about the CPU model enforcing any memory orderings beyond the
explicit barrier/fence ops.  It's no surprise that O3 needs to be modified
to support stronger consistency models.

- While TSO is a valid implementation of the Alpha memory model, we should
not unnecessarily restrict performance by constraining memory order.  Note
that even though Alpha is not used much, we have other ISAs (most notably
ARM but also Power) that have weak memory models.  In the near term it's
fine to just work on implementing TSO without considering Alpha, but for
the final commit it would be good to find a minimal set of changes that
enforce TSO and condition them on the ISA being x86 (or if we want to get
fancy, we could introduce a "memory consistency model" flag and set it to
TSO when the ISA is x86).
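
As a rough illustration of the "memory consistency model" flag idea, here is a minimal Python sketch. The names `MemoryModel`, `DEFAULT_MODEL_BY_ISA`, `model_for_isa`, and `enforce_load_ordering` are hypothetical and are not existing gem5 parameters or APIs. The ISA table only encodes what is said above: x86 gets TSO, while Alpha, ARM, and Power stay weak.

```python
from enum import Enum


class MemoryModel(Enum):
    """Hypothetical consistency-model flag; not an existing gem5 option."""
    WEAK = "weak"  # only explicit barrier/fence ops impose ordering
    TSO = "tso"    # total store order


# Assumed defaults, taken from the discussion above: x86 needs TSO,
# while Alpha, ARM, and Power have weak memory models.
DEFAULT_MODEL_BY_ISA = {
    "x86": MemoryModel.TSO,
    "alpha": MemoryModel.WEAK,
    "arm": MemoryModel.WEAK,
    "power": MemoryModel.WEAK,
}


def model_for_isa(isa, override=None):
    """Return the model for an ISA, letting the caller override the default."""
    if override is not None:
        return override
    return DEFAULT_MODEL_BY_ISA[isa.lower()]


def enforce_load_ordering(model):
    """True when the CPU must apply the extra TSO ordering checks."""
    return model is MemoryModel.TSO
```

With this shape, the TSO-specific changes would be guarded by `enforce_load_ordering(model)` rather than by a direct ISA check, so Alpha could still be run under TSO by passing `override=MemoryModel.TSO`.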

- Since we need to implement some consistency mechanism in O3 more or less
from scratch, I suggest we do a reasonably aggressive mechanism that
corresponds most closely with what modern processors do, without being
overly complicated.  O3 is not intended to be an extremely accurate model
of any particular modern CPU, but we don't want to create unnecessary
differences between its behavior and that of a typical modern CPU either.

- If we need to make some changes in the Port interface to make this work
well, that's OK.  Someday I would still like to see Port and RubyPort
integrated so we don't have to do a translation between the two structs on
every memory access.  That probably doesn't affect this directly, but it's
good to keep in mind as we evolve the code.

Steve

On Wed, Nov 9, 2011 at 3:25 PM, Nilay Vaish <[email protected]> wrote:

> Brad, your reply clears the air somewhat.
>
> The current patch allows us to use the existing O3 CPU with Ruby. Since
> the O3 CPU already provides Alpha's memory model, we get that for free. Now
> that we would like to have TSO as well, we need to work out how the two
> models would co-exist. I'll think more about this, but we need a broader
> consensus on this.
>
>
> --
> Nilay
>
> On Wed, 9 Nov 2011, Beckmann, Brad wrote:
>
>> I see.  It sounds like you're still worried about how the RubyPort can
>> support multiple M5 CPU ports and still adhere to a stronger consistency
>> model.  Sorry for not directly responding to that question earlier, but to
>> me that seems like an orthogonal issue that you've already solved. If I
>> recall correctly, the patch you sent out for review essentially attaches
>> the multiple M5 CPU ports, representing simultaneous CPU requests, to the
>> single RubyPort that represents the CPU's connection to the L1 caches.  That
>> seems reasonable to me and I don't see any problem with it.  The key is
>> that the CPU LSQ cannot blindly issue simultaneous requests to the memory
>> system without expecting and acting upon probes that occur between issue
>> and retirement.  Furthermore, the CPU needs to communicate to Ruby when the
>> instructions associated with the memory operations retire (for loads) or
>> reach the head of the store buffer (for stores).  Once Ruby receives that
>> notification, it can stop monitoring that location and move the cache block
>> to a base state.
>>
>> Now to answer your specific question: We are definitely interested in a
>> TSO model and in my opinion that is the only consistency model that we have
>> to implement.  Remember, TSO is a valid implementation of Alpha's or ARM's
>> weaker models.  We can certainly implement additional models later, but
>> that should not be a short-term goal.
>>
>> I know this can be a complicated subject so please send me questions if
>> you disagree or are confused.  I certainly may be overlooking something and
>> my thoughts are constantly evolving as well, as I page more of this into my
>> memory.  For instance, I realize that my previous mail was incorrect
>> because I confused the LSQ, which contains pre-retirement memory
>> instructions, with the store buffer, which contains post-retirement store
>> instruction values.  If a probe hits in the store buffer, the CPU doesn't
>> (it can't) reissue the store instruction.  The store buffer shields the CPU
>> from that probe.  As long as the cache has write permission when the store
>> reaches the head of the store buffer, stores have a global order and TSO is
>> maintained.  Of course probing loads in the LSQ also needs to occur, along
>> with several other features for supporting locks, fences, etc.
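
The split described above between un-retired loads in the LSQ and retired stores in the store buffer can be sketched as a toy Python model. This is an illustration under stated assumptions, not gem5's O3 code: `ToyCore` and all of its methods are hypothetical names, and squashing the oldest matching load together with every younger load is an assumed, conservative policy.

```python
from collections import deque


class ToyCore:
    """Toy model of the LSQ/store-buffer split; not gem5's O3 code."""

    def __init__(self):
        self.load_queue = []         # (seq, block_addr), un-retired, oldest first
        self.store_buffer = deque()  # (block_addr, value), retired stores, FIFO

    def issue_load(self, seq, block_addr):
        # The load reads memory speculatively and stays monitored until it retires.
        self.load_queue.append((seq, block_addr))

    def retire_load(self, seq):
        # Retirement ends monitoring; the returned address is what the CPU
        # would report to the memory system.
        assert self.load_queue and self.load_queue[0][0] == seq, "retire in order"
        return self.load_queue.pop(0)[1]

    def retire_store(self, block_addr, value):
        self.store_buffer.append((block_addr, value))

    def probe(self, block_addr):
        """Handle an invalidating probe; return the seqs of squashed loads."""
        for i, (_, addr) in enumerate(self.load_queue):
            if addr == block_addr:
                # Assumed policy: squash the oldest matching load and all younger ones.
                squashed = [seq for seq, _ in self.load_queue[i:]]
                del self.load_queue[i:]
                return squashed
        # Hits in the store buffer are ignored: retired stores are never reissued.
        return []

    def drain_store(self, has_write_permission):
        """Write the head store to the cache only if write permission is held."""
        if not self.store_buffer or not has_write_permission(self.store_buffer[0][0]):
            return None
        return self.store_buffer.popleft()
```

Because stores leave the buffer only from the head, and only once write permission is held, they become visible in program order, which is the property the paragraph above relies on for TSO.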
>>
>> If you do have further questions, please be as specific as possible.  It
>> is hard to talk about this subject using generalities.
>>
>> Brad
>>
>>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>
