Sorry for the confusion.  Could you update the comment on the patch so that I 
don’t make that mistake again?  Is that possible?

I don’t think Ruby needs to ensure that at least one load is committed between 
invalidations.  All Ruby cache coherence protocols ensure that any particular 
load operation eventually completes.  The CPU needs to make sure that it 
eventually issues loads non-speculatively, so that when Ruby completes the 
load, the instruction retires and forward progress is maintained.

Brad


From: Nilay Vaish [mailto:[email protected]]
Sent: Tuesday, November 15, 2011 3:38 PM
To: Nilay Vaish; Beckmann, Brad; Default
Subject: Re: Review Request: O3, Ruby: Forward invalidations from Ruby to O3 CPU

This is an automatically generated e-mail. To reply, visit: 
http://reviews.m5sim.org/r/894/



On November 15th, 2011, 2:44 p.m., Brad Beckmann wrote:

Nilay,



Thanks for pushing this patch along.  This is a very important feature for gem5 
and I'm glad we have you working on it.



First, to answer your questions:

- We can certainly avoid deadlock, but exactly how we do it depends on the 
interactions between the O3 CPU and Ruby.  For the most part, it is up to the 
O3 model to avoid deadlock.  I've heard through the grapevine that you are 
thinking about implementing the first, simplest option I suggested in my 
previous email: the one where the O3 model doesn't issue stores to Ruby until 
they reach the head of the store buffer.  I think that is an excellent choice, 
and it avoids having to worry about deadlock for stores, since they are only 
issued to the memory system once they become non-speculative.  In contrast, 
I'm sure the O3 model will issue speculative loads to Ruby, and if the O3 CPU 
relies on speculative loads to succeed, we will encounter deadlock.  However, 
as long as the O3 model eventually issues a load non-speculatively, I'm pretty 
sure we can guarantee forward progress (there is a rough sketch of this policy 
after these two points).  Make sense?

- Testing at the CPU model is a great question.  Do you know if the O3 model 
can read in a trace?  If so, I would suggest a solution similar to the 
trace-based approach I suggested before for testing Ruby.  Basically, you need 
the trace entries to include a fixed delay so that you can enforce certain 
reorderings.  I would use those fixed delay values to manipulate the delay in 
the mandatory queue (the second sketch below shows one possible entry format).
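
To make the first point concrete, here is a minimal sketch of the issue policy 
I have in mind.  The class and member names are made up for illustration; this 
is not the actual O3 LSQ code.

// Sketch only: rule 1 is that a store goes to Ruby only once it sits at the
// head of the store buffer, i.e. non-speculatively.  Rule 2 is that a
// speculative load whose cache block is invalidated or evicted before the
// load retires is squashed and replayed, so it is eventually reissued
// non-speculatively and forward progress is preserved.

#include <cstdint>
#include <deque>

struct MemOp {
    uint64_t addr;
    bool speculative;   // true until the op reaches the head of the ROB
    bool completed;     // Ruby has already returned data for this op
};

class SimpleLSQ {
  public:
    // Rule 1: only the (non-speculative) store at the head of the store
    // buffer is ever issued to the memory system.
    void tryIssueStore() {
        if (!storeBuffer.empty() && !storeBuffer.front().speculative)
            issueToRuby(storeBuffer.front());
    }

    // Rule 2: Ruby forwarded an invalidation/eviction for addr.  Any
    // completed-but-not-retired speculative load to that block is squashed
    // and replayed from its instruction onward.
    void handleEviction(uint64_t addr) {
        for (const MemOp &ld : loadQueue) {
            if (ld.completed && ld.speculative && sameBlock(ld.addr, addr)) {
                squashAndReplayFrom(ld);
                break;
            }
        }
    }

  private:
    std::deque<MemOp> storeBuffer;
    std::deque<MemOp> loadQueue;

    static bool sameBlock(uint64_t a, uint64_t b) {
        return (a >> 6) == (b >> 6);            // assume 64-byte cache blocks
    }
    void issueToRuby(const MemOp &) { /* hand the request to Ruby */ }
    void squashAndReplayFrom(const MemOp &) { /* squash younger insts */ }
};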
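
For the trace idea, a possible entry format could be as simple as the 
following (again purely illustrative, not an existing gem5 structure):

// Hypothetical trace entry.  The fixed per-entry delay would be added to the
// time the request spends before (or in) the mandatory queue, so a test can
// force a specific interleaving of requests from different CPUs.

#include <cstdint>

struct TraceEntry {
    uint64_t tick;        // when the CPU model should issue the request
    uint64_t addr;        // target address
    bool     isLoad;      // load vs. store
    uint64_t fixedDelay;  // extra ticks of delay applied to this request
};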



A couple questions/comments:

- Why do you say that "My understanding is that this should ensure an SC 
execution, as long as Ruby can support SC. But I think Ruby does not support 
any memory model currently"?  Ruby implements a cache coherence protocol, which 
is a component of a memory model, but in itself is not a memory model.  Ruby 
alone can't support any particular memory model.  However, I believe that by 
forwarding probes and evictions to the CPU, Ruby can help support SC, TSO, or 
any other memory model.  It is up to the CPU to act appropriately to achieve a 
particular model.

- I would modify the action name "cc_squash_speculation" to something like 
"forward_eviction_to_cpu".  It is really up to the CPU and the memory model to 
determine whether speculation should be squashed.  We should not try to imply 
that Ruby is designed to support a specific memory model or CPU type.



Brad, those questions that appear in the description of the patch have been
there since I first posted the patch.  In fact, we did discuss those questions
earlier as well.

* So how do we ensure that at least one load is committed between successive
  invalidations?

* I'll change the name of the function in the protocol.

- Nilay


On November 15th, 2011, 6:53 a.m., Nilay Vaish wrote:
Review request for Default.
By Nilay Vaish.

Updated 2011-11-15 06:53:48

Description


O3, Ruby: Forward invalidations from Ruby to O3 CPU

This patch implements the functionality for forwarding invalidations and
replacements from the L1 cache of the Ruby memory system to the O3 CPU.  The
implementation adds a list of ports to RubyPort.  Whenever a replacement or an
invalidation is performed, the L1 cache forwards this to all the ports, which
in the case of the O3 CPU is, I believe, the LSQ.  Those who understand the O3
LSQ should take a close look at the implementation and figure out (at least
qualitatively) if something is missing or erroneous.
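
Roughly, the forwarding path looks like the following.  This is a simplified
sketch; the class and method names are illustrative and not the exact ones in
the patch.

// RubyPort keeps a list of CPU-side ports, and the L1 controller calls back
// into it whenever a block is invalidated or replaced, so every attached
// port (the LSQ, for the O3 CPU) is notified.

#include <cstdint>
#include <list>

class CpuSidePort {
  public:
    virtual ~CpuSidePort() = default;
    // Deliver the invalidation/replacement notice to the CPU (its LSQ).
    virtual void sendEvictionNotice(uint64_t addr) = 0;
};

class RubyPortSketch {
  public:
    void registerCpuPort(CpuSidePort *port) { cpuPorts.push_back(port); }

    // Called by the L1 cache controller on an invalidation or replacement;
    // fans the notification out to all attached CPU-side ports.
    void evictionCallback(uint64_t addr) {
        for (CpuSidePort *p : cpuPorts)
            p->sendEvictionNotice(addr);
    }

  private:
    std::list<CpuSidePort *> cpuPorts;   // one entry per connected port
};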



This patch only modifies the MESI CMP directory protocol.  I will modify the
other protocols once we sort out the major issues surrounding this patch.



My understanding is that this should ensure an SC execution, as long as Ruby
can support SC.  But I think Ruby does not support any memory model currently.

A couple of issues that need discussion --



* Can this get into a deadlock?  A CPU may not be able to proceed if a
particular cache block is repeatedly invalidated before the CPU can retire
the actual load/store instruction.  How do we ensure that at least one
instruction is retired before an invalidation/replacement is processed?

* How do we test this implementation?  Is it possible to implement some of the
tests that we regularly come across in papers on consistency models, or those
present in the manuals from AMD and Intel (one such litmus test is sketched
below)?  I have tested that Ruby will forward the invalidations, but not the
part where the LSQ needs to act on them.
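
As an example of what such a test could look like, below is the classic
"store buffering" litmus test, written here as ordinary host C++ with seq_cst
atomics just to show the pattern.  A gem5 version would run an equivalent
bare-metal or SE-mode program with plain loads/stores on two O3 cores and
check that the forbidden outcome never appears.

// Store-buffering (SB) litmus test.  Under SC the outcome
// r1 == 0 && r2 == 0 is forbidden.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;

void p0() { x.store(1); r1 = y.load(); }   // P0: x = 1; r1 = y
void p1() { y.store(1); r2 = x.load(); }   // P1: y = 1; r2 = x

int main() {
    for (int i = 0; i < 100000; ++i) {
        x = 0; y = 0; r1 = -1; r2 = -1;
        std::thread a(p0), b(p1);
        a.join(); b.join();
        if (r1 == 0 && r2 == 0) {          // forbidden under SC
            std::puts("SC violation observed");
            return 1;
        }
    }
    std::puts("no SC violation observed");
    return 0;
}

(With seq_cst atomics the host program itself will never print the violation;
the interesting run is the simulated one with ordinary loads and stores.)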


Diffs

 *   configs/example/se.py (e66a566f2cfa)
 *   configs/ruby/MESI_CMP_directory.py (e66a566f2cfa)
 *   src/mem/protocol/MESI_CMP_directory-L1cache.sm (e66a566f2cfa)
 *   src/mem/protocol/RubySlicc_Types.sm (e66a566f2cfa)
 *   src/mem/ruby/system/RubyPort.hh (e66a566f2cfa)
 *   src/mem/ruby/system/RubyPort.cc (e66a566f2cfa)
 *   src/mem/ruby/system/Sequencer.hh (e66a566f2cfa)
 *   src/mem/ruby/system/Sequencer.cc (e66a566f2cfa)

View Diff: http://reviews.m5sim.org/r/894/diff/


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
