Sorry for the confusion. Could you update the comment on the patch so that I don’t make that mistake again? Is that possible?
I don’t think Ruby needs to ensure that at least one load is committed between invalidations. All Ruby cache coherence protocols ensure that any particular load operation eventually completes. The CPU needs to make sure that it will eventually issue loads non-speculatively, so that when Ruby completes the load, the instruction retires and forward progress is maintained.

Brad

From: Nilay Vaish [mailto:[email protected]]
Sent: Tuesday, November 15, 2011 3:38 PM
To: Nilay Vaish; Beckmann, Brad; Default
Subject: Re: Review Request: O3, Ruby: Forward invalidations from Ruby to O3 CPU

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/894/

On November 15th, 2011, 2:44 p.m., Brad Beckmann wrote:

Nilay,

Thanks for pushing this patch along. This is a very important feature for gem5 and I'm glad we have you working on it.

First, to answer your questions:

- We can certainly avoid deadlock, but exactly how we do it depends on the interactions between the O3 CPU and Ruby. For the most part, it is up to the O3 model to avoid deadlock. I've heard through the grapevine that you are thinking about implementing the first, simplest option I suggested in my previous email. Essentially, that is the one where the O3 model doesn't issue stores to Ruby until they reach the head of the store buffer. I think that is an excellent choice, and it avoids having to worry about deadlock for stores, since they are only issued to the memory system once they become non-speculative. In contrast, I'm sure the O3 model will issue speculative loads to Ruby, and if the O3 CPU relies on speculative loads to succeed, we will encounter deadlock. However, as long as the O3 model eventually issues a load non-speculatively, I'm pretty sure we can guarantee forward progress. Make sense?
- Testing at the CPU model is a great question. Do you know if the O3 model can read in a trace? If so, I would suggest a solution similar to the trace solution I suggested before to test Ruby.
Basically you need the trace entries to include a fixed delay so that you can enforce certain reorderings. I would use those fixed delay values to manipulate the delay in the mandatory queue.

A couple of questions/comments:

- Why do you say that "My understanding is that this should ensure an SC execution, as long as Ruby can support SC. But I think Ruby does not support any memory model currently"? Ruby implements a cache coherence protocol, which is a component of a memory model but is not in itself a memory model. Ruby alone can't support any particular memory model. However, I believe that by forwarding probes and evictions to the CPU, Ruby can help support SC, TSO, or any other memory model. It is up to the CPU to act appropriately to achieve a certain model.
- I would modify the action name "cc_squash_speculation" to something like "foward_eviction_to_cpu". It is really up to the CPU and memory model to determine whether speculation should be squashed. We should not try to imply that Ruby is designed to support a specific memory model or CPU type.

Brad, those questions that appear in the description of the patch have been there since I first posted the patch. In fact, we did discuss those questions earlier as well.

* So how do we ensure that at least one load is committed between successive invalidations?
* I'll change the name of the function in the protocol.

- Nilay

On November 15th, 2011, 6:53 a.m., Nilay Vaish wrote:

Review request for Default. By Nilay Vaish.
Updated 2011-11-15 06:53:48

Description

O3, Ruby: Forward invalidations from Ruby to O3 CPU

This patch implements the functionality for forwarding invalidations and replacements from the L1 cache of the Ruby memory system to the O3 CPU. The implementation adds a list of ports to RubyPort. Whenever a replacement or an invalidation is performed, the L1 cache forwards this to all the ports, which I believe is the LSQ in case of the O3 CPU.
Those who understand the O3 LSQ should take a close look at the implementation and figure out (at least qualitatively) if something is missing or erroneous. This patch only modifies the MESI CMP directory protocol. I will modify other protocols once we sort out the major issues surrounding this patch.

My understanding is that this should ensure an SC execution, as long as Ruby can support SC. But I think Ruby does not support any memory model currently.

A couple of issues that need discussion --

* Can this get into a deadlock? A CPU may not be able to proceed if a particular cache block is repeatedly invalidated before the CPU can retire the actual load/store instruction. How do we ensure that at least one instruction is retired before an invalidation/replacement is processed?
* How to test this implementation? Is it possible to implement some of the tests that we regularly come across in papers on consistency models? Or those present in manuals from AMD and Intel?

I have tested that Ruby will forward the invalidations, but not the part where the LSQ needs to act on it.

Diffs

* configs/example/se.py (e66a566f2cfa)
* configs/ruby/MESI_CMP_directory.py (e66a566f2cfa)
* src/mem/protocol/MESI_CMP_directory-L1cache.sm (e66a566f2cfa)
* src/mem/protocol/RubySlicc_Types.sm (e66a566f2cfa)
* src/mem/ruby/system/RubyPort.hh (e66a566f2cfa)
* src/mem/ruby/system/RubyPort.cc (e66a566f2cfa)
* src/mem/ruby/system/Sequencer.hh (e66a566f2cfa)
* src/mem/ruby/system/Sequencer.cc (e66a566f2cfa)

View Diff: http://reviews.m5sim.org/r/894/diff/
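
The forwarding scheme in the patch description, together with the forward-progress argument from the review, can be illustrated with a small toy model. This is only a sketch in Python and is not gem5 code: `ToyRubyPort`, `ToyLSQ`, `ToyLoad`, `recv_invalidation`, `forward_invalidation`, and `BLOCK_SIZE` are all hypothetical names chosen for illustration. The policy of never squashing the oldest load is an assumption made here to model "the load is eventually issued non-speculatively"; the thread does not say how the O3 LSQ actually decides this.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64  # bytes per cache block (assumed value for this sketch)


def block_of(addr):
    """Return the block-aligned address that contains addr."""
    return addr - (addr % BLOCK_SIZE)


@dataclass
class ToyLoad:
    seq: int                 # program-order sequence number
    addr: int                # byte address the load reads
    completed: bool = False  # data has already returned from memory
    squashed: bool = False   # load must be re-executed


class ToyLSQ:
    """Hypothetical load queue that reacts to forwarded invalidations."""

    def __init__(self):
        self.loads = []  # kept in program order, oldest first

    def insert(self, load):
        self.loads.append(load)

    def recv_invalidation(self, addr):
        """Squash younger loads that completed on the invalidated block.

        Assumption: the oldest load is treated as non-speculative and is
        never squashed, so it can always retire once its data arrives.
        """
        target = block_of(addr)
        squashed = []
        for load in self.loads[1:]:
            same_block = block_of(load.addr) == target
            if load.completed and not load.squashed and same_block:
                load.squashed = True
                squashed.append(load.seq)
        return squashed


class ToyRubyPort:
    """Hypothetical port that forwards evictions to every attached listener."""

    def __init__(self):
        self.listeners = []

    def attach(self, listener):
        self.listeners.append(listener)

    def forward_invalidation(self, addr):
        # One result list per listener, in attachment order.
        return [l.recv_invalidation(addr) for l in self.listeners]
```

In this model, repeated invalidations of the same block can squash younger loads any number of times, but the load at the head of the queue is never affected, which is the property the forward-progress argument relies on.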
