> On April 6, 2016, 9:46 p.m., Andreas Hansson wrote:
> > For the classic memory system, all caches before the point of coherency 
> > have the same line size, and it is a property of the system. Is this change 
> > departing from this assumption for Ruby? What is the bigger picture here?

__TL;DR: Never use global variables, static class members, or static variable 
declarations in function bodies. If you do, someone will come knocking on your 
office door in the future; it might be me.__

The bigger picture is the desire to run two or more "systems" inside the same 
gem5 process; ideally the systems could be independent from one another, i.e., 
have completely different coherence protocols, topologies, system block sizes, etc. 
(Without Ruby, it was already possible to run multiple simulated systems in the 
same process. We only realized that Ruby had this problem when we started 
trying to enable it in our simulations.) Being able to run two simulated 
systems together in the same process is important if you want them to 
communicate with one another through mutually shared objects. I am aware of the 
dist-gem5 changeset (1640dd68b0a4); it is a nice addition and could supplant 
this whole effort by allowing us to create two separate gem5 processes, each 
with a single system invocation and then communicate through the dist-gem5 
interface. (Some folks here at AMD already use dist-gem5 for their research.) 
However, there's no reason why we shouldn't also be able to achieve a 
multi-instance run without having to create different processes. With a single 
process, we could theoretically take advantage of shared memory with 
multi-threading __(on the host)__ with different real hardware threads 
servicing gem5 event queues. (Preliminary work on multi-threading and multiple 
event queues has already been done and checked in. We have internal patches 
that we would eventually like to push to the public tree to extend it.) 
Ideally, dist-gem5 and this work could even be complementary. We might run a 
multi-threaded gem5 process with multiple simulated systems coexisting together 
in the process and then scale that out to multiple processes on different host 
nodes which communicate with one another through dist-gem5; in this way, we 
could harness a cluster and use the threading within nodes.

__Ruby problems:__ The issues are related to global variables and static class 
members. Since the variables are visible to the entire gem5 process, they're 
used by each independent simulated system even though that's not really what we 
want. For example, the deal breaker (the cause of the initial failed assertions) 
was how the component mappings were determined after Python configuration. 
Global counters tracked how many times the controller classes had been 
instantiated in order to figure out how many controllers (cache, directory, 
etc.) had been created, and that information was then used to determine how to 
route messages between the controllers. With multiple simulated systems being 
created, the extra instantiations threw the counts off and tripped assertions 
because the routing code saw too many controller registrations.
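
To make the failure mode concrete, here is a tiny, hypothetical sketch (the 
class and member names are made up, not the actual Ruby code) of how a 
process-wide instantiation counter goes wrong once a second simulated system 
constructs its own controllers; the assert at the end is exactly the kind of 
check that started tripping:

```cpp
#include <cassert>

// Hypothetical sketch, not the real Ruby code: a controller class whose
// process-wide static counter assigns component IDs at construction time.
class CacheController
{
  public:
    CacheController() : m_id(s_num_controllers++) {}
    int id() const { return m_id; }

    // Shared by *every* simulated system living in the gem5 process.
    static int s_num_controllers;

  private:
    int m_id;
};

int CacheController::s_num_controllers = 0;

int
main()
{
    const int controllers_per_system = 2;

    // System 0 builds its controllers and gets IDs 0 and 1, as expected.
    CacheController sys0_cache, sys0_dir;

    // System 1 builds its own controllers in the same process and gets
    // IDs 2 and 3, even though it believes it only owns two controllers.
    CacheController sys1_cache, sys1_dir;

    // A sanity check written with a single system in mind now trips,
    // because the process-wide count exceeds the per-system count.
    assert(CacheController::s_num_controllers == controllers_per_system);

    return 0;
}
```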

__Block size:__ I agree with what you're saying that it should be the same 
within a system; the alternative does not make sense to me. (I'm going to 
subject you to a bit of Ruby background to try to explain why the block size 
changes look the way that they do.) There are System objects and RubySystem 
objects and there is a 1:1 correlation between the two. Essentially, they both 
serve as an aggregation point for all system wide attributes. The only thing 
that's special about the RubySystem object is that it contains Ruby specific 
parameters. So you might ask, "Why not place the block size (or other 
attributes) in RubySystem and access it (them) indirectly through that object?" 
I tried that with a set of patches last year; it was the most obvious initial 
solution. The patches were not well received -- partly because of the code 
organization/quality, partly because of Ruby-specific eccentricities, and 
partly because of the following conundrum: the block size is needed by 
_everything_. It would have to be passed through deeply nested function 
parameter lists to controller objects that are created independently in Python 
configuration scripts __and__ to dynamically created entry objects which exist 
independently and are owned by the controller objects __and__ to messages. 
Should all of those objects be exposed to a centralized object like 
RubySystem?
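
For illustration only, here is a rough sketch of the two alternatives, using 
hypothetical stand-in class names rather than the real gem5 interfaces: 
indirect access through a per-system handle versus passing the value itself at 
construction time:

```cpp
#include <cstdint>

// Hypothetical stand-ins (not the real gem5 classes) contrasting the two
// ways a block size can reach an object once the global is gone.

// The per-system aggregation point.
class RubySystemLike
{
  public:
    explicit RubySystemLike(uint32_t block_size_bytes)
        : m_block_size_bytes(block_size_bytes) {}
    uint32_t getBlockSizeBytes() const { return m_block_size_bytes; }
  private:
    uint32_t m_block_size_bytes;
};

// Option A: keep a pointer and dereference on every access. Reasonable
// for "top level" objects like controllers, which already do this.
class ControllerLike
{
  public:
    explicit ControllerLike(RubySystemLike *ruby_system)
        : m_ruby_system(ruby_system) {}
    uint32_t blockSize() const { return m_ruby_system->getBlockSizeBytes(); }
  private:
    RubySystemLike *m_ruby_system;
};

// Option B: pass the value itself at construction time. This is what the
// patches in this series do for entries, TBEs, and messages, which would
// otherwise all need a RubySystem handle threaded through to them.
class CacheEntryLike
{
  public:
    explicit CacheEntryLike(uint32_t block_size_bytes)
        : m_block_size_bytes(block_size_bytes) {}
    uint32_t blockSize() const { return m_block_size_bytes; }
  private:
    uint32_t m_block_size_bytes;
};

int
main()
{
    RubySystemLike rs(64);
    ControllerLike ctrl(&rs);                      // indirect, via pointer
    CacheEntryLike entry(rs.getBlockSizeBytes());  // direct, by value
    return ctrl.blockSize() == entry.blockSize() ? 0 : 1;
}
```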

__Why is it necessary to pass the block size around like this? I still don't 
get it. Type some more!:__ It is necessary because the controllers call 
functions that rely on knowing the block size. (For instance, the 
makeLineAddress function needs to know the block size to figure out the base 
cache line address; makeLineAddress is used _everywhere_.) Each controller 
needs either a RubySystem handle to access it indirectly or a block size 
provided directly as a parameter. You might say, "Well, the indirect access 
still doesn't seem like a big deal with the controllers!" Correct. RubySystem 
objects are accessed indirectly through several "top level" objects already; 
that's the point. The problem is that passing the RubySystem pointer around to 
the other objects that I mentioned is _nasty_ (and Nilay claims it is 
inefficient due to the extra dereferences on so many of the accesses). The 
entries are an incredible pain to deal with; they are templated types that may 
or may not have DataBlks or WriteMasks. (The DataBlk class is essentially a 
vector that holds data inside messages, TBEs (MSHRs), caches, and other 
assorted _fun_ objects. WriteMasks were introduced with the recent AMD GPU 
patches as a way of copying only portions of a DataBlk. Both rely on the block 
size to size themselves. Previously, they both just used the global in their 
constructors so that they were always sized appropriately; with the global 
gone, that no longer works -- _fun_.) Not
only are the entries templated types, but it was previously not possible to 
pass the data block in the constructor so there's a patch in this list that 
makes that possible. Otherwise, we'd have to create the DataBlk or WriteMask as 
an empty vector and then subsequently initialize it after the entry was 
created... which is what we do with messages... _fun_! Messages do not have a 
constructor available to them directly within the SLICC language; they are 
created as part of the "enqueue" keyword and their fields are initialized 
subsequently. (It might be possible to pass in the data block implicitly by 
modifying "enqueue", but I didn't bother with it.) __So there, that's why.__
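
To make that concrete, here is a simplified sketch of the idea, with 
illustrative names and signatures rather than the exact gem5 ones: a 
line-address helper that takes the block size as a parameter, and a 
DataBlk-like container that has to be sized through its constructor:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = uint64_t;

// The line-address helper takes the block size explicitly instead of
// reading a process-wide global (illustrative, not the gem5 signature).
Addr
makeLineAddressSketch(Addr addr, int block_size_bits)
{
    // Mask off the offset bits to get the base address of the cache line.
    return addr & ~((Addr(1) << block_size_bits) - 1);
}

// Sketch of a DataBlk-like container: with no global block size left to
// consult, the size must arrive through the constructor (or be filled in
// afterwards, as happens for messages built by the SLICC "enqueue").
class DataBlkSketch
{
  public:
    explicit DataBlkSketch(int block_size_bytes)
        : m_data(block_size_bytes, 0) {}
    std::size_t size() const { return m_data.size(); }
  private:
    std::vector<uint8_t> m_data;
};

int
main()
{
    const int block_size_bits = 6;                       // 64-byte lines
    Addr line = makeLineAddressSketch(0x12345, block_size_bits);
    DataBlkSketch blk(1 << block_size_bits);
    return (line == 0x12340 && blk.size() == 64) ? 0 : 1;
}
```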

__Reality:__ Currently, we can run two or more simulated systems in the same 
process with almost identical configurations; pretty much everything stays the 
same except the workloads/kernels which are provided to each system. _For now, 
this is fine for us here at AMD, but I can imagine that someone else will want 
heterogeneous systems._ One problem is writing a configuration script that allows 
us to specify two completely different systems in a clean way. Right now, we go 
through the process of making one system and then duplicate that system. 
Another problem is how the PROTOCOL is specified at build time; I have not put 
much thought into solving this yet, but it seems like a hard problem. Maybe we 
can make a meta-protocol that encompasses all of the existing ones? Seems 
crazy. Finally, I don't think that I have completely solved the problem of 
statics/globals. Just the other day, I found that the number of virtual 
networks is still a global definition somewhere in the code. If the PROTOCOL issue gets 
solved, it will need to be changed as well. There are also probably other 
things that I am overlooking.


- Brandon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3428/#review8153
-----------------------------------------------------------


On April 4, 2016, 11:42 p.m., Brandon Potter wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3428/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 11:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11430:ad5965016b8e
> ---------------------------
> ruby: pass in block size to ENTRY objects with block size
> 
> 
> Diffs
> -----
> 
>   src/mem/protocol/MOESI_CMP_token-L2cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_token-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_hammer-cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_hammer-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/ruby/structures/TBETable.hh cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/slicc/symbols/Type.py cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-L3cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-Region-CorePair.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-Region-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-RegionBuffer.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-RegionDir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-probeFilter.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_directory-L1cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_directory-L2cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_directory-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_directory-dma.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_CMP_token-L1cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_VIPER-TCP.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_VIPER_Region-TCC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MESI_Three_Level-L0cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MESI_Three_Level-L1cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MESI_Two_Level-L1cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MESI_Two_Level-L2cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MESI_Two_Level-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MI_example-cache.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MI_example-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/MOESI_AMD_Base-CorePair.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_RfO-SQC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_RfO-TCC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_RfO-TCCdir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_RfO-TCP.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_VIPER-SQC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
>   src/mem/protocol/GPU_VIPER-TCC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca
> 
> Diff: http://reviews.gem5.org/r/3428/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Brandon Potter
> 
>
