I think you didn't understand my point. I'll explain it with an example.

>> A receive tick of a packet cannot fall into the current quantum so every
>> packet can get scheduled for receive properly even if a checkpoint/restore
>> happens during a quantum.

This assumption is true when "quantum size <= link_latency". But link_latency
is not fixed; it's a parameter. Assume you take a checkpoint with q=10, have 3
nodes, and take the checkpoint @tick=11 on node0. Then assume these are the
tick values of the nodes when you take the unsync ckpt: node0:11, node1:20,
node2:20. If you restore with quantum smaller than 10, then your above
statement does not hold. So you cannot restore from a checkpoint with
link_latency smaller than the value that you took the checkpoint with!

Mohammad

On Tue, Jul 7, 2015 at 11:05 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:

> Mohammad, I’m not sure what you mean by “taking a checkpoint with quantum
> size smaller than link latency”.
>
> In multi-gem5, the quantum size and the checkpoint are completely
> independent. The quantum is the number of ticks simulated between two
> consecutive periodic syncs - that’s why every periodic sync happens at the
> same tick in each gem5 process. A checkpoint can be taken at any point
> within a quantum. After the checkpoint is taken, each gem5 process
> completes what remained from the current quantum and then enters the next
> periodic sync.
>
> When fast-forwarding, you can increase link latency to allow a larger
> quantum and reduce periodic sync overhead. Does that make sense?
>
> - Gabor
>
> On 7/7/15, 4:11 PM, "Mohammad Alian" <al...@wisc.edu> wrote:
>
> >Then you are assuming taking checkpoint with quantum size smaller than
> >link latency which contradicts your initial motivation for unsync
> >checkpoint!:
> >(I copied this sentence from earlier messages in the thread as a reminder)
> >"Shortening the quantum can help, but usually the snapshot is being taken
> >while 'fast-forwarding', i.e. simulating as fast as possible, which would
> >motivate a longer quantum."
> >
> >What if somebody wants to relax synchronization and take a checkpoint?
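The restore scenario in the newest message above (node0:11, node1:20, node2:20) can be checked with a small sketch. This is not gem5 code. It assumes that each node could send a packet at the tick it restored at and that the receive tick is simply the send tick plus the link latency. The function name `late_deliveries` is made up for illustration.

```python
from itertools import permutations


def late_deliveries(restore_ticks, link_latency):
    """Return (sender, receiver) pairs whose first packet after restore
    would carry a receive tick that lies in the receiver's past."""
    late = []
    for sender, receiver in permutations(restore_ticks, 2):
        # Simplified model: the packet leaves at the sender's restore tick.
        receive_tick = restore_ticks[sender] + link_latency
        if receive_tick < restore_ticks[receiver]:
            late.append((sender, receiver))
    return late


# Tick values from the example: unsynchronized checkpoint taken with q=10.
ticks = {"node0": 11, "node1": 20, "node2": 20}

# Restoring with the link latency used at checkpoint time.
print(late_deliveries(ticks, 10))
# Restoring with a smaller link latency.
print(late_deliveries(ticks, 5))
```

Under these assumptions, the restore with latency 10 reports no late pairs, while the restore with latency 5 reports node0's packets to node1 and node2 as late, which is the situation the example describes.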
> >
> >On Tue, Jul 7, 2015 at 7:38 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:
> >
> >>
> >> Hi Mohammad and all,
> >>
> >> gem5 processes may restore at a different tick from a checkpoint but the
> >> next periodic sync will happen at the same tick in all gem5 processes. A
> >> receive tick of a packet cannot fall into the current quantum so every
> >> packet can get scheduled for receive properly even if a
> >> checkpoint/restore happens during a quantum.
> >>
> >> Regarding your multi-threaded dual config, my understanding is that
> >> EtherLink is not prepared to work with multi-threading as it lacks thread
> >> safety. The multiple event queues/threads config only works if the
> >> systems are independent.
> >>
> >> One possible way to fix that is to provide a "multi-thread" based
> >> implementation for MultiIface ;-)
> >>
> >> - Gabor
> >>
> >> On 7/7/15, 6:29 AM, "Mohammad Alian" <al...@wisc.edu> wrote:
> >>
> >> >Gabor- My concern about unsync checkpoint is that when you restore from
> >> >an unsync checkpoint, you'll have gem5 processes that are each running
> >> >at a different tick. Then how do you handle accurate delivery of packets
> >> >between these gem5 processes? It will also make it harder to integrate
> >> >multi/pd-gem5 with current multi-threaded gem5. The problem with sync
> >> >checkpoint is that you cannot take the checkpoint exactly at the ROI,
> >> >but I think unsync checkpoint introduces some other problems.
> >> >Considering the necessary warmup period before starting stat collection,
> >> >I think we don't need to exactly pinpoint the ROI. Please correct me if
> >> >I'm wrong.
> >> >
> >> >I'm trying to run a multi-threaded experiment with pd-gem5, but I got an
> >> >error when I tried to partition dual mode simulation on two threads. I
> >> >posted that in gem5 users mailing list. Please help me on that if you
> >> >can.
> >> >
> >> >Thank you,
> >> >Mohammad
> >> >
> >> >On Mon, Jul 6, 2015 at 11:45 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:
> >> >
> >> >> Thank you Steve for the detailed elaboration on the issues.
> >> >>
> >> >>
> >> >> Regarding the “unsynchronized checkpoints”, the terminology might be a
> >> >> bit confusing. In fact, we always need to do a global synchronization
> >> >> among the gem5 processes before taking a distributed checkpoint (in
> >> >> order to avoid in-flight packets). The global synchronization here
> >> >> means that each gem5 has to suspend the simulation and wait until
> >> >> every in-flight packet arrives (and is stored) at the destination gem5
> >> >> process. If that global synchronization step happens at the same
> >> >> simulated tick in each gem5 then we call the checkpoint “synchronous”,
> >> >> otherwise it is an “asynchronous” checkpoint.
> >> >>
> >> >> In the MPI application example I mentioned before the checkpoint
> >> >> should be triggered as soon as the “slowest” MPI process reaches the
> >> >> MPI_Barrier(). The problem is that the “slowest” MPI process usually
> >> >> does not reach the MPI_Barrier() right at the end of the current
> >> >> quantum. If we let the simulation continue until the quantum completes
> >> >> (to ensure that the checkpoint is taken at the same simulated tick in
> >> >> each gem5) then the MPI processes will complete the MPI_Barrier() and
> >> >> start executing the ROI code already.
> >> >>
> >> >> Regarding the integration of multi-threaded/multi-host simulation,
> >> >> multi-gem5 does not support fine grain simulation of hierarchical
> >> >> switches (or any other network topologies except a single crossbar) or
> >> >> multiple synchronization domains currently.
> >> >>
> >> >> However, I'm a bit confused about your statement that you don’t see
> >> >> value in ever building a shared-memory transport for MultiIface.
> >> >> MultiIface in my view is just an abstract interface for
> >> >> "multi-(ether)-link" objects which are link objects for connecting
> >> >> multiple (i.e. more than two) systems. It aims to encapsulate the API
> >> >> necessary for any Link object in any multi-system configuration -
> >> >> provided that we partition the systems across network links during run
> >> >> time.
> >> >>
> >> >> An orthogonal issue is if we want to include a simple crossbar switch
> >> >> model in a MultiIface implementation or we want to provide a
> >> >> 'standalone' fine grain model for the switch (e.g. the pd-gem5
> >> >> approach).
> >> >>
> >> >> Thanks,
> >> >> - Gabor
> >> >>
> >> >>
> >> >>
> >> >> On 7/3/15, 7:33 PM, "Steve Reinhardt" <ste...@gmail.com> wrote:
> >> >>
> >> >> >Thanks Mohammad & Gabor for the responses.
> >> >> >
> >> >> >I think there's still some misunderstanding on what I mean by the
> >> >> >integration of multi-threaded and multi-host simulation based on
> >> >> >Gabor's response above and Andreas's response in the other thread.
> >> >> >
> >> >> >The primary example scenario I'm proposing is as Mohammad described:
> >> >> >within each host node, we're simulating an entire rack + top-of-rack
> >> >> >switch in a single gem5 process, with separate event queues/threads
> >> >> >being used to parallelize across nodes within the rack. The switch
> >> >> >may or may not be on its own thread as well. The synchronization
> >> >> >among the threads only needs to be at the granularity of the
> >> >> >intra-rack network latency.
> >> >> >
> >> >> >Now we want to expand this by using pd-gem5 or multi-gem5 to
> >> >> >parallelize multiple of these rack-level simulations across hosts, so
> >> >> >we can simulate a whole row of a datacenter.
> >> >> >Only the uplinks from the TOR switches would need to go over sockets
> >> >> >between processes, and the switch being modeled by pd-gem5 or
> >> >> >multi-gem5 would be the end-of-row switch. The synchronization delay
> >> >> >among the multiple gem5 processes would be based on the inter-rack
> >> >> >latency.
> >> >> >
> >> >> >So the basic question is: Is this feasible with pd-gem5 / multi-gem5,
> >> >> >and if not, how much work would it take to make it so?
> >> >> >
> >> >> >However, my larger point is that I still don't see value in ever
> >> >> >building a shared-memory transport for MultiIface. For this model,
> >> >> >there is clearly no need for it. Things get more complicated if we
> >> >> >want to do something like have N nodes connected to a single switch
> >> >> >and split that over two hosts (with N/2 nodes simulated on each), but
> >> >> >even in that case, I think it's a better idea to make the switch
> >> >> >model deal with having half of its links internal and half external
> >> >> >(since we already want the same model to work in both the
> >> >> >all-internal and all-external cases). Not that I'm worried that
> >> >> >someone is about to go off and build this shared-memory transport,
> >> >> >but I think it's important to reach an understanding here, since it's
> >> >> >fundamental to defining the strategic relationship between these
> >> >> >capabilities going forward.
> >> >> >
> >> >> >Stepping back a little further, it would be nice to have a model that
> >> >> >is as generic as the multi-threading model, where it's really just a
> >> >> >matter of taking a simulation, partitioning the components among the
> >> >> >threads, and setting the synchronization quantum, and it works.
> >> >> >Of course, even with the multi-threaded model, if you don't choose
> >> >> >your partitioning and your quantum wisely, you're not going to get
> >> >> >much speedup or a deterministic simulation, but the fundamental
> >> >> >implementation is oblivious to that. I'm not saying we really need to
> >> >> >go all the way to this extreme---it's pretty reasonable to assume
> >> >> >that no one in the near future will want to partition across hosts
> >> >> >anywhere other than on a simulated network link---but I think we
> >> >> >should keep this ideal in mind as a guiding principle as we choose
> >> >> >how to go forward from here.
> >> >> >
> >> >> >This ties in to my point #4, which is that if we're really building a
> >> >> >mechanism to partition a simulation across multiple hosts, then you
> >> >> >should be able to run the same simulation in a single gem5 process
> >> >> >and get the same results. I think this is the strength of pd-gem5;
> >> >> >correspondingly the main weakness of multi-gem5 is that it
> >> >> >architecturally feels more like tying together a set of mostly
> >> >> >independent gem5 simulations than like partitioning a single gem5
> >> >> >simulation. (Of course, they both end up at roughly the same point in
> >> >> >the middle.)
> >> >> >
> >> >> >On the flip side, multi-gem5 has some clear advantages in terms of
> >> >> >the better separation of the communication layer (and I can imagine
> >> >> >it being very useful to port to MPI and perhaps some RDMA API for
> >> >> >InfiniBand clusters).
> >> >> >Also I think the integrated sockets for communication and
> >> >> >synchronization are the superior design; while the separate sockets
> >> >> >used by pd-gem5 may only very rarely cause problems, I agree with
> >> >> >Andreas that that's not good enough, and I don't see any real
> >> >> >advantage either---if you have to flush the data sockets (or wait for
> >> >> >them to drain) before synchronizing, then you might as well just have
> >> >> >the synchronization messages queue up behind the data messages.
> >> >> >
> >> >> >Regarding unsynchronized checkpoints: Thanks for the example, but I'm
> >> >> >still a little confused. If all the processes are about to execute an
> >> >> >MPI_Barrier(), doesn't that mean they'll all be synchronized shortly
> >> >> >anyway? So what's the harm in waiting until they're synchronized and
> >> >> >then checkpointing?
> >> >> >
> >> >> >Regarding the simulation of non-Ethernet networks: I agree that the
> >> >> >biggest obstacle to this is the lack of generality of the current
> >> >> >gem5 network components. I tried to take a step toward supporting
> >> >> >other link types two years ago (see http://reviews.gem5.org/r/1922)
> >> >> >but someone shot me down ;). We shouldn't try and fix that here, but
> >> >> >we should also consciously try not to make it any worse...
> >> >> >
> >> >> >Thanks for reading all the way to the end!
> >> >> >
> >> >> >Steve
> >> >> >
> >> >> >
> >> >> >On Fri, Jul 3, 2015 at 7:11 AM Gabor Dozsa <gabor.do...@arm.com> wrote:
> >> >> >
> >> >> >>Hi all,
> >> >> >>
> >> >> >>Thank you Steve for the thorough review.
> >> >> >>
> >> >> >>First, let me elaborate a bit on Andreas’s 3rd point about
> >> >> >>non-synchronous checkpoints. Let’s assume that we aim to simulate
> >> >> >>MPI applications (HPC workloads).
> >> >> >>The ROI in an MPI application typically starts with a global
> >> >> >>MPI_Barrier() call. We want to take the checkpoint when *every* gem5
> >> >> >>process has reached that MPI_Barrier() in the simulated code but
> >> >> >>that may not happen at the same tick in each gem5 (due to load
> >> >> >>imbalance among the simulated nodes). That’s why multi-gem5
> >> >> >>implements the non-synchronous checkpoint support.
> >> >> >>
> >> >> >>My answers to your questions are as follows.
> >> >> >>
> >> >> >>1. The only change necessary to use multi-gem5 with a non-Ethernet
> >> >> >>(simulated) network is to replace the Ethernet packet type with
> >> >> >>another packet type in MultiIface.
> >> >> >>In fact, the first implementation of MultiIface was a template that
> >> >> >>took EthPacketData as parameter because I planned to support
> >> >> >>different network types. When I realized that currently only
> >> >> >>Ethernet is supported by gem5 I dropped the template param to keep
> >> >> >>the implementation simpler. I have also realized in the meantime
> >> >> >>that the right approach would probably be to create a pure virtual
> >> >> >>'base' class for network packets from which Ethernet (and other
> >> >> >>types of) packets could be derived. Then MultiIface could simply use
> >> >> >>that base class to provide support for different network types. The
> >> >> >>interface provided by the base packet class could be very simple.
> >> >> >>Besides the total size() of the packet, multi-gem5 only needs a
> >> >> >>method to 'extract' the source/destination address. Those addresses
> >> >> >>are used in MultiIface as opaque byte arrays so they are quite
> >> >> >>network type agnostic already.
> >> >> >>
> >> >> >>2. That’s right, we have designed the MultiIface/TCPIface split with
> >> >> >>different underlying messaging systems in mind.
> >> >> >>
> >> >> >>3. Multi-gem5 can work together with multi-threaded/multi-event-queue
> >> >> >>gem5 configs. The current TCPIface/tcp_server components would still
> >> >> >>use sockets to send around the packets. So it is possible to put
> >> >> >>together a multi-gem5 simulation where each gem5 process has
> >> >> >>multiple event queues (and an independent simulation thread per
> >> >> >>event queue) but all the simulated Ethernet links would use sockets
> >> >> >>to forward every Ethernet packet to the tcp_server.
> >> >> >>
> >> >> >>If someone wanted to run only a single gem5 process to simulate an
> >> >> >>entire cluster (using one thread/event-queue per cluster node) then
> >> >> >>the current multi-gem5 implementation using sockets/tcp_server is
> >> >> >>not optimal. In that case, a better solution would be to provide a
> >> >> >>shared memory based implementation of the MultiIface virtual
> >> >> >>communication methods sendRaw()/recvRaw()/syncRaw() (i.e. a shared
> >> >> >>memory equivalent of TCPIface). In that implementation, the entire
> >> >> >>discrete tcp_server component could be replaced with a shared data
> >> >> >>structure.
> >> >> >>
> >> >> >>4. You are right, the current implementation does not make it
> >> >> >>possible to construct an equivalent single-process simulation model
> >> >> >>for a multi-gem5 run. However, a possible solution is a shared
> >> >> >>memory based implementation of the MultiIface virtual communication
> >> >> >>methods just as I described in the previous paragraph. The same
> >> >> >>implementation could then work with both
> >> >> >>multi-threaded/multi-event-queues and
> >> >> >>single-thread/single-event-queue gem5 configs.
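The idea in points 3 and 4 above, an abstract set of raw communication methods with a shared-memory implementation next to the socket one, might be sketched as follows. This is only an illustration in Python, not the multi-gem5 C++ code: the class names `RawTransport` and `SharedQueueTransport`, the method signatures, and the mailbox layout are all assumptions, and the sketch is single-threaded, so it does not address locking.

```python
from abc import ABC, abstractmethod
from collections import deque


class RawTransport(ABC):
    """Hypothetical stand-in for the raw communication methods named above."""

    @abstractmethod
    def send_raw(self, dest, payload):
        """Hand one packet to the transport for delivery to node `dest`."""

    @abstractmethod
    def recv_raw(self):
        """Return the next (source, payload) pair, or None if none is waiting."""

    @abstractmethod
    def sync_raw(self):
        """Return once no packet addressed to this node is still in flight."""


class SharedQueueTransport(RawTransport):
    """In-process variant: a shared dict of per-node mailboxes replaces
    the separate server process (assumed design, for illustration only)."""

    def __init__(self, node_id, mailboxes):
        self.node_id = node_id
        self.mailboxes = mailboxes
        self.mailboxes.setdefault(node_id, deque())

    def send_raw(self, dest, payload):
        # Delivery is a direct append into the destination's mailbox.
        self.mailboxes.setdefault(dest, deque()).append((self.node_id, payload))

    def recv_raw(self):
        box = self.mailboxes[self.node_id]
        return box.popleft() if box else None

    def sync_raw(self):
        # Nothing can be in flight: send_raw() has already stored the packet.
        return None


mailboxes = {}
node0 = SharedQueueTransport(0, mailboxes)
node1 = SharedQueueTransport(1, mailboxes)
node0.send_raw(1, b"frame-a")
node0.send_raw(1, b"frame-b")
print(node1.recv_raw())
print(node1.recv_raw())
print(node1.recv_raw())
```

A socket-based class would implement the same three methods, so the code driving the simulated links would not need to know which transport it is using.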
> >> >> >>
> >> >> >>Thanks,
> >> >> >>- Gabor
> >> >> >>
> >> >> >>On 7/2/15, 7:20 PM, "Steve Reinhardt" <ste...@gmail.com> wrote:
> >> >> >>
> >> >> >>>Hi everyone,
> >> >> >>>
> >> >> >>>Sorry for taking so long to engage. This is a great development and
> >> >> >>>I think both these patches are terrific contributions. Thanks to
> >> >> >>>Mohammad, Gabor, and everyone else involved.
> >> >> >>>
> >> >> >>>I agree with Andreas that we should start with some top-level goals
> >> >> >>>& assumptions, agree on those, and then we can sort out the
> >> >> >>>detailed issues based on a consistent view.
> >> >> >>>
> >> >> >>>I definitely agree with Andreas's first two points. The third one
> >> >> >>>seems a little surprising; I'd like to hear more about the
> >> >> >>>motivation before expressing an opinion. I can see where
> >> >> >>>non-synchronous checkpointing could be useful, but it's also clear
> >> >> >>>from the associated patch that it's not trivial to implement
> >> >> >>>either. How much would be lost by requiring a synchronization
> >> >> >>>before a checkpoint?
> >> >> >>>
> >> >> >>>From my personal perspective, I would like to see whatever we do
> >> >> >>>here be a first step toward a more general distributed simulation
> >> >> >>>platform. Both of these patches seem pretty Ethernet-centric in
> >> >> >>>different ways. This is not terrible; part of the problem is that
> >> >> >>>gem5's current internal networking support is already overly
> >> >> >>>Ethernet-centric IMO. But it would be nice to avoid baking that in
> >> >> >>>even further. Rather than assume I have understood all the code
> >> >> >>>completely, I'll phrase things in the form of questions, and people
> >> >> >>>can comment on how those questions would be answered in the context
> >> >> >>>of the two different approaches.
> >> >> >>>
> >> >> >>>1. How much effort would be required to simulate a non-Ethernet
> >> >> >>>network? My impression is that pd-gem5 has a leg up here, since a
> >> >> >>>gem5 switch model for a non-Ethernet network (which you'd have to
> >> >> >>>write anyway if you were simulating a different network) could be
> >> >> >>>used in place of the current Ethernet switch, where for multi-gem5
> >> >> >>>I think that the util/multi//tcp_server.cc code would have to be
> >> >> >>>modified (i.e., there'd be additional work above and beyond what
> >> >> >>>you'd need to get the network modeled in base gem5).
> >> >> >>>
> >> >> >>>2. How much effort is required to run on a non-Ethernet network (or
> >> >> >>>equivalently using a non-sockets API)? The MultiIface/TCPIface
> >> >> >>>split in the multi-gem5 code looks like it addresses this nicely,
> >> >> >>>but pd-gem5 seems pretty tied to an Ethernet host fabric.
> >> >> >>>
> >> >> >>>3. Do both of these patches work with the existing multithreaded
> >> >> >>>multiple-event-queue simulation? I think multi-gem5 does (though it
> >> >> >>>would be nice to have a confirmation), but it's not clear about
> >> >> >>>pd-gem5. I don't see a benefit to having multiple gem5 processes on
> >> >> >>>a single host vs. a single multithreaded gem5 process using the
> >> >> >>>existing support. I think this could be particularly valuable with
> >> >> >>>a hierarchical network; e.g., maybe I would want to model a rack in
> >> >> >>>multithreaded mode on a single multicore server, then use pd-gem5
> >> >> >>>or multi-gem5 to build up a simulation of multiple racks. Would
> >> >> >>>this work out of the box with either of these patches, and if not,
> >> >> >>>what would need to be done?
> >> >> >>>
> >> >> >>>4. Is it possible to construct a single-process simulation model
> >> >> >>>that's identical to the distributed simulation? It would be very
> >> >> >>>valuable for verification to be able to take a single simulation
> >> >> >>>run and do it both within a single process and also across multiple
> >> >> >>>processes and verify that identical results are achieved. This
> >> >> >>>seems like a big drawback to the multi-gem5 tcp_server approach,
> >> >> >>>IMO.
> >> >> >>>
> >> >> >>>I'm definitely not saying that all these issues need to be resolved
> >> >> >>>before anything gets committed, but if we can agree that these are
> >> >> >>>valid goals, then we can evaluate detailed issues based on whether
> >> >> >>>they move us toward or away from those goals.
> >> >> >>>
> >> >> >>>Thanks,
> >> >> >>>
> >> >> >>>Steve
> >> >> >>>
> >> >> >>>
> >> >> >>>On Thu, Jul 2, 2015 at 8:34 AM Andreas Hansson
> >> >> >>><andreas.hans...@arm.com> wrote:
> >> >> >>>
> >> >> >>>>Hi all,
> >> >> >>>>
> >> >> >>>>I think we need to up-level this a bit. From our perspective (and
> >> >> >>>>I suspect in general):
> >> >> >>>>
> >> >> >>>>1. Robustness is important. Having a design that _may_ break,
> >> >> >>>>however unlikely is simply not an option.
> >> >> >>>>
> >> >> >>>>2. Performance and scaling is important. We can compare actual
> >> >> >>>>numbers here, and I am fairly sure the two solutions are on par.
> >> >> >>>>Let’s quantify that though.
> >> >> >>>>
> >> >> >>>>3. Checkpointing must not rely on synchronicity. It is vital for
> >> >> >>>>several workloads that we can checkpoint the various gem5
> >> >> >>>>instances at different Ticks (due to the way the workloads are
> >> >> >>>>constructed).
> >> >> >>>>
> >> >> >>>>Andreas
> >> >> >>>>
> >> >> >>>>On 01/07/2015 21:41, "gem5-dev on behalf of Mohammad Alian"
> >> >> >>>><gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:
> >> >> >>>>
> >> >> >>>>>Thanks Gabor for the reply.
> >> >> >>>>>
> >> >> >>>>>I feel this conversation is useful as we can find out pros/cons
> >> >> >>>>>of each design.
> >> >> >>>>>Please find my response in-lined below.
> >> >> >>>>>
> >> >> >>>>>Thank you,
> >> >> >>>>>Mohammad
> >> >> >>>>>
> >> >> >>>>>On Wed, Jul 1, 2015 at 6:44 AM, Gabor Dozsa <gabor.do...@arm.com>
> >> >> >>>>>wrote:
> >> >> >>>>>
> >> >> >>>>>>Hi All,
> >> >> >>>>>>
> >> >> >>>>>>Sorry for the missing indentation in my previous e-mail! (This
> >> >> >>>>>>was my first e-mail to the dev-list so I could not simply use
> >> >> >>>>>>"reply"). Below is the same message, hopefully in more readable
> >> >> >>>>>>form.
> >> >> >>>>>>
> >> >> >>>>>>====================================
> >> >> >>>>>>
> >> >> >>>>>>Hi All,
> >> >> >>>>>>
> >> >> >>>>>>Thank you Mohammad for your elaboration on the issues!
> >> >> >>>>>>
> >> >> >>>>>>I have written most of the multi-gem5 patch so let me add some
> >> >> >>>>>>more clarifications and answer to your concerns. My comments are
> >> >> >>>>>>inline below.
> >> >> >>>>>>
> >> >> >>>>>>Thanks,
> >> >> >>>>>>- Gabor
> >> >> >>>>>>
> >> >> >>>>>>On 6/27/15, 10:20 AM, "Mohammad Alian" <al...@wisc.edu> wrote:
> >> >> >>>>>>
> >> >> >>>>>>>Hi All,
> >> >> >>>>>>>
> >> >> >>>>>>>Curtis-Thank you for listing some of the differences. I was
> >> >> >>>>>>>waiting for the completed multi-gem5 patch before I send my
> >> >> >>>>>>>review. Please see my inline response below. I've addressed the
> >> >> >>>>>>>concerns that you've raised. Also, I've added a bit more to the
> >> >> >>>>>>>comparison.
> >> >> >>>>>>>
> >> >> >>>>>>>* Synchronization.
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 implements this in Python (not a problem in itself;
> >> >> >>>>>>>aesthetically this is nice, but...). The issue is that
> >> >> >>>>>>>pd-gem5's data packets and barrier messages travel over
> >> >> >>>>>>>different sockets. Since pd-gem5 could see data packets passing
> >> >> >>>>>>>synchronization barriers, it could create an inconsistent
> >> >> >>>>>>>checkpoint.
> >> >> >>>>>>>
> >> >> >>>>>>>multi-gem5's synchronization is implemented in C++ using sync
> >> >> >>>>>>>events, but more importantly, the messages queue up in the same
> >> >> >>>>>>>stream and so cannot have the issue just described. (Event
> >> >> >>>>>>>ordering is often crucial in snapshot protocols.) Therefore we
> >> >> >>>>>>>feel that multi-gem5 is a more robust solution in this respect.
> >> >> >>>>>>>
> >> >> >>>>>>>Each packet in pd-gem5 has a time-stamp. So even if data
> >> >> >>>>>>>packets pass synchronization barriers (in other words, data
> >> >> >>>>>>>packets arrive early at the destination node), the destination
> >> >> >>>>>>>node processes packets based on their timestamp. Actually
> >> >> >>>>>>>allowing data packets to pass sync barriers is a nice feature
> >> >> >>>>>>>that can reduce the likelihood of late packet reception.
> >> >> >>>>>>>Ordering of data messages that flow over pd-gem5 nodes is also
> >> >> >>>>>>>preserved in pd-gem5 implementation.
> >> >> >>>>>>
> >> >> >>>>>>This seems to be a misunderstanding.
> >> >> >>>>>>Maybe the wording was not precise before. The problem is not a
> >> >> >>>>>>data packet "passing" a sync barrier but the other way around, a
> >> >> >>>>>>sync barrier that can pass a data packet (e.g. while the data
> >> >> >>>>>>packet is waiting in the host operating system socket layer). If
> >> >> >>>>>>that happens, the packet will arrive later than it was supposed
> >> >> >>>>>>to and it may miss the computed receive tick.
> >> >> >>>>>>
> >> >> >>>>>>For instance, let’s assume that the quantum coincides with the
> >> >> >>>>>>simulated Ether link delay. (This is the optimal choice of
> >> >> >>>>>>quantum to minimize the number of sync barriers.) If a data
> >> >> >>>>>>packet is sent right at the beginning of a quantum then this
> >> >> >>>>>>packet must arrive at the destination gem5 process within the
> >> >> >>>>>>same quantum in order not to miss its receive tick at the very
> >> >> >>>>>>beginning of the next quantum. If the sync barrier can pass the
> >> >> >>>>>>data packet then the data packet may arrive only during the next
> >> >> >>>>>>quantum (or in extreme conditions even later than that) so when
> >> >> >>>>>>it arrives the receiver gem5 may have already passed the receive
> >> >> >>>>>>tick.
> >> >> >>>>>>
> >> >> >>>>>This argument makes more sense than the previous one. Note that
> >> >> >>>>>gem5 is a cycle accurate simulator and it runs orders of
> >> >> >>>>>magnitude slower than real hardware. So it's almost impossible
> >> >> >>>>>that the flight time of a packet through the real network turns
> >> >> >>>>>out to be more than the simulation time of one quantum.
> >> >> >>>>>We ran a set of experiments just for this purpose: with quantum
> >> >> >>>>>size equal to etherlink delay, we never got any late arrival
> >> >> >>>>>violation (what you described) for the full NAS benchmark suite
> >> >> >>>>>(please refer to the paper).
> >> >> >>>>>
> >> >> >>>>>multi-gem5 is optimized for a case that almost never happens! And
> >> >> >>>>>it is sacrificing speedup for no gain.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>Time-stamping does help with this issue. Also, if a data packet
> >> >> >>>>>>is waiting in the host operating system socket layer when the
> >> >> >>>>>>simulation thread exits to Python to complete the next sync
> >> >> >>>>>>barrier then the packet will not go into the checkpoint that may
> >> >> >>>>>>follow that sync barrier.
> >> >> >>>>>>
> >> >> >>>>>That's a good point. Current pd-gem5 checkpointing mechanism
> >> >> >>>>>might miss packets that have been sent during the previous
> >> >> >>>>>quantum and are waiting in the OS socket buffer. I should add
> >> >> >>>>>some code inside the ethertap serialization function to drain the
> >> >> >>>>>ethertap socket before writing the checkpoint. I will update the
> >> >> >>>>>pd-gem5 patch accordingly.
> >> >> >>>>>
> >> >> >>>>>>
> >> >> >>>>>>>What you mentioned as an advantage for multi-gem5 is actually a
> >> >> >>>>>>>key disadvantage: buffering sync messages behind data packets
> >> >> >>>>>>>can add to the synchronization overhead and slow down
> >> >> >>>>>>>simulation significantly.
> >> >> >>>>>>
> >> >> >>>>>>The purpose of sync messages is to make sure that the data
> >> >> >>>>>>packets arrive in time (in terms of simulated time) at the
> >> >> >>>>>>destination so they can be scheduled for being received at the
> >> >> >>>>>>proper computed tick.
> >> >> >>>>>>Sync messages also make sure that no data packets are in flight
> >> >> >>>>>>when a sync barrier completes before we take a checkpoint. They
> >> >> >>>>>>definitely add overhead for the simulation but they are
> >> >> >>>>>>necessary for the correctness of the simulation.
> >> >> >>>>>>
> >> >> >>>>>>The receive thread in multi-gem5 reads out packets from the
> >> >> >>>>>>socket in parallel with the simulation thread so packets
> >> >> >>>>>>normally will not be "queueing up" before a sync barrier
> >> >> >>>>>>message. There is definitely room for improvements in the
> >> >> >>>>>>current implementation for reducing the synchronization overhead
> >> >> >>>>>>but that is likely true for pd-gem5, too. The important thing
> >> >> >>>>>>here is that the solution must provide correctness (robustness)
> >> >> >>>>>>first.
> >> >> >>>>>>
> >> >> >>>>>pd-gem5 provides correctness. Please read my previous comment.
> >> >> >>>>>The whole purpose of multi/pd-gem5 is to parallelize simulation
> >> >> >>>>>with minimal overhead and gain speedup. If you fail to do so,
> >> >> >>>>>nobody will use your tool.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>Also, multi-gem5 sends huge messages (multiHeaderPkt) through
> >> >> >>>>>>>the network to perform each synchronization point, which
> >> >> >>>>>>>increases synchronization overhead further. In pd-gem5, we
> >> >> >>>>>>>choose to send just one character as sync message through a
> >> >> >>>>>>>separate socket to reduce synchronization overhead.
> >> >> >>>>>>
> >> >> >>>>>>The TCP/IP message size is unlikely to be the bottleneck here.
> >> >> >>>>>> Multi-gem5 will send ~50 bytes more in a sync barrier message than pd-gem5 but that bigger sync message still fits into a single ethernet frame on the wire. The end-to-end latency overhead that is caused by 50 bytes of extra payload for a small single-frame TCP/IP message is likely to fall into the "noise" category if one tries to measure it in a real cluster.

> >> >> >>>>> You should prove your hypothesis experimentally. Each gem5 process sends/receives sync messages at the end of every quantum. Say you are simulating an "N" node computer cluster with "M" different configurations. Then you will have N*M gem5 processes that send/receive these 50 Bytes (I think it's more) of extra data at the same time over the network ...

> >> >> >>>>> Furthermore, multi-gem5 sends a header before each data message. By comparison, pd-gem5 just adds 12 Bytes (each time-stamp is the 12 least significant digits of the Tick) to each data packet. I don't know exactly how large these "MultiHeaderPkt" are, but it has two Tick fields that are each 64 Bytes! Also, header packets are separate TCP packets, so you pay for sending two separate packets for each data packet. And worst, you serialize all of these with sync messages.

> >> >> >>>>>>> * Packet handling.

> >> >> >>>>>>> pd-gem5 uses EtherTap for data packets but changed the polling mechanism to go through the main event queue.
> >> >> >>>>>>> Since this rate is actually linked with simulator progress, it cannot guarantee that the packets are serviced at regular intervals of real time. This can lead to packets queueing up which would contribute to the synchronization issues mentioned above.

> >> >> >>>>>>> multi-gem5 uses plain sockets with separate receive threads and so does not have this issue.

> >> >> >>>>>>> I think again you are pointing to your first concern that I've explained above. Packets that have queued up in the EtherTap socket will be processed and delivered to the simulation environment at the beginning of the next simulation quantum.

> >> >> >>>>>>> Please notice that multi-gem5 introduces new simObjects to interface the simulation environment to the real world, which is redundant. This functionality is already provided by EtherTap.

> >> >> >>>>>> Except that the EtherTap solution does not provide a correct (robust) solution for the synchronization problem.

> >> >> >>>>> Please read my first/second comments.

> >> >> >>>>>>> * Checkpoint accuracy.

> >> >> >>>>>>> A user would like to have a checkpoint at precisely the time the 'm5 checkpoint' operation is executed so as to not miss any of the area of interest in his application.

> >> >> >>>>>>> pd-gem5 requires that simulation finish the current quantum before checkpointing, so it cannot provide this.
> >> >> >>>>>>> (Shortening the quantum can help, but usually the snapshot is being taken while 'fast-forwarding', i.e. simulating as fast as possible, which would motivate a longer quantum.)

> >> >> >>>>>>> multi-gem5 can enter the drain cycle immediately upon receiving a checkpoint request. We find this accuracy highly desirable.

> >> >> >>>>>>> It's true that if you have a large quantum size then there would be some discrepancy between the m5_ckpt instruction tick and the actual dump tick. Based on the multi-gem5 code, my understanding is that you send an async checkpoint message as soon as one of the gem5 processes encounters the m5_ckpt instruction. But I'm not sure how you fix the aforementioned issue, because you have to sync all gem5 processes before you start dumping the checkpoint, which necessitates a global synchronization beforehand.

> >> >> >>>>>> In multi-gem5, the gem5 process that encounters the m5_ckpt instruction sends out an async checkpoint notification to the peer gem5 processes and then starts draining immediately (at the same tick). So the checkpoint will be taken at the exact tick from the initiator process's point of view. The global synchronisation with the peer processes takes place while the initiator process is still waiting at the same tick (i.e. the simulation thread is suspended).
> >> >> >>>>>> However, the receiver thread continues reading out the socket - while waiting for the global sync to complete - to make sure that in-flight data packets from peer gem5 processes are stored properly and saved into the checkpoint.

> >> >> >>>>> So you mean multi-gem5 ends up with gem5 processes at different ticks after a checkpoint? In pd-gem5 we make sure that all gem5 processes start dumping the checkpoint at the same tick. Are you sure that it is correct to have each gem5 process dump its checkpoint at a different tick???

> >> >> >>>>> I don't think this is a correct checkpointing design. However, if you feel it is correct, I can change a couple of lines in "Simulation.py" and the barrier scripts to implement the same functionality in pd-gem5. One thing that you are obsessed about is making sure that there are no in-flight packets when we start dumping the checkpoint, and you have all these complex mechanisms in place to ensure that! I think you can be 99.99999% sure that there is no in-flight packet by waiting for 1 second after all gem5 processes have finished their quantum simulation and then dumping the checkpoint. Do you really think that delivering a tcp packet would take more than 1 second in today's systems!? Always go for simple solutions ...

> >> >> >>>>>>> By the way, we have a fix for this issue by introducing a new m5 pseudo instruction.
> >> >> >>>>>> I fail to see how a new pseudo instruction can solve the problem of completing the full quantum in pd-gem5 before a checkpoint can be taken. Could you please elaborate on that?

> >> >> >>>>> As we take checkpoints while fast-forwarding and it is likely that we relax synchronization for speedup purposes, a new pseudo instruction that can set the quantum size (m5_qset) can be helpful. So, one can insert m5_qset in the benchmark source code before entering the ROI that contains m5_ckpt, to decrease the quantum size beforehand and reduce the discrepancy between the m5_ckpt tick and the actual checkpoint tick. This is not included in the pd-gem5 patch right now.

> >> >> >>>>>>> * Implementation of network topology.

> >> >> >>>>>>> pd-gem5 uses a separate gem5 process to act as a switch whereas multi-gem5 uses a standalone packet relay process.

> >> >> >>>>>>> We haven't measured the overhead of pd-gem5's simulated switch yet, but we're confident that our approach is at least as fast and more scalable.

> >> >> >>>>>>> There is this flexibility in pd-gem5 to simulate a switch box alongside one of the other gem5 processes. However, it might make that gem5 process the simulation bottleneck.
> >> >> >>>>>>> One of the advantages of pd-gem5 over multi-gem5 is that we use gem5 to simulate a switch box, which allows us to model any network topology by instantiating several Switch simObjects and interconnecting them with EtherLink in an arbitrary fashion. A standalone tcp server can only provide switch functionality (forwarding packets to destinations) and model a star network topology. Furthermore, it cannot model various network timings such as queueing delay, congestion, and routing latency. Also it has some accuracy issues that I will point out next.

> >> >> >>>>>> I agree with the complex topology argument. We already mentioned that before as an advantage for pd-gem5 from the point of view of future extensions. However, I do not agree that multi-gem5 cannot model queueing delays and congestion. For a simple crossbar switch, it can model queueing delays and congestion, but the receive queues are distributed among the gem5 processes.

> >> >> >>>>> It's true that you can model the queueing delay of a simple crossbar by distributing queues across gem5 processes (end points). But to be able to do so you have to ensure the ordering of the packets that you enqueue in the distributed queues. It is almost impossible without a synchronized switch box. You should have a reorder queue that reorders packets dynamically and updates the timing parameter for each packet as well.
> >> >> >>>>> I don't know how much progress you have made on the ordering scheme in multi-gem5, but you may have already realized how complex and error-prone it can be. This argument is also related to my next argument for "Broken network timing".

> >> >> >>>>>>> * Broken network timing:

> >> >> >>>>>>> Forwarding packets between gem5 processes using a standalone tcp server can cause reordering between packets that have different sources but the same destination. It causes inaccurate network timing and, worst of all, non-deterministic simulation. pd-gem5 resolves this by reordering packets at the Switch process and then sending them to their destination (it's possible as the switch is synchronized with the rest of the nodes).

> >> >> >>>>>> In multi-gem5, there is always a HeaderPkt that contains some meta information for each data packet. The meta information includes the send tick and the sender rank (i.e. a unique ID of the sender gem5 process). We use that information to define a well-defined ordering of packets even if packets are arriving at the same receiver from different senders. This packet ordering scheme is still being tested so the corresponding patch is not on the RB yet.

> >> >> >>>>> Please read my previous comment. The most important part of the multi/pd-gem5 extension is ensuring accurate and deterministic simulation.
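The ordering idea described in this exchange (the send tick, with the sender rank as a tie-breaker) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `PendingPacket` and `deterministic_order` are hypothetical and are not taken from the multi-gem5 or pd-gem5 patches, and the actual scheme in either patch may differ.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingPacket:
    # Hypothetical record; field names are illustrative, not from either patch.
    send_tick: int    # simulated tick at which the sender emitted the packet
    sender_rank: int  # unique ID of the sending gem5 process
    payload: bytes = field(compare=False)  # excluded from comparisons


def deterministic_order(packets):
    """Sort by (send_tick, sender_rank) so host arrival order does not matter."""
    return sorted(packets)


a = PendingPacket(send_tick=100, sender_rank=1, payload=b"from-node1")
b = PendingPacket(send_tick=100, sender_rank=0, payload=b"from-node0")
c = PendingPacket(send_tick=90, sender_rank=2, payload=b"from-node2")

# Two different host arrival orders give the same simulated delivery order.
order_1 = deterministic_order([a, b, c])
order_2 = deterministic_order([c, a, b])
print([p.payload for p in order_1])
```

Because the sort key depends only on simulated quantities, two packets with the same send tick from different senders are always delivered in the same relative order, whichever one the host network happens to deliver first.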
> >> >> >>>>>>> * Amount of changes

> >> >> >>>>>>> pd-gem5 introduces different modes in etherlink just to provide accurate timing for each component in the network subsystem (NIC, link, switch) as well as the capability of modeling different network topologies (mesh, ring, fat tree, etc). To enable a simple functionality, like what multi-gem5 provides, the amount of changes in gem5 can be limited to time-stamping packets and providing synchronization through python scripts. However, multi-gem5 re-implements functionality that is already in gem5.

> >> >> >>>>>> This argument holds only if both implementations are correct (robust). It still seems to me that pd-gem5 does not provide correctness for the synchronization/checkpointing parts.

> >> >> >>>>> Again, please read my first comment for correctness of pd-gem5.

> >> >> >>>>>>> * Integrating with gem5 mainstream:

> >> >> >>>>>>> The pd-gem5 launch script is written in python, which is suited for integration with gem5 python scripts. However, multi-gem5 uses a bash script. Also, all source files in pd-gem5 are already parts of gem5 mainstream. However, multi-gem5 has tcp_server.cc/hh, which is a standalone process and cannot be part of gem5.

> >> >> >>>>>> The multi-gem5 launch script is simple enough to rely only on the shell. It can obviously be easily re-written in python if that added any value.
> >> >> >>>>>> The tcp_server component is only a utility (like the "m5" utility that is also part of gem5).

> >> >> >>>>> The thing is that it's more likely that users want to add some functionality to the run-script of multi/pd-gem5. E.g. the pd-gem5 run-script supports launching simulations using a simulation pool management software (http://research.cs.wisc.edu/htcondor/). Using python enables users to easily add this kind of support.

> >> >> >>>>>> Cheers,
> >> >> >>>>>> - Gabor

> >> >> >>>>>>> On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham <curtis.dun...@arm.com> wrote:

> >> >> >>>>>>>> Hello everyone,
> >> >> >>>>>>>> We have taken a look at how pd-gem5 compares with multi-gem5. While intending to deliver the same functionality, there are some crucial differences:

> >> >> >>>>>>>> * Synchronization.

> >> >> >>>>>>>> pd-gem5 implements this in Python (not a problem in itself; aesthetically this is nice, but...). The issue is that pd-gem5's data packets and barrier messages travel over different sockets. Since pd-gem5 could see data packets passing synchronization barriers, it could create an inconsistent checkpoint.
> >> >> >>>>>>>> multi-gem5's synchronization is implemented in C++ using sync events, but more importantly, the messages queue up in the same stream and so cannot have the issue just described. (Event ordering is often crucial in snapshot protocols.) Therefore we feel that multi-gem5 is a more robust solution in this respect.

> >> >> >>>>>>>> * Packet handling.

> >> >> >>>>>>>> pd-gem5 uses EtherTap for data packets but changed the polling mechanism to go through the main event queue. Since this rate is actually linked with simulator progress, it cannot guarantee that the packets are serviced at regular intervals of real time. This can lead to packets queueing up which would contribute to the synchronization issues mentioned above.

> >> >> >>>>>>>> multi-gem5 uses plain sockets with separate receive threads and so does not have this issue.

> >> >> >>>>>>>> * Checkpoint accuracy.

> >> >> >>>>>>>> A user would like to have a checkpoint at precisely the time the 'm5 checkpoint' operation is executed so as to not miss any of the area of interest in his application.

> >> >> >>>>>>>> pd-gem5 requires that simulation finish the current quantum before checkpointing, so it cannot provide this.
> >> >> >>>>>>>> (Shortening the quantum can help, but usually the snapshot is being taken while 'fast-forwarding', i.e. simulating as fast as possible, which would motivate a longer quantum.)

> >> >> >>>>>>>> multi-gem5 can enter the drain cycle immediately upon receiving a checkpoint request. We find this accuracy highly desirable.

> >> >> >>>>>>>> * Implementation of network topology.

> >> >> >>>>>>>> pd-gem5 uses a separate gem5 process to act as a switch whereas multi-gem5 uses a standalone packet relay process.

> >> >> >>>>>>>> We haven't measured the overhead of pd-gem5's simulated switch yet, but we're confident that our approach is at least as fast and more scalable.

> >> >> >>>>>>>> Thanks,
> >> >> >>>>>>>> Curtis
> >> >> >>>>>>>> ________________________________________
> >> >> >>>>>>>> From: gem5-dev [gem5-dev-boun...@gem5.org] On Behalf Of Mohammad Alian [al...@wisc.edu]
> >> >> >>>>>>>> Sent: Friday, June 26, 2015 7:37 PM
> >> >> >>>>>>>> To: gem5 Developer List
> >> >> >>>>>>>> Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on multiple physical hosts

> >> >> >>>>>>>> Hi Anthony,

> >> >> >>>>>>>> I think that would be a good option, then I can add pd-gem5 functionality on top of that. Right now I've simplified your implementation. Also, I think I had found some bugs in your patch that I cannot remember now.
> >> >> >>>>>>>> If you decide to ship the EtherSwitch patch, let me know so that I can review it.

> >> >> >>>>>>>> Thanks,
> >> >> >>>>>>>> Mohammad

> >> >> >>>>>>>> On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony <anthony.gutier...@amd.com> wrote:

> >> >> >>>>>>>>> Would it make sense for me to ship the EtherSwitch patch first, since it has utility on its own, and then we can decide which of the "multi-gem5" approaches is best, or if it's some combination of both?

> >> >> >>>>>>>>> The only reason I never shipped it was because Steve raised an issue that I didn't have a good alternative for, and didn't have the time to look into one at that time.
> >> >> >>>>>>>>> ________________________________________
> >> >> >>>>>>>>> From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of Mohammad Alian [al...@wisc.edu]
> >> >> >>>>>>>>> Sent: Wednesday, June 24, 2015 12:43 PM
> >> >> >>>>>>>>> To: gem5 Developer List
> >> >> >>>>>>>>> Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on multiple physical hosts

> >> >> >>>>>>>>> Hi Andreas,

> >> >> >>>>>>>>> Thanks for the comment.
> >> >> >>>>>>>>> I think the checkpointing support in both works is the same. Here is how checkpointing support is implemented in pd-gem5:

> >> >> >>>>>>>>> Whenever one of the gem5 processes encounters an m5-checkpoint pseudo instruction, it will send a "recv-ckpt" signal to the "barrier" process.
> >> >> >>>>>>>>> Then the "barrier" process sends a "take-ckpt" signal to all the simulated nodes (including the node that encountered m5-checkpoint) at the end of the current simulation quantum. On reception of the "take-ckpt" signal, gem5 processes start dumping checkpoints. This makes each simulated node dump a checkpoint at the same simulated time point while ensuring there are no in-flight packets.

> >> >> >>>>>>>>> I believe this is the same as the multi-gem5 patch approach for checkpoint support (based on the commit message of http://reviews.gem5.org/r/2865/). Also, we have tested our mechanism with several benchmarks and it works. As Steve suggested, I'll look into Curtis's patch and try to review it as well. But as Nilay also mentioned earlier, there is some code missing in Curtis's patch. I prefer to first run multi-gem5 before starting to review it.

> >> >> >>>>>>>>> Thank you,
> >> >> >>>>>>>>> Mohammad

> >> >> >>>>>>>>> On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson <andreas.hans...@arm.com> wrote:

> >> >> >>>>>>>>>> Hi Steve,

> >> >> >>>>>>>>>> Apologies for the confusion. We are on the same page. My point is that we cannot simply take a little bit of patch A and a little bit of patch B.
> >> >> >>>>>>>>>> This change involves a lot of code, and we need to approach this in a structured fashion. My proposal is to do it bottom up, and start by getting the basic support in place. Since http://reviews.gem5.org/r/2826/ has already been on the review board for a few months, I am merely suggesting that it would be a good start to relate the newly posted patches to what is already there.

> >> >> >>>>>>>>>> Andreas

> >> >> >>>>>>>>>> On 24/06/2015 13:11, "gem5-dev on behalf of Steve Reinhardt" <gem5-dev-boun...@gem5.org on behalf of ste...@gmail.com> wrote:

> >> >> >>>>>>>>>>> Hi Andreas,

> >> >> >>>>>>>>>>> I'm a little confused by your email---you say you're fundamentally opposed to looking at both patches and picking the best features, then you point out that the patches Curtis posted have the feature of better checkpointing support so we should pick that :).

> >> >> >>>>>>>>>>> Obviously we can't just pick patch A from Mohammad's set and patch B from Curtis's set and expect them to work together, but I think that having both sets of patches available and comparing and contrasting the two implementations should enable us to get to a single implementation that's the best of both.
> >> >> >>>>>>>>>>> Someone will have to make the effort of integrating the better ideas from one set into the other set to create a new unified set of patches (or maybe we commit one set and then integrate the best of the other set as patches on top of that), but the first step is to identify what "the best of both" is. Having Mohammad look at Curtis's patches, and Curtis (or someone else from ARM) closely examine Mohammad's patches would be a great start. I intend to review them both, though unfortunately my time has been scarce lately---I'm hoping to squeeze that in later this week.

> >> >> >>>>>>>>>>> Once we've had a few people look at both, we can discuss the pros and cons of each, then discuss the strategy for getting the best features in. So far I've heard that Mohammad's patches have a better network model but the ARM patches have better checkpointing support; that seems like a good start.

> >> >> >>>>>>>>>>> Steve

> >> >> >>>>>>>>>>> On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson <andreas.hans...@arm.com> wrote:

> >> >> >>>>>>>>>>>> Hi all,

> >> >> >>>>>>>>>>>> Great work.
> >> >> >>>>>>>>>>>> However, I fundamentally do not believe in the approach of 'letting reviewers pick the best features'. There is no way we would ever get something working out of it. We need to get _one_ working solution here, and figure out how to best get there. I would propose to do it bottom up, starting with the basic multi-simulator instance support, checkpointing support, and then move on to the network between the simulator instances.

> >> >> >>>>>>>>>>>> Thus, I propose we go with the low-level plumbing and checkpoint support from what Curtis has posted. I believe proper checkpointing support to be the most challenging, and from what I can tell this is far more limited in what you just posted, Mohammad. Could you perhaps review Curtis's patches based on your insights, and we can try and get these patches in shape and committed asap.

> >> >> >>>>>>>>>>>> Once we have the baseline functionality in place, then we can start looking at the more elaborate network models.

> >> >> >>>>>>>>>>>> Does this sound reasonable?
> >> >> >>>>>>>>>>>> Thanks,

> >> >> >>>>>>>>>>>> Andreas

> >> >> >>>>>>>>>>>> On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad Alian" <gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:

> >> >> >>>>>>>>>>>>> Hello All,

> >> >> >>>>>>>>>>>>> I have submitted a chain of patches which enables gem5 to simulate a cluster on multiple physical hosts:

> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2909/
> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2910/
> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2912/
> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2913/
> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2914/

> >> >> >>>>>>>>>>>>> and a patch that contains run scripts for a simple experiment:
> >> >> >>>>>>>>>>>>> http://reviews.gem5.org/r/2915/

> >> >> >>>>>>>>>>>>> We have run several benchmarks using this infrastructure, including NAS parallel benchmarks (MPI) and DCBench-hadoop (http://prof.ict.ac.cn/DCBench/), and would be happy to share scripts/diskimages.

> >> >> >>>>>>>>>>>>> We call this *pd-gem5*. *pd-gem5* functionality is more or less the same as Curtis's patch for *multi-gem5*. However, I feel the *pd-gem5* network model is more thorough; it also enables modeling different network topologies.
>> Having both sets of changes together lets reviewers pick the best features from both works.
>>
>> Thank you,
>> Mohammad Alian
>> _______________________________________________
>> gem5-dev mailing list
>> gem5-dev@gem5.org
>> http://m5sim.org/mailman/listinfo/gem5-dev
>
> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
>
> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782