Then you are assuming that checkpoints are taken with a quantum size smaller than the link latency, which contradicts your initial motivation for unsynchronized checkpoints! As a reminder, here is a sentence copied from earlier messages in the thread: "Shortening the quantum can help, but usually the snapshot is being taken while 'fast-forwarding', i.e. simulating as fast as possible, which would motivate a longer quantum."
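The tension described above can be made concrete with some tick arithmetic. The sketch below is a simplified model, not pd-gem5 or multi-gem5 code. It assumes that every gem5 process synchronizes at multiples of a fixed quantum, that a packet sent at tick t over a link of latency D is scheduled for receipt at t + D, and that a synchronous checkpoint is taken at the first quantum boundary at or after the request. All function names are hypothetical.

```python
"""Hypothetical model of quantum-based synchronization (not gem5 code).

Assumptions: all processes hit a sync barrier at every multiple of `quantum`
ticks; a packet sent at `send_tick` over a link with latency `link_delay` is
received at send_tick + link_delay; a synchronous checkpoint is taken at the
first quantum boundary at or after the checkpoint request.
"""


def next_barrier(tick: int, quantum: int) -> int:
    """Tick of the sync barrier that ends the quantum containing `tick`."""
    return (tick // quantum + 1) * quantum


def lands_in_current_quantum(send_tick: int, link_delay: int, quantum: int) -> bool:
    """True if the receive tick precedes the next barrier, so the receiver may
    already have simulated past it by the time the packet shows up."""
    return send_tick + link_delay < next_barrier(send_tick, quantum)


def quantum_is_safe(link_delay: int, quantum: int) -> bool:
    """Check every send tick within one quantum (the pattern repeats)."""
    return not any(lands_in_current_quantum(t, link_delay, quantum)
                   for t in range(quantum))


def sync_checkpoint_lag(request_tick: int, quantum: int) -> int:
    """Ticks by which a synchronous checkpoint lags behind the request."""
    return -request_tick % quantum


if __name__ == "__main__":
    link_delay = 1000  # hypothetical link latency in ticks
    for quantum in (250, 1000, 4000):
        worst_lag = max(sync_checkpoint_lag(t, quantum) for t in range(quantum))
        print(f"quantum={quantum:5d} safe={quantum_is_safe(link_delay, quantum)} "
              f"worst checkpoint lag={worst_lag}")
```

Under these assumptions, a quantum no larger than the link latency is exactly what keeps receive ticks out of the current quantum, while the worst-case gap between a checkpoint request and a synchronous checkpoint is one tick short of a full quantum. That is the trade-off between a long fast-forwarding quantum and checkpoint accuracy.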
What if somebody wants to relax synchronization and take a checkpoint? On Tue, Jul 7, 2015 at 7:38 AM, Gabor Dozsa <[email protected]> wrote: > > Hi Mohammad and all, > > gem5 processes may restore at a different tick from a checkpoint but the > next periodic sync will happen at the same tick in all gem5 processes. A receive > tick of a packet cannot fall into the current quantum so every packet can > get scheduled for receive properly even if a checkpoint/restore happens > during a quantum. > > Regarding your multi-threaded dual config, my understanding is that > EtherLink is not prepared to work with multi-threading as it lacks thread > safety. The multiple event queues/threads config only works if the systems > are independent. > > One possible way to fix that is to provide a “multi-thread” based > implementation for MultiIface ;-) > > - Gabor > > On 7/7/15, 6:29 AM, "Mohammad Alian" <[email protected]> wrote: > > >Gabor- My concern about unsync checkpoint is that when you restore from an > >unsync checkpoint, you'll have gem5 processes that are each running at a > >different tick. Then how do you handle accurate delivery of packets > >between > >these gem5 processes? It will also make it harder to integrate > >multi/pd-gem5 with the current multi-threaded gem5. The problem with sync > >checkpoint is that you cannot take the checkpoint exactly at the ROI, but I think > >unsync checkpoint introduces some other problems. Considering the > >necessary > >warmup period before starting stat collection, I think we don't need to > >exactly pinpoint the ROI. Please correct me if I'm wrong. > > > >I'm trying to run a multi-threaded experiment with pd-gem5, but I got an > >error when I tried to partition dual mode simulation on two threads. I > >posted that on the gem5 users mailing list. Please help me with that if you can. > > > >Thank you, > >Mohammad > > > >On Mon, Jul 6, 2015 at 11:45 AM, Gabor Dozsa <[email protected]> wrote: > > > >> Thank you Steve for the detailed elaboration on the issues.
> >> > >> > >> Regarding the “unsynchronized checkpoints”, the terminology might be a > >>bit > >> confusing. In fact, we always need to do a global synchronization among > >> the gem5 processes before taking a distributed checkpoint (in order to > >> avoid in-flight packets). The global synchronization here means that > >>each > >> gem5 has to suspend the simulation and wait until every in-flight > >>packet > >> arrives (and is stored) at the destination gem5 process. If that global > >> synchronization step happens at the same simulated tick in each gem5 > >>then > >> we call the checkpoint “synchronous”; otherwise it is an > >>“asynchronous” > >> checkpoint. > >> > >> In the MPI application example I mentioned before, the checkpoint should > >>be > >> triggered as soon as the “slowest” MPI process reaches the > >>MPI_Barrier(). > >> The problem is that the “slowest” MPI process usually does not reach the > >> MPI_Barrier() right at the end of the current quantum. If we let the > >> simulation continue until the quantum completes (to ensure that the > >> checkpoint is taken at the same simulated tick in each gem5) then the > >>MPI > >> processes will already have completed the MPI_Barrier() and started executing the ROI code. > >> > >> Regarding the integration of multi-threaded/multi-host simulation, > >> multi-gem5 does not currently support fine-grained simulation of hierarchical > >>switches > >> (or any other network topologies except a single crossbar) or multiple > >> synchronization domains. > >> > >> However, I'm a bit confused about your statement that you don’t see > >>value > >> in ever building a shared-memory transport for MultiIface. MultiIface in > >> my view is just an abstract interface for “multi-(ether)-link” objects, > >> which are link objects for connecting multiple (i.e. more than two) > >> systems.
It aims to encapsulate the API necessary for any Link object > >> in any multi-system configuration - provided that we partition the > >> systems across network links during run time. > >> > >> An orthogonal issue is whether we want to include a simple crossbar switch > >> model in a MultiIface implementation or to provide a > >>‘standalone’ > >> fine-grained model for the switch (e.g. the pd-gem5 approach). > >> > >> Thanks, > >> - Gabor > >> > >> > >> > >> On 7/3/15, 7:33 PM, "Steve Reinhardt" <[email protected]> wrote: > >> > >> >Thanks Mohammad & Gabor for the responses. > >> > > >> >I think there's still some misunderstanding on what I mean by the > >> >integration of multi-threaded and multi-host simulation based on > >>Gabor's > >> >response above and Andreas's response in the other thread. > >> > > >> >The primary example scenario I'm proposing is as Mohammad described: > >> >within > >> >each host node, we're simulating an entire rack + top-of-rack switch > >>in a > >> >single gem5 process, with separate event queues/threads being used to > >> >parallelize across nodes within the rack. The switch may or may not be > >>on > >> >its own thread as well. The synchronization among the threads only > >>needs > >> >to be at the granularity of the intra-rack network latency. > >> > > >> >Now we want to expand this by using pd-gem5 or multi-gem5 to > >>parallelize > >> >multiple of these rack-level simulations across hosts, so we can > >>simulate > >> >a > >> >whole row of a datacenter. Only the uplinks from the TOR switches > >>would > >> >need to go over sockets between processes, and the switch being > >>modeled by > >> >pd-gem5 or multi-gem5 would be the end-of-row switch. The > >>synchronization > >> >delay among the multiple gem5 processes would be based on the > >>inter-rack > >> >latency. > >> > > >> >So the basic question is: Is this feasible with pd-gem5 / multi-gem5, > >>and > >> >if not, how much work would it take to make it so?
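Steve's scenario has two synchronization domains: threads inside one gem5 process and processes across hosts. Applying the same rule of thumb to each domain (a quantum no larger than the latency of any link the partitioning cuts) gives the sketch below. It is a hypothetical helper, not part of either patch. The topology, the latencies, and the one-thread-per-component / one-process-per-rack mapping are all made-up assumptions, and the end-of-row switch is placed in its own process, as pd-gem5 would do.

```python
"""Hypothetical sketch: pick a sync quantum per partitioning level.

Rule assumed here: a quantum is safe if it does not exceed the latency of any
simulated link whose two endpoints live in different partitions.
"""
from typing import Dict, List, Tuple

Link = Tuple[str, str, int]  # (endpoint_a, endpoint_b, latency in ticks)


def max_safe_quantum(links: List[Link], partition: Dict[str, str]) -> int:
    """Minimum latency over links that cross the given partitioning."""
    crossing = [lat for a, b, lat in links if partition[a] != partition[b]]
    if not crossing:
        raise ValueError("no link crosses the partitioning")
    return min(crossing)


def build_row(racks: int, nodes_per_rack: int, intra: int, uplink: int):
    """Nodes -> top-of-rack switch -> one end-of-row switch."""
    links: List[Link] = []
    thread_of = {"eor": "eor"}    # each component on its own thread
    process_of = {"eor": "eor"}   # end-of-row switch in its own process
    for r in range(racks):
        tor = f"tor{r}"
        thread_of[tor], process_of[tor] = tor, f"rack{r}"
        links.append((tor, "eor", uplink))
        for n in range(nodes_per_rack):
            node = f"rack{r}-node{n}"
            thread_of[node], process_of[node] = node, f"rack{r}"
            links.append((node, tor, intra))
    return links, thread_of, process_of


if __name__ == "__main__":
    links, thread_of, process_of = build_row(racks=4, nodes_per_rack=8,
                                             intra=500, uplink=5000)
    # Threads are cut by intra-rack links; processes only by the uplinks.
    print("thread-level quantum: ", max_safe_quantum(links, thread_of))
    print("process-level quantum:", max_safe_quantum(links, process_of))
```

With these made-up numbers, the threads must synchronize every 500 ticks while the processes only need to synchronize every 5000 ticks, which is the decoupling described in this message. Whether either patch can actually run two such domains at once is the open question being asked.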
> >> > > >> >However, my larger point is that I still don't see value in ever > >>building > >> >a > >> >shared-memory transport for MultiIface. For this model, there is > >>clearly > >> >no > >> >need for it. Things get more complicated if we want to do something > >>like > >> >have N nodes connected to a single switch and split that over two hosts > >> >(with N/2 nodes simulated on each), but even in that case, I think > >>it's a > >> >better idea to make the switch model deal with having half of its links > >> >internal and half external (since we already want the same model to > >>work > >> >in > >> >both the all-internal and all-external cases). Not that I'm worried > >>that > >> >someone is about to go off and build this shared-memory transport, but > >>I > >> >think it's important to reach an understanding here, since it's > >> >fundamental > >> >to defining the strategic relationship between these capabilities going > >> >forward. > >> > > >> >Stepping back a little further, it would be nice to have a model that > >>is > >> >as > >> >generic as the multi-threading model, where it's really just a matter > >>of > >> >taking a simulation, partitioning the components among the threads, and > >> >setting the synchronization quantum, and it works. Of course, even with > >> >the > >> >multi-threaded model, if you don't choose your partitioning and your > >> >quantum wisely, you're not going to get much speedup or a deterministic > >> >simulation, but the fundamental implementation is oblivious to that. > >>I'm > >> >not saying we really need to go all the way to this extreme---it's > >>pretty > >> >reasonable to assume that no one in the near future will want to > >>partition > >> >across hosts anywhere other than on a simulated network link---but I > >>think > >> >we should keep this ideal in mind as a guiding principle as we choose > >>how > >> >to go forward from here. 
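For reference, the shared-memory transport under discussion is the one Gabor describes elsewhere in the thread: a shared-memory implementation of MultiIface's virtual communication methods sendRaw()/recvRaw()/syncRaw(), with the tcp_server replaced by a shared data structure. The Python below only illustrates that idea. It assumes one simulation thread per node in a single process and a quantum no larger than the link latency. The class and method names are hypothetical and do not reflect gem5's actual MultiIface interface.

```python
"""Hypothetical shared-memory stand-in for a socket-based MultiIface transport."""
import queue
import threading
from typing import Dict, List, Tuple

Delivery = Tuple[int, int, bytes]  # (receive_tick, source_node, payload)


class SharedMemFabric:
    """Replaces the tcp_server: one inbox per node plus a quantum barrier."""

    def __init__(self, node_ids: List[int]):
        self.inboxes: Dict[int, queue.Queue] = {n: queue.Queue() for n in node_ids}
        self.barrier = threading.Barrier(len(node_ids))


class SharedMemIface:
    """Sketch of sendRaw()/recvRaw()/syncRaw() without sockets."""

    def __init__(self, fabric: SharedMemFabric, node_id: int):
        self.fabric = fabric
        self.node_id = node_id

    def send_raw(self, dst: int, receive_tick: int, payload: bytes) -> None:
        # Put the packet straight into the destination's inbox.
        self.fabric.inboxes[dst].put((receive_tick, self.node_id, payload))

    def sync_raw(self) -> None:
        # End-of-quantum barrier: every send issued before it is already queued.
        self.fabric.barrier.wait()

    def recv_raw(self) -> List[Delivery]:
        # Drain the inbox, ordered by receive tick. Packets a fast peer sends in
        # the *next* quantum may be drained early; with quantum <= link latency
        # their receive ticks still lie in a later quantum, so that is harmless.
        inbox = self.fabric.inboxes[self.node_id]
        drained: List[Delivery] = []
        while True:
            try:
                drained.append(inbox.get_nowait())
            except queue.Empty:
                break
        return sorted(drained)


def run_one_quantum() -> Dict[int, List[Delivery]]:
    """Two nodes exchange one packet each, then synchronize and receive."""
    fabric = SharedMemFabric([0, 1])
    received: Dict[int, List[Delivery]] = {}

    def node(node_id: int) -> None:
        iface = SharedMemIface(fabric, node_id)
        iface.send_raw(1 - node_id, receive_tick=1000,
                       payload=f"hello from {node_id}".encode())
        iface.sync_raw()
        received[node_id] = iface.recv_raw()

    threads = [threading.Thread(target=node, args=(n,)) for n in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_one_quantum())
```

Whether this is worth building is exactly what Steve questions here. The sketch only shows that the three communication methods map naturally onto in-process queues and a barrier.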
> >> > > >> >This ties in to my point #4, which is that if we're really building a > >> >mechanism to partition a simulation across multiple hosts, then you > >>should > >> >be able to run the same simulation in a single gem5 process and get the > >> >same results. I think this is the strength of pd-gem5; correspondingly > >>the > >> >main weakness of multi-gem5 is that it architecturally feels more like > >> >tying together a set of mostly independent gem5 simulations than like > >> >partitioning a single gem5 simulation. (Of course, they both end up at > >> >roughly the same point in the middle.) > >> > > >> >On the flip side, multi-gem5 has some clear advantages in terms of the > >> >better separation of the communication layer (and I can imagine it > >>being > >> >very useful to port to MPI and perhaps some RDMA API for InfiniBand > >> >clusters). Also I think the integrated sockets for communication and > >> >synchronization are the superior design; while the separate sockets > >>used > >> >by > >> >pd-gem5 may only very rarely cause problems, I agree with Andreas that > >> >that's not good enough, and I don't see any real advantage either---if > >>you > >> >have to flush the data sockets (or wait for them to drain) before > >> >synchronizing, then you might as well just have the synchronization > >> >messages queue up behind the data messages. > >> > > >> >Regarding unsynchronized checkpoints: Thanks for the example, but I'm > >> >still > >> >a little confused. If all the processes are about to execute an > >> >MPI_Barrier(), doesn't that mean they'll all be synchronized shortly > >> >anyway? So what's the harm in waiting until they're synchronized and > >> >then checkpointing? > >> > > >> >Regarding the simulation of non-Ethernet networks: I agree that the > >> >biggest > >> >obstacle to this is the lack of generality of the current gem5 network > >> >components.
I tried to take a step toward supporting other link types > >>two > >> >years ago (see http://reviews.gem5.org/r/1922) but someone shot me > down > >> >;). > >> >We shouldn't try and fix that here, but we should also consciously try > >>not > >> >to make it any worse... > >> > > >> >Thanks for reading all the way to the end! > >> > > >> >Steve > >> > > >> > > >> >On Fri, Jul 3, 2015 at 7:11 AM Gabor Dozsa <[email protected]> > wrote: > >> > > >> >>Hi all, > >> >> > >> >>Thank you Steve for the thorough review. > >> >> > >> >>First, let me elaborate a bit on Andreas’s 3rd point about > >> >>non-synchronous > >> >>checkpoints. Let’s assume that we aim to simulate MPI applications > >>(HPC > >> >>workloads). The ROI in an MPI application typically starts with a > >> >>global MPI_Barrier() call. We want to take the checkpoint when *every* > >> >>gem5 process has reached that MPI_Barrier() in the simulated code, but > >> >>that > >> >>may not happen at the same tick in each gem5 (due to load imbalance > >> >>among > >> >>the simulated nodes). That’s why multi-gem5 implements > >> >>non-synchronous > >> >>checkpoint support. > >> >> > >> >>My answers to your questions are as follows. > >> >> > >> >>1. The only change necessary to use multi-gem5 with a non-Ethernet > >> >>(simulated) network is to replace the Ethernet packet type with > >>another > >> >>packet type in MultiIface. > >> >>In fact, the first implementation of MultiIface was a template > >> >>that took EthPacketData as a parameter because I planned to support > >>different > >> >>network types. When I realized that currently only Ethernet is > >>supported > >> >>by gem5, I dropped the template param to keep the implementation > >> >>simpler. I > >> >>have also realized in the meantime that the right approach would > >> >>probably > >> >>be to create a pure virtual ‘base’ class for network packets from > >>which > >> >>Ethernet (and other types of) packets could be derived.
Then > >>MultiIface > >> >>could simply use that base class to provide support for different > >> >>network > >> >>types. The interface provided by the base packet class could be very > >> >>simple. Besides the total size() of the packet, multi-gem5 only needs a > >> >>method to ‘extract’ the source/destination address. Those addresses > >>are > >> >>used in MultiIface as opaque byte arrays so they are quite network > >>type > >> >>agnostic already. > >> >> > >> >>2. That’s right, we have designed the MultiIface/TCPIface split with > >> >>different underlying messaging systems in mind. > >> >> > >> >>3. Multi-gem5 can work together with multi-threaded/multi-event-queue > >> >>gem5 > >> >>configs. The current TCPIface/tcp_server components would still use > >> >>sockets to send around the packets. So it is possible to put together > >>a > >> >>multi-gem5 simulation where each gem5 process has multiple event > >>queues > >> >>(and an independent simulation thread per event queue) but all the > >> >>simulated Ethernet links would use sockets to forward every Ethernet > >> >>packet to the tcp_server. > >> >> > >> >>If someone wanted to run only a single gem5 process to simulate an > >> >>entire > >> >>cluster (using one thread/event-queue per cluster node) then the > >>current > >> >>multi-gem5 implementation using sockets/tcp_server is not optimal. In > >> >>that > >> >>case, a better solution would be to provide a shared memory based > >> >>implementation of the MultiIface virtual communication methods > >> >>sendRaw()/recvRaw()/syncRaw() (i.e. a shared memory equivalent of > >> >>TCPIface). In that implementation, the entire discrete tcp_server > >> >>component > >> >>could be replaced with a shared data structure. > >> >> > >> >>4. You are right, the current implementation does not make it possible > >> >>to > >> >>construct an equivalent single-process simulation model for a > >>multi-gem5 > >> >>run.
However, a possible solution is a shared memory based > >> >>implementation > >> >>of the MultiIface virtual communication methods just as I described in > >> >>the > >> >>previous paragraph. The same implementation could then work with both > >> >>multi-threaded/multi-event-queues and single-thread/single-event-queue > >> >>gem5 configs. > >> >> > >> >>Thanks, > >> >>- Gabor > >> >> > >> >>On 7/2/15, 7:20 PM, "Steve Reinhardt" <[email protected]> wrote: > >> >> > >> >>>Hi everyone, > >> >>> > >> >>>Sorry for taking so long to engage. This is a great development and I > >> >>>think > >> >>>both these patches are terrific contributions. Thanks to Mohammad, > >> >>Gabor, > >> >>>and everyone else involved. > >> >>> > >> >>>I agree with Andreas that we should start with some top-level goals & > >> >>>assumptions, agree on those, and then we can sort out the detailed > >> >>issues > >> >>>based on a consistent view. > >> >>> > >> >>>I definitely agree with Andreas's first two points. The third one > >> >>seems a > >> >>>little surprising; I'd like to hear more about the motivation before > >> >>>expressing an opinion. I can see where non-synchronous checkpointing > >> >>could > >> >>>be useful, but it's also clear from the associated patch that it's > >>not > >> >>>trivial to implement either. How much would be lost by requiring a > >> >>>synchronization before a checkpoint? > >> >>> > >> >>>From my personal perspective, I would like to see whatever we do here > >> >>be a > >> >>>first step toward a more general distributed simulation platform. > >>Both > >> >>of > >> >>>these patches seem pretty Ethernet-centric in different ways. This is > >> >>not > >> >>>terrible; part of the problem is that gem5's current internal > >> >>networking > >> >>>support is already overly Ethernet-centric IMO. But it would be nice > >>to > >> >>>avoid baking that in even further. 
Rather than assume I have > >>understood > >> >>>all > >> >>>the code completely, I'll phrase things in the form of questions, and > >> >>>people can comment on how those questions would be answered in the > >> >>context > >> >>>of the two different approaches. > >> >>> > >> >>>1. How much effort would be required to simulate a non-Ethernet > >> >>network? > >> >>>My > >> >>>impression is that pd-gem5 has a leg up here, since a gem5 switch > >>model > >> >>>for > >> >>>a non-Ethernet network (which you'd have to write anyway if you were > >> >>>simulating a different network) could be used in place of the current > >> >>>Ethernet switch, where for multi-gem5 I think that the > >> >>>util/multi//tcp_server.cc code would have to be modified (i.e., > >> >>there'd be > >> >>>additional work above and beyond what you'd need to get the network > >> >>>modeled > >> >>>in base gem5). > >> >>> > >> >>>2. How much effort is required to run on a non-Ethernet network (or > >> >>>equivalently using a non-sockets API)? The MultiIface/TCPIface split > >> >>in > >> >>>the multi-gem5 code looks like it addresses this nicely, but pd-gem5 > >> >>seems > >> >>>pretty tied to an Ethernet host fabric. > >> >>> > >> >>>3. Do both of these patches work with the existing multithreaded > >> >>>multiple-event-queue simulation? I think multi-gem5 does (though it > >> >>would > >> >>>be nice to have a confirmation), but it's not clear about pd-gem5. I > >> >>don't > >> >>>see a benefit to having multiple gem5 processes on a single host vs. > >>a > >> >>>single multithreaded gem5 process using the existing support. I think > >> >>this > >> >>>could be particularly valuable with a hierarchical network; e.g., > >> >>maybe I > >> >>>would want to model a rack in multithreaded mode on a single > >>multicore > >> >>>server, then use pd-gem5 or multi-gem5 to build up a simulation of > >> >>>multiple > >> >>>racks. 
Would this work out of the box with either of these patches, > >> >>and if > >> >>>not, what would need to be done? > >> >>> > >> >>>4. Is it possible to construct a single-process simulation model > >>that's > >> >>>identical to the distributed simulation? It would be very valuable > >>for > >> >>>verification to be able to take a single simulation run and do it > >>both > >> >>>within a single process and also across multiple processes and verify > >> >>that > >> >>>identical results are achieved. This seems like a big drawback to the > >> >>>multi-gem5 tcp_server approach, IMO. > >> >>> > >> >>>I'm definitely not saying that all these issues need to be resolved > >> >>before > >> >>>anything gets committed, but if we can agree that these are valid > >> >>goals, > >> >>>then we can evaluate detailed issues based on whether they move us > >> >>toward > >> >>>or away from those goals. > >> >>> > >> >>>Thanks, > >> >>> > >> >>>Steve > >> >>> > >> >>> > >> >>>On Thu, Jul 2, 2015 at 8:34 AM Andreas Hansson > >> >><[email protected]> > >> >>>wrote: > >> >>> > >> >>>>Hi all, > >> >>>> > >> >>>>I think we need to up-level this a bit. From our perspective (and I > >> >>>>suspect in general): > >> >>>> > >> >>>>1. Robustness is important. Having a design that _may_ break, > >>however > >> >>>>unlikely is simply not an option. > >> >>>> > >> >>>>2. Performance and scaling is important. We can compare actual > >>numbers > >> >>>>here, and I am fairly sure the two solutions are on par. Let’s > >> >>quantify > >> >>>>that though. > >> >>>> > >> >>>>3. Checkpointing must not rely on synchronicity. It is vital for > >> >>several > >> >>>>workloads that we can checkpoint the various gem5 instances at > >> >>different > >> >>>>Ticks (due to the way the workloads are constructed). 
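Gabor's reply to question 1 (quoted above) proposes a pure virtual base class for network packets that exposes only the total size() and the source/destination addresses as opaque byte arrays. A Python analogue of that interface, using abstract base classes, is sketched below. NetPacket, EthPacket, ToyPacket, and learn_and_forward are hypothetical names, and the toy packet with 2-byte addresses merely stands in for some non-Ethernet network.

```python
"""Hypothetical network-agnostic packet interface (not gem5's classes)."""
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class NetPacket(ABC):
    """All the multi-system link layer needs: a size and opaque addresses."""

    @abstractmethod
    def size(self) -> int:
        """Total packet size in bytes."""

    @abstractmethod
    def addresses(self) -> Tuple[bytes, bytes]:
        """(source, destination) as opaque byte strings."""


class EthPacket(NetPacket):
    """Ethernet frame: destination MAC in bytes 0-5, source MAC in bytes 6-11."""

    def __init__(self, frame: bytes):
        if len(frame) < 14:
            raise ValueError("shorter than a 14-byte Ethernet header")
        self.frame = frame

    def size(self) -> int:
        return len(self.frame)

    def addresses(self) -> Tuple[bytes, bytes]:
        return self.frame[6:12], self.frame[0:6]


class ToyPacket(NetPacket):
    """Made-up non-Ethernet packet: 2-byte node IDs plus a payload."""

    def __init__(self, src: int, dst: int, payload: bytes):
        self.src, self.dst, self.payload = src, dst, payload

    def size(self) -> int:
        return 4 + len(self.payload)

    def addresses(self) -> Tuple[bytes, bytes]:
        return self.src.to_bytes(2, "big"), self.dst.to_bytes(2, "big")


def learn_and_forward(pkt: NetPacket, table: Dict[bytes, int],
                      in_port: int) -> Optional[int]:
    """Learning-switch step that never looks inside the address bytes.

    Records which port the source lives on and returns the output port for the
    destination, or None when it is unknown (i.e. flood to all ports).
    """
    src, dst = pkt.addresses()
    table[src] = in_port
    return table.get(dst)


if __name__ == "__main__":
    table: Dict[bytes, int] = {}
    frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
    eth = EthPacket(frame)
    print(eth.size(), learn_and_forward(eth, table, in_port=0))
```

The forwarding function works unchanged for both packet types because it treats addresses as opaque byte strings, which is the property Gabor points to.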
> >> >>>> > >> >>>>Andreas > >> >>>> > >> >>>>On 01/07/2015 21:41, "gem5-dev on behalf of Mohammad Alian" > >> >>>><[email protected] on behalf of [email protected]> wrote: > >> >>>> > >> >>>>>Thanks Gabor for the reply. > >> >>>>> > >> >>>>>I feel this conversation is useful as we can find out the pros/cons of > >> >>each > >> >>>>>design. > >> >>>>>Please find my response inlined below. > >> >>>>> > >> >>>>>Thank you, > >> >>>>>Mohammad > >> >>>>> > >> >>>>>On Wed, Jul 1, 2015 at 6:44 AM, Gabor Dozsa <[email protected]> > >> >>>>wrote: > >> >>>>> > >> >>>>>>Hi All, > >> >>>>>> > >> >>>>>>Sorry for the missing indentation in my previous e-mail! (This was > >> >>my > >> >>>>>>first e-mail to the dev-list so I could not simply use “reply”). > >> >>>>Below > >> >>>>>>is > >> >>>>>>the same message, hopefully in more readable form. > >> >>>>>> > >> >>>>>>==================================== > >> >>>>>> > >> >>>>>>Hi All, > >> >>>>>> > >> >>>>>>Thank you Mohammad for your elaboration on the issues! > >> >>>>>> > >> >>>>>>I have written most of the multi-gem5 patch so let me add some > >>more > >> >>>>>>clarifications and answers to your concerns. My comments are > >>inline > >> >>>>>>below. > >> >>>>>> > >> >>>>>>Thanks, > >> >>>>>>- Gabor > >> >>>>>> > >> >>>>>>On 6/27/15, 10:20 AM, "Mohammad Alian" <[email protected]> wrote: > >> >>>>>> > >> >>>>>>>Hi All, > >> >>>>>>> > >> >>>>>>>Curtis - Thank you for listing some of the differences. I was > >> >>waiting > >> >>>>for > >> >>>>>>>the > >> >>>>>>>completed multi-gem5 patch before sending my review. Please see my > >> >>>>>>inline > >> >>>>>>>response below. I've addressed the concerns that you've raised. > >> >>>>Also, > >> >>>>>>I've > >> >>>>>>>added a bit more to the comparison. > >> >>>>>>> > >> >>>>>>>* Synchronization. > >> >>>>>>> > >> >>>>>>>pd-gem5 implements this in Python (not a problem in itself; > >> >>>>>>aesthetically > >> >>>>>>> > >> >>>>>>>this is nice, but...).
The issue is that pd-gem5's data packets > >> >>and > >> >>>>>>> > >> >>>>>>>barrier messages travel over different sockets. Since pd-gem5 > >> >>could > >> >>>>>>see > >> >>>>>>> > >> >>>>>>>data packets passing synchronization barriers, it could create an > >> >>>>>>> > >> >>>>>>>inconsistent checkpoint. > >> >>>>>>> > >> >>>>>>>multi-gem5's synchronization is implemented in C++ using sync > >> >>>>events, > >> >>>>>>but > >> >>>>>>> > >> >>>>>>>more importantly, the messages queue up in the same stream and so > >> >>>>>>cannot > >> >>>>>>> > >> >>>>>>>have the issue just described. (Event ordering is often crucial > >> >>in > >> >>>>>>> > >> >>>>>>>snapshot protocols.) Therefore we feel that multi-gem5 is a more > >> >>>>robust > >> >>>>>>> > >> >>>>>>>solution in this respect. > >> >>>>>>> > >> >>>>>>>Each packet in pd-gem5 has a time-stamp. So even if data packets > >> >>>>pass > >> >>>>>>>synchronization barriers (in other words, data packets arrive > >> >>early > >> >>>>at > >> >>>>>>the > >> >>>>>>>destination node), the destination node processes packets based on > >>their > >> >>>>>>>timestamps. Actually, allowing data packets to pass sync barriers > >> >>is a > >> >>>>>>nice > >> >>>>>>>feature that can reduce the likelihood of late packet reception. > >> >>>>>>Ordering > >> >>>>>>>of data messages that flow over pd-gem5 nodes is also preserved > >>in > >> >>>>>>the pd-gem5 > >> >>>>>>>implementation. > >> >>>>>> > >> >>>>>>This seems to be a misunderstanding. Maybe the wording was not > >> >>>>precise > >> >>>>>>before. The problem is not a data packet “passing” a sync > >> >>barrier > >> >>>>>>but the other way around: a sync barrier that can pass a data > >> >>packet > >> >>>>>>(e.g. while the data packet is waiting in the host operating > >>system > >> >>>>>>socket layer). If that happens, the packet will arrive later than > >> >>it > >> >>>>>>was > >> >>>>>>supposed to and it may miss the computed receive tick.
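The ordering argument in this exchange can be reproduced with a toy model. It assumes that each channel behaves like a TCP stream (FIFO: a message never arrives before an earlier message on the same channel), and it uses made-up host-side delays. It is not code from either patch. A data packet delayed in the host socket layer is followed by a sync message sent either on a separate channel (pd-gem5 style) or on the same stream (multi-gem5 style).

```python
"""Toy model: can a sync message overtake a data packet on the host?"""
from typing import Dict, List, Tuple

Msg = Tuple[str, int]              # (kind: "data" or "sync", simulated tick)
Send = Tuple[str, Msg, float]      # (channel name, message, host-side delay)


def arrival_order(sends: List[Send], single_stream: bool) -> List[Msg]:
    """Order in which the receiver reads messages.

    Message i is sent at real time i. Each channel is FIFO, like a TCP stream,
    so a message arrives no earlier than the previous one on its channel.
    """
    last_arrival: Dict[str, float] = {}
    arrivals = []
    for seq, (channel, msg, delay) in enumerate(sends):
        ch = "stream" if single_stream else channel
        t = max(seq + delay, last_arrival.get(ch, 0.0))
        last_arrival[ch] = t
        arrivals.append((t, seq, msg))
    return [msg for _, _, msg in sorted(arrivals)]


def late_data(order: List[Msg]) -> List[Msg]:
    """Data read only after the sync message that closes its quantum."""
    late, synced = [], False
    for msg in order:
        if msg[0] == "sync":
            synced = True
        elif synced:
            late.append(msg)
    return late


if __name__ == "__main__":
    sends: List[Send] = [
        ("data", ("data", 1000), 5.0),  # stuck in the host socket layer
        ("sync", ("sync", 1000), 0.5),  # barrier message sent right after it
    ]
    print("separate sockets:", late_data(arrival_order(sends, single_stream=False)))
    print("single stream:   ", late_data(arrival_order(sends, single_stream=True)))
```

In the single-stream case the sync message also acts as a marker: once the receiver has read it, every data packet sent before it has already been read, which is the same property needed to guarantee that no packets are in flight when a checkpoint is taken. The model says nothing about how often the overtaking case arises in practice, which is disputed later in the thread.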
> >> >>>>>> > >> >>>>>>For instance, let’s assume that the quantum coincides with the > >> >>>>simulated > >> >>>>>>Ether link delay. (This is the optimal choice of quantum to > >> >>minimize > >> >>>>the > >> >>>>>>number of sync barriers.) If a data packet is sent right at the > >> >>>>>>beginning > >> >>>>>>of a quantum then this packet must arrive at the destination gem5 > >> >>>>>>process > >> >>>>>>within the same quantum in order not to miss its receive tick at > >> >>the > >> >>>>>>very > >> >>>>>>beginning of the next quantum. If the sync barrier can pass the > >> >>data > >> >>>>>>packet > >> >>>>>>then the data packet may arrive only during the next quantum (or > >> >>in > >> >>>>>>extreme conditions even later than that) so when it arrives the > >> >>>>receiver > >> >>>>>>gem5 may already have passed the receive tick. > >> >>>>>> > >> >>>>>>This argument makes more sense than the previous one. Note that > >> >>gem5 > >> >>>>is > >> >>>>>>a > >> >>>>>cycle-accurate simulator and it runs orders of magnitude slower > >>than > >> >>>>real > >> >>>>>hardware. So it's almost impossible that the flight time of a packet > >> >>>>through > >> >>>>>the real network turns out to be more than the simulation time of one quantum. > >>We > >> >>>>ran > >> >>>>>a > >> >>>>>set of experiments just for this purpose: with quantum size equal > >>to > >> >>>>>EtherLink delay, we never got any late arrival violation (what you > >> >>>>>described) for the full NAS benchmark suite (please refer to the > >>paper). > >> >>>>> > >> >>>>>multi-gem5 is optimized for a case that almost never happens, and it is > >> >>>>>sacrificing speedup for no gain. > >> >>>>> > >> >>>>> > >> >>>>>>Time-stamping does help with this issue.
Also, if a data packet is > >> >>>>>>waiting > >> >>>>>>in the host operating system socket layer when the simulation > >> >>thread > >> >>>>>>exits > >> >>>>>>to Python to complete the next sync barrier then the packet will > >> >>>>not go > >> >>>>>>into the checkpoint that may follow that sync barrier. > >> >>>>>> > >> >>>>>>That's a good point. The current pd-gem5 checkpointing mechanism might > >> >>>>miss > >> >>>>>packets that have been sent during the previous quantum and are waiting > >> >>in > >> >>>>the OS > >> >>>>>socket buffer. I should add some code inside the EtherTap serialization > >> >>>>>function to drain the EtherTap socket before writing the checkpoint. I will > >> >>>>update > >> >>>>>the pd-gem5 patch accordingly. > >> >>>>> > >> >>>>>> > >> >>>>>>>What you mentioned as an advantage for multi-gem5 is actually a > >> >>key > >> >>>>>>>disadvantage: buffering sync messages behind data packets can add > >> >>>>to > >> >>>>>>>the > >> >>>>>>>synchronization overhead and slow down simulation significantly. > >> >>>>>> > >> >>>>>>The purpose of sync messages is to make sure that the data packets > >> >>>>>>arrive > >> >>>>>>in time (in terms of simulated time) at the destination so they > >>can > >> >>>>be > >> >>>>>>scheduled for being received at the proper computed tick. Sync > >> >>>>messages > >> >>>>>>also make sure that no data packets are in flight when a sync > >> >>barrier > >> >>>>>>completes before we take a checkpoint. They definitely add > >> >>overhead > >> >>>>for > >> >>>>>>the simulation but they are necessary for the correctness of the > >> >>>>>>simulation. > >> >>>>>> > >> >>>>>>The receive thread in multi-gem5 reads out packets from the socket > >> >>in > >> >>>>>>parallel with the simulation thread so packets normally will not > >>be > >> >>>>>>“queueing up” before a sync barrier message.
There is definitely > >> >>>>room > >> >>>>>>for improvements in the current implementation for reducing the > >> >>>>>>synchronization overhead but that is likely true for pd-gem5, too. > >> >>>>>>The important thing here is that the solution must provide > >> >>>>correctness > >> >>>>>>(robustness) first. > >> >>>>>> > >> >>>>>>pd-gem5 provides correctness. Please read my previous comment. The > >> >>>>whole > >> >>>>>purpose of multi/pd-gem5 is to parallelize simulation with minimal > >> >>>>>overhead > >> >>>>>and gain speedup. If you fail to do so, nobody will use your tool. > >> >>>>> > >> >>>>> > >> >>>>>>>Also, > >> >>>>>>>multi-gem5 sends huge messages (multiHeaderPkt) through > >> >>>>the network to > >> >>>>>>>perform each synchronization point, which increases > >> >>synchronization > >> >>>>>>>overhead further. In pd-gem5, we chose to send just one > >>character > >> >>>>as > >> >>>>>>the sync > >> >>>>>>>message through a separate socket to reduce synchronization > >> >>>>overhead. > >> >>>>>> > >> >>>>>>The TCP/IP message size is unlikely to be the bottleneck here. > >>Multi-gem5 > >> >>>>will > >> >>>>>>send ~50 bytes more in a sync barrier message than pd-gem5 but > >>that > >> >>>>>>bigger > >> >>>>>>sync message still fits into a single Ethernet frame on the wire. > >> >>The > >> >>>>>>end-to-end latency overhead that is caused by 50 bytes of extra > >> >>payload > >> >>>>for > >> >>>>>>a small single-frame TCP/IP message is likely to fall into the > >> >>>>“noise” > >> >>>>>>category if one tries to measure it in a real cluster. > >> >>>>>> > >> >>>>>>You should prove your hypothesis experimentally. Each gem5 process > >> >>>>>sends/receives sync messages at the end of every quantum. Say you are > >> >>>>>simulating an "N"-node computer cluster with "M" different > >> >>configurations.
> >> >>>>>Then > >> >>>>>you will have N*M gem5 processes that send/receive these 50 Bytes > >>(I > >> >>>>>think > >> >>>>>it's more) of extra data at the same time over the network ... > >> >>>>> > >> >>>>>Furthermore, multi-gem5 sends a header before each data message. > >> >>>>In comparison, > >> >>>>>pd-gem5 just adds 12 Bytes (each time-stamp is the 12 > >>least > >> >>>>>significant digits of the Tick) to each data packet. I don't know > >> >>>>exactly > >> >>>>>how large these "MultiHeaderPkt"s are, but each just has two Tick > >>fields > >> >>>>that > >> >>>>>are each 64 Bytes! Also, header packets are separate TCP packets, so > >> >>you > >> >>>>>pay > >> >>>>>for sending two separate packets for each data packet. And worst of all, > >>you > >> >>>>>serialize all of these with sync messages. > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Packet handling. > >> >>>>>>> > >> >>>>>>>pd-gem5 uses EtherTap for data packets but changed the polling > >> >>>>>>mechanism > >> >>>>>>> > >> >>>>>>>to go through the main event queue. Since this rate is actually > >> >>>>linked > >> >>>>>>> > >> >>>>>>>with simulator progress, it cannot guarantee that the packets are > >> >>>>>>>serviced > >> >>>>>>> > >> >>>>>>>at regular intervals of real time. This can lead to packets > >> >>>>queueing > >> >>>>>>up > >> >>>>>>> > >> >>>>>>>which would contribute to the synchronization issues mentioned > >> >>>>above. > >> >>>>>>> > >> >>>>>>>multi-gem5 uses plain sockets with separate receive threads and > >>so > >> >>>>does > >> >>>>>>>not > >> >>>>>>> > >> >>>>>>>have this issue. > >> >>>>>>> > >> >>>>>>>I think again you are pointing to your first concern that I've > >> >>>>>>explained > >> >>>>>>>above. Packets that have queued up in the EtherTap socket will be > >> >>>>>>processed > >> >>>>>>>and delivered to the simulation environment at the beginning of the next > >> >>>>>>>simulation > >> >>>>>>>quantum.
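The byte counts debated in this exchange can be checked with a few lines of arithmetic. Assumptions, all labeled in the code: gem5's Tick is a 64-bit unsigned integer, so a Tick field occupies 8 bytes (64 bits, not 64 bytes); the real MultiHeaderPkt layout is not shown in the thread, so the two-Tick header is hypothetical; pd-gem5's timestamp is taken to be 12 ASCII digits, as described; IPv4 and TCP headers are assumed to carry no options (20 bytes each); the standard Ethernet MTU of 1500 bytes is assumed; and the cluster size used for the aggregate figure is made up.

```python
"""Back-of-the-envelope byte counts for the sync/header overhead debate."""
import struct

TICK_BYTES = struct.calcsize("<Q")         # gem5 Tick is a uint64_t -> 8 bytes
TWO_TICK_HEADER = struct.calcsize("<QQ")   # hypothetical header with two Tick fields
IPV4_TCP_HEADERS = 20 + 20                 # assumed: no IP or TCP options
ETHERNET_MTU = 1500                        # assumed standard (non-jumbo) frames

PD_GEM5_SYNC = 1                           # one character per sync message
MULTI_GEM5_SYNC = PD_GEM5_SYNC + 50        # "~50 bytes more", per the thread


def pd_gem5_timestamp(tick: int) -> bytes:
    """12 least-significant decimal digits of the Tick, as ASCII text."""
    return f"{tick % 10**12:012d}".encode()


def fits_in_one_frame(payload_bytes: int) -> bool:
    """Does a TCP segment with this payload fit in one Ethernet frame?"""
    return IPV4_TCP_HEADERS + payload_bytes <= ETHERNET_MTU


def extra_sync_bytes_per_quantum(nodes: int, configs: int, extra: int = 50) -> int:
    """Aggregate extra bytes when N*M gem5 processes all sync at once."""
    return nodes * configs * extra


if __name__ == "__main__":
    print("Tick field:", TICK_BYTES, "bytes; two-Tick header:", TWO_TICK_HEADER, "bytes")
    print("pd-gem5 timestamp:", len(pd_gem5_timestamp(123456789012345)), "bytes")
    print("multi-gem5 sync fits one frame:", fits_in_one_frame(MULTI_GEM5_SYNC))
    print("extra bytes per quantum, N=64, M=4:", extra_sync_bytes_per_quantum(64, 4))
```

Under these assumptions both sync messages fit comfortably in a single frame, a two-Tick header would add 16 bytes rather than 128, and the aggregate extra traffic grows linearly with N*M. None of this measures latency, which is what the two sides actually disagree about.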
> >> >>>>>>> > >> >>>>>>>Please notice that multi-gem5 introduces new SimObjects to > >> >>>>interface > >> >>>>>>>the simulation environment to the real world, which is redundant. This > >> >>>>>>>functionality > >> >>>>>>>is already provided by EtherTap. > >> >>>>>> > >> >>>>>>Except that the EtherTap solution does not provide a correct > >> >>(robust) > >> >>>>>>solution for the synchronization problem. > >> >>>>>> > >> >>>>>>Please read my first/second comments. > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Checkpoint accuracy. > >> >>>>>>> > >> >>>>>>>A user would like to have a checkpoint at precisely the time the > >> >>>>>>> > >> >>>>>>>'m5 checkpoint' operation is executed so as to not miss any of > >>the > >> >>>>>>> > >> >>>>>>>area of interest in his application. > >> >>>>>>> > >> >>>>>>>pd-gem5 requires that simulation finish the current quantum > >> >>>>>>> > >> >>>>>>>before checkpointing, so it cannot provide this. > >> >>>>>>> > >> >>>>>>>(Shortening the quantum can help, but usually the snapshot is > >> >>being > >> >>>>>>taken > >> >>>>>>> > >> >>>>>>>while 'fast-forwarding', i.e. simulating as fast as possible, > >> >>which > >> >>>>>>would > >> >>>>>>> > >> >>>>>>>motivate a longer quantum.) > >> >>>>>>> > >> >>>>>>>multi-gem5 can enter the drain cycle immediately upon receiving a > >> >>>>>>> > >> >>>>>>>checkpoint request. We find this accuracy highly desirable. > >> >>>>>>> > >> >>>>>>>It's true that if you have a large quantum size then there would > >> >>be > >> >>>>>>some > >> >>>>>>>discrepancy between the m5_ckpt instruction tick and the actual > >> >>dump > >> >>>>>>tick. > >> >>>>>>>Based on the multi-gem5 code, my understanding is that you send an async > >> >>>>>>>checkpoint message as soon as one of the gem5 processes encounters > >> >>>>>>the m5_ckpt > >> >>>>>>>instruction.
But I'm not sure how you fix the aforementioned > >> >>issue, > >> >>>>>>>because > >> >>>>>>>you have to sync all gem5 processes before you start dumping > >> >>>>>>the checkpoint, > >> >>>>>>>which necessitates a global synchronization beforehand. > >> >>>>>> > >> >>>>>>In multi-gem5, the gem5 process that encounters the m5_ckpt > >> >>>>instruction > >> >>>>>>sends out an async checkpoint notification to the peer gem5 > >> >>>>processes > >> >>>>>>and > >> >>>>>>then it starts the draining immediately (at the same tick). So > >>the > >> >>>>>>checkpoint will be taken at the exact tick from the initiator > >> >>process's > >> >>>>>>point of view. The global synchronisation with the peer processes > >> >>>>takes > >> >>>>>>place while the initiator process is still waiting at the same > >>tick > >> >>>>(i.e. > >> >>>>>>the simulation thread is suspended). However, the receiver thread > >> >>>>>>continues reading out the socket - while waiting for the global > >> >>sync > >> >>>>to > >> >>>>>>complete - to make sure that in-flight data packets from peer gem5 > >> >>>>>>processes > >> >>>>>>are stored properly and saved into the checkpoint. > >> >>>>>> > >> >>>>>> > >> >>>>>So you mean multi-gem5 ends up with gem5 processes at > >> >>>>different > >> >>>>>ticks after the checkpoint? In pd-gem5 we make sure that all gem5 > >> >>processes > >> >>>>>start dumping the checkpoint at the same tick. Are you sure that it > >>is > >> >>>>>correct to have each gem5 process dump its checkpoint at different > >> >>ticks??? > >> >>>>> > >> >>>>>I don't think this is a correct checkpointing design. However, if you > >> >>>>feel it > >> >>>>>is correct, I can change a couple of lines in "Simulation.py" and > >> >>>>barrier > >> >>>>>scripts to implement the same functionality in pd-gem5.
One thing > >> >>that > >> >>>>you > >> >>>>>are obsessed about is making sure that there are no in-flight > >>packets > >> >>>>>while > >> >>>>>we start dumping the checkpoint, and you have all these complex > >> >>mechanisms > >> >>>>in > >> >>>>>place to ensure that! I think you can be 99.99999% sure that > >>there > >> >>>>is no > >> >>>>>in-flight packet by waiting for 1 second after all gem5 processes > >> >>>>have finished > >> >>>>>their quantum simulation and then dumping the checkpoint. Do you really > >> >>think > >> >>>>>that > >> >>>>>delivering a tcp packet would take more than 1 second in today's > >> >>>>systems!? > >> >>>>>Always go for simple solutions ... > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>By the way, we have a fix for this issue by introducing a new m5 > >> >>>>pseudo > >> >>>>>>>instruction. > >> >>>>>> > >> >>>>>>I fail to see how a new pseudo instruction can solve the problem > >>of > >> >>>>>>completing the full quantum in pd-gem5 before a checkpoint can be > >> >>>>taken. > >> >>>>>>Could you please elaborate on that? > >> >>>>>> > >> >>>>>>As we take checkpoints while fast-forwarding and it is likely that > >> >>we > >> >>>>>>relax > >> >>>>>synchronization for speedup purposes, a new pseudo instruction that > >> >>can > >> >>>>set > >> >>>>>the quantum size (m5_qset) can be helpful. So, one can insert m5_qset > >>in > >> >>>>his > >> >>>>>benchmark source code before entering the ROI that contains m5_ckpt to > >> >>>>>decrease > >> >>>>>the quantum size beforehand and reduce the discrepancy between the m5_ckpt > >> >>tick > >> >>>>>and > >> >>>>>actual checkpoint tick. This is not included in the pd-gem5 patch right > >> >>>>now. > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Implementation of network topology. > >> >>>>>>> > >> >>>>>>>pd-gem5 uses a separate gem5 process to act as a switch whereas > >> >>>>>>multi-gem5 > >> >>>>>>> > >> >>>>>>>uses a standalone packet relay process.
> >> >>>>>>> > >> >>>>>>>We haven't measured the overhead of pd-gem5's simulated switch > >> >>yet, > >> >>>>but > >> >>>>>>> > >> >>>>>>>we're confident that our approach is at least as fast and more > >> >>>>>>scalable. > >> >>>>>>> > >> >>>>>>>There is this flexibility in pd-gem5 to simulate a switch box > >> >>>>alongside > >> >>>>>>>one > >> >>>>>>>of the other gem5 processes. However, it might make that gem5 > >> >>>>process > >> >>>>>>the > >> >>>>>>>simulation bottleneck. One of the advantages of pd-gem5 over > >> >>>>>>multi-gem5 is > >> >>>>>>>that we use gem5 to simulate a switch box, which allows us to > >> >>model > >> >>>>any > >> >>>>>>>network topology by instantiating several Switch SimObjects and > >> >>>>>>>interconnecting them with EtherLink in an arbitrary fashion. A > >> >>>>standalone > >> >>>>>>tcp > >> >>>>>>>server can only provide switch functionality (forwarding packets > >> >>to > >> >>>>>>>destinations) and model a star network topology. Furthermore, it > >> >>>>cannot > >> >>>>>>>model various network timings such as queueing delay, congestion, > >> >>>>and > >> >>>>>>>routing latency. Also it has some accuracy issues that I will > >> >>point > >> >>>>out > >> >>>>>>>next. > >> >>>>>> > >> >>>>>>I agree with the complex topology argument. We already mentioned > >> >>that > >> >>>>>>before as an advantage for pd-gem5 from the point of view of > >>future > >> >>>>>>extensions. However, I do not agree that multi-gem5 cannot model > >> >>>>>>queueing > >> >>>>>>delays and congestion. For a simple crossbar switch, it can model > >> >>>>>>queueing > >> >>>>>>delays and congestion, but the receive queues are distributed > >> >>among > >> >>>>the > >> >>>>>>gem5 processes. > >> >>>>>> > >> >>>>>>It's true that you can model the queuing delay of a simple crossbar by > >> >>>>>distributing queues across gem5 processes (end points).
But to be > >> >>able > >> >>>>to > >> >>>>>do so you have to ensure the ordering of packets that you enqueue > >>in > >> >>>>the > >> >>>>>distributed queues. It is almost impossible without a synchronized > >> >>>>switch > >> >>>>>box. You should have a reorder queue that reorders packets > >> >>dynamically > >> >>>>and > >> >>>>>updates the timing parameters for each packet as well. I don't know how > >> >>much > >> >>>>>progress you have made on the ordering scheme in multi-gem5, but > >>you > >> >>>>may > >> >>>>>have already realized how complex and error-prone it can be. This > >> >>>>argument > >> >>>>>is also related to my next argument about "Broken network timing". > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Broken network timing: > >> >>>>>>> > >> >>>>>>>Forwarding packets between gem5 processes using a standalone tcp > >> >>>>server > >> >>>>>>>can > >> >>>>>>>cause reordering between packets that have different sources but > >> >>the same > >> >>>>>>>destination. It causes inaccurate network timing and, worst of > >>all, > >> >>>>>>>non-deterministic simulation. pd-gem5 resolves this by reordering > >> >>>>>>packets > >> >>>>>>>at > >> >>>>>>>the Switch process and then sending them to their destination (it's > >> >>>>possible > >> >>>>>>as > >> >>>>>>>the switch is synchronized with the rest of the nodes). > >> >>>>>> > >> >>>>>>In multi-gem5, there is always a HeaderPkt that contains some meta > >> >>>>>>information for each data packet. The meta information includes the > >> >>>>send > >> >>>>>>tick and the sender rank (i.e. a unique ID of the sender gem5 > >> >>>>process). > >> >>>>>>We use this information to define a well-defined ordering of > >> >>packets > >> >>>>>>even > >> >>>>>>if packets are arriving at the same receiver from different > >> >>senders. > >> >>>>>>This > >> >>>>>>packet ordering scheme is still being tested so the corresponding > >> >>>>patch > >> >>>>>>is > >> >>>>>>not on the RB yet. > >> >>>>>> > >> >>>>>>Please read my previous comment.
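Gabor's description of the HeaderPkt metadata (send tick plus sender rank) amounts to processing received packets in an order computed purely from that metadata, so the result does not depend on how the host network happened to interleave them. A minimal Python sketch of such a receive queue follows. It is not the multi-gem5 patch, which had not been posted at this point; the class names and the per-sender sequence number used as a last tie-breaker are hypothetical.

```python
# Sketch only: a receive queue whose drain order depends on packet metadata,
# not on host-network arrival order. All names here are hypothetical.
import heapq
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(order=True)
class QueuedPacket:
    # Comparison uses (send_tick, sender_rank, seq) in field order.
    send_tick: int
    sender_rank: int  # unique ID of the sending gem5 process
    seq: int  # hypothetical per-sender counter, breaks any remaining ties
    payload: bytes = field(compare=False)


class DeterministicRecvQueue:
    def __init__(self) -> None:
        self._heap: List[QueuedPacket] = []

    def push(self, pkt: QueuedPacket) -> None:
        heapq.heappush(self._heap, pkt)

    def drain(self) -> List[QueuedPacket]:
        """Pop all queued packets in (send_tick, sender_rank, seq) order."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap))
        return out


def processing_order(arrivals: Iterable[QueuedPacket]) -> List[bytes]:
    """Payloads in the order the receiver would process them."""
    q = DeterministicRecvQueue()
    for pkt in arrivals:
        q.push(pkt)
    return [pkt.payload for pkt in q.drain()]


if __name__ == "__main__":
    pkts = [
        QueuedPacket(100, 2, 0, b"c"),
        QueuedPacket(100, 1, 0, b"b"),
        QueuedPacket(50, 3, 0, b"a"),
    ]
    # Two different arrival orders give the same processing order.
    print(processing_order(pkts))  # [b'a', b'b', b'c']
    print(processing_order(reversed(pkts)))  # [b'a', b'b', b'c']
```

The sender rank is what keeps two packets stamped with the same send tick by different hosts in a fixed order; without some such tie-breaker, their relative order would again depend on real arrival time.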
The most important part of the > >> >>>>>>multi/pd-gem5 > >> >>>>>extension is ensuring accurate and deterministic simulation. > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Amount of changes > >> >>>>>>> > >> >>>>>>>pd-gem5 introduces different modes in EtherLink just to provide > >> >>>>accurate > >> >>>>>>>timing for each component in the network subsystem (NIC, link, > >> >>>>switch) > >> >>>>>>as > >> >>>>>>>well as the capability of modeling different network topologies > >>(mesh, > >> >>>>>>ring, > >> >>>>>>>fat tree, etc.). To enable simple functionality, like what > >> >>>>multi-gem5 > >> >>>>>>>provides, the amount of changes in gem5 can be limited to > >> >>>>time-stamping > >> >>>>>>>packets and providing synchronization through python scripts. > >> >>>>However, > >> >>>>>>>multi-gem5 re-implements functionality that is already in gem5. > >> >>>>>> > >> >>>>>>This argument holds only if both implementations are correct > >> >>>>(robust). > >> >>>>>>It > >> >>>>>>still seems to me that pd-gem5 does not provide correctness for > >>the > >> >>>>>>synchronization/checkpointing parts. > >> >>>>>> > >> >>>>>>Again, please read my first comment regarding the correctness of pd-gem5. > >> >>>>> > >> >>>>> > >> >>>>>>> > >> >>>>>>>* Integrating with gem5 mainstream: > >> >>>>>>> > >> >>>>>>>The pd-gem5 launch script is written in python, which is suited for > >> >>>>>>integration > >> >>>>>>>with gem5 python scripts. However, multi-gem5 uses a bash script. > >> >>Also, > >> >>>>>>all > >> >>>>>>>source files in pd-gem5 are already part of gem5 mainstream. > >> >>>>However, > >> >>>>>>>multi-gem5 has tcp_server.cc/hh, which is a standalone process and > >> >>>>cannot > >> >>>>>>be > >> >>>>>>>part of gem5. > >> >>>>>> > >> >>>>>>The multi-gem5 launch script is simple enough to rely only on the > >> >>>>>>shell. It > >> >>>>>>can obviously be easily re-written in python if that added any > >> >>value.
> >> >>>>>>The > >> >>>>>>tcp_server component is only a utility (like the "m5" utility that > >> >>is > >> >>>>>>also > >> >>>>>>part of gem5). > >> >>>>>> > >> >>>>>>The thing is that it's more likely that users want to add some > >> >>>>>functionality to the run-script of multi/pd-gem5. E.g., the pd-gem5 > >> >>>>run-script > >> >>>>>supports launching simulations using simulation pool management > >> >>>>>software ( > >> >>>>>http://research.cs.wisc.edu/htcondor/). Using python enables users > >>to > >> >>>>>easily add this kind of support. > >> >>>>> > >> >>>>> > >> >>>>>> > >> >>>>>>Cheers, > >> >>>>>>- Gabor > >> >>>>>> > >> >>>>>> > >> >>>>>>>On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham > >> >>>><[email protected]> > >> >>>>>>>wrote: > >> >>>>>>> > >> >>>>>>>>Hello everyone, > >> >>>>>>>>We have taken a look at how pd-gem5 compares with multi-gem5. > >> >>>>While > >> >>>>>>>>intending > >> >>>>>>>>to deliver the same functionality, there are some crucial > >> >>>>differences: > >> >>>>>>>> > >> >>>>>>>>* Synchronization. > >> >>>>>>>> > >> >>>>>>>> pd-gem5 implements this in Python (not a problem in itself; > >> >>>>>>>>aesthetically > >> >>>>>>>> this is nice, but...). The issue is that pd-gem5's data > >> >>>>packets > >> >>>>>>and > >> >>>>>>>> barrier messages travel over different sockets. Since > >> >>pd-gem5 > >> >>>>>>could > >> >>>>>>>>see > >> >>>>>>>> data packets passing synchronization barriers, it could > >> >>create > >> >>>>an > >> >>>>>>>> inconsistent checkpoint. > >> >>>>>>>> > >> >>>>>>>> multi-gem5's synchronization is implemented in C++ using > >>sync > >> >>>>>>events, > >> >>>>>>>>but > >> >>>>>>>> more importantly, the messages queue up in the same stream > >> >>and > >> >>>>so > >> >>>>>>>>cannot > >> >>>>>>>> have the issue just described. (Event ordering is often > >> >>>>crucial > >> >>>>>>in > >> >>>>>>>> snapshot protocols.)
Therefore we feel that multi-gem5 is a > >> >>>>more > >> >>>>>>>>robust > >> >>>>>>>> solution in this respect. > >> >>>>>>>> > >> >>>>>>>>* Packet handling. > >> >>>>>>>> > >> >>>>>>>> pd-gem5 uses EtherTap for data packets but changed the > >> >>polling > >> >>>>>>>>mechanism > >> >>>>>>>> to go through the main event queue. Since this rate is > >> >>>>actually > >> >>>>>>>>linked > >> >>>>>>>> with simulator progress, it cannot guarantee that the > >>packets > >> >>>>are > >> >>>>>>>>serviced > >> >>>>>>>> at regular intervals of real time. This can lead to packets > >> >>>>>>>>queueing up > >> >>>>>>>> which would contribute to the synchronization issues > >> >>mentioned > >> >>>>>>above. > >> >>>>>>>> > >> >>>>>>>> multi-gem5 uses plain sockets with separate receive threads > >> >>>>and so > >> >>>>>>>>does > >> >>>>>>>>not > >> >>>>>>>> have this issue. > >> >>>>>>>> > >> >>>>>>>>* Checkpoint accuracy. > >> >>>>>>>> > >> >>>>>>>> A user would like to have a checkpoint at precisely the time > >> >>the > >> >>>>>>>> 'm5 checkpoint' operation is executed so as to not miss any > >>of > >> >>>>the > >> >>>>>>>> area of interest in his application. > >> >>>>>>>> > >> >>>>>>>> pd-gem5 requires that simulation finish the current quantum > >> >>>>>>>> before checkpointing, so it cannot provide this. > >> >>>>>>>> > >> >>>>>>>> (Shortening the quantum can help, but usually the snapshot is > >> >>>>being > >> >>>>>>>>taken > >> >>>>>>>> while 'fast-forwarding', i.e. simulating as fast as possible, > >> >>>>which > >> >>>>>>>>would > >> >>>>>>>> motivate a longer quantum.) > >> >>>>>>>> > >> >>>>>>>> multi-gem5 can enter the drain cycle immediately upon > >> >>receiving > >> >>>>a > >> >>>>>>>> checkpoint request. We find this accuracy highly desirable. > >> >>>>>>>> > >> >>>>>>>>* Implementation of network topology. 
> >> >>>>>>>> > >> >>>>>>>> pd-gem5 uses a separate gem5 process to act as a switch > >> >>whereas > >> >>>>>>>>multi-gem5 > >> >>>>>>>> uses a standalone packet relay process. > >> >>>>>>>> > >> >>>>>>>> We haven't measured the overhead of pd-gem5's simulated > >>switch > >> >>>>yet, > >> >>>>>>>>but > >> >>>>>>>> we're confident that our approach is at least as fast and > >>more > >> >>>>>>>>scalable. > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>>Thanks, > >> >>>>>>>>Curtis > >> >>>>>>>>________________________________________ > >> >>>>>>>>From: gem5-dev [[email protected]] On Behalf Of > Mohammad > >> >>>>>>Alian [ > >> >>>>>>>>[email protected]] > >> >>>>>>>>Sent: Friday, June 26, 2015 7:37 PM > >> >>>>>>>>To: gem5 Developer List > >> >>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a > >> >>parallel/distributed > >> >>>>>>>>system > >> >>>>>>>>on multiple physical hosts > >> >>>>>>>> > >> >>>>>>>>Hi Anthony, > >> >>>>>>>> > >> >>>>>>>>I think that would be a good option, then I can add pd-gem5 > >> >>>>>>>>functionality > >> >>>>>>>>on top of that. Right now I've simplified your implementation. > >> >>>>Also, I > >> >>>>>>>>think I had found some bugs in your patch that I cannot remember > >> >>>>now. > >> >>>>>>If > >> >>>>>>>>you decided to ship EtherSwitch patch, let me know to give you a > >> >>>>>>review > >> >>>>>>>>on > >> >>>>>>>>that. > >> >>>>>>>> > >> >>>>>>>>Thanks, > >> >>>>>>>>Mohammad > >> >>>>>>>> > >> >>>>>>>>On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony < > >> >>>>>>>>[email protected]> wrote: > >> >>>>>>>> > >> >>>>>>>>>Would it make sense for me to ship the EtherSwitch patch first, > >> >>>>since > >> >>>>>>>>it > >> >>>>>>>>>has utility on its own, and then we can decide which of the > >> >>>>>>>>"multi-gem5" > >> >>>>>>>>>approaches is best, or if it's some combination of both? 
> >> >>>>>>>>> > >> >>>>>>>>>The only reason I never shipped it was because Steve raised an > >> >>>>issue > >> >>>>>>>>that > >> >>>>>>>>>I didn't have a good alternative for, and didn't have the time > >> >>to > >> >>>>>>look > >> >>>>>>>>into > >> >>>>>>>>>one at that time. > >> >>>>>>>>>________________________________________ > >> >>>>>>>>>From: gem5-dev [[email protected]] on behalf of > >>Mohammad > >> >>>>>>>>Alian [ > >> >>>>>>>>>[email protected]] > >> >>>>>>>>>Sent: Wednesday, June 24, 2015 12:43 PM > >> >>>>>>>>>To: gem5 Developer List > >> >>>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a > >> >>parallel/distributed > >> >>>>>>>>system > >> >>>>>>>>>on multiple physical hosts > >> >>>>>>>>> > >> >>>>>>>>>Hi Andreas, > >> >>>>>>>>> > >> >>>>>>>>>Thanks for the comment. > >> >>>>>>>>>I think the checkpointing support in both works is the same. > >> >>Here > >> >>>>is > >> >>>>>>>>how > >> >>>>>>>>>checkpointing support is implemented in pd-gem5: > >> >>>>>>>>> > >> >>>>>>>>>Whenever one of the gem5 processes encounters an m5-checkpoint > >>pseudo > >> >>>>>>>>>instruction, it will send a "recv-ckpt" signal to the > >> >>>>>>>>>"barrier" process. Then the "barrier" process sends a > >> >>"take-ckpt" > >> >>>>>>>>signal > >> >>>>>>>>to > >> >>>>>>>>>all the simulated nodes > >> >>>>>>>>>(including the node that encountered m5-checkpoint) at the end > >> >>of > >> >>>>the > >> >>>>>>>>>current simulation quantum. On reception of the > >> >>>>>>>>>"take-ckpt" signal, gem5 processes start dumping checkpoints. > >> >>>>This > >> >>>>>>>>makes > >> >>>>>>>>>each simulated node dump a checkpoint > >> >>>>>>>>>at the same simulated time point while ensuring there are no > >> >>>>in-flight > >> >>>>>>>>>packets. > >> >>>>>>>>> > >> >>>>>>>>>I believe this is the same as the multi-gem5 patch's approach for > >> >>>>>>checkpoint > >> >>>>>>>>>support (based on the commit message of > >> >>>>>>>>http://reviews.gem5.org/r/2865/ > >> >>>>>>>>).
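The pd-gem5 checkpoint handshake described above ("recv-ckpt" to the barrier, "take-ckpt" broadcast at the end of the current quantum) can be modelled in a few lines of Python. This is a hypothetical sketch, not the pd-gem5 barrier script: it assumes all nodes advance in lockstep quanta and report to the barrier at every boundary. It also makes the accuracy trade-off discussed in this thread concrete: every node dumps at the same boundary tick, which can lie up to one full quantum after the tick of the m5 checkpoint instruction.

```python
# Sketch only: hypothetical model of a barrier-driven checkpoint handshake,
# assuming lockstep quanta and a barrier round at every quantum boundary.
from typing import Dict, List


class BarrierSketch:
    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.ckpt_requested = False

    def on_recv_ckpt(self) -> None:
        # Some node executed the m5 checkpoint pseudo instruction.
        self.ckpt_requested = True

    def on_quantum_end(self) -> Dict[int, str]:
        # Called once all nodes have finished the current quantum.
        cmd = "take-ckpt" if self.ckpt_requested else "continue"
        self.ckpt_requested = False
        return {node: cmd for node in range(self.num_nodes)}


def simulate(m5_ckpt_ticks: List[int], num_nodes: int, quantum: int,
             end_tick: int) -> List[int]:
    """Return the boundary ticks at which every node dumps a checkpoint."""
    barrier = BarrierSketch(num_nodes)
    pending = sorted(m5_ckpt_ticks)
    dumps = []
    for boundary in range(quantum, end_tick + 1, quantum):
        # recv-ckpt messages from m5 checkpoints executed in this quantum.
        while pending and pending[0] < boundary:
            pending.pop(0)
            barrier.on_recv_ckpt()
        cmds = barrier.on_quantum_end()
        if "take-ckpt" in cmds.values():
            dumps.append(boundary)
    return dumps


def dump_tick(m5_ckpt_tick: int, quantum: int) -> int:
    """Closed form: end of the quantum that contains m5_ckpt_tick."""
    return (m5_ckpt_tick // quantum + 1) * quantum


if __name__ == "__main__":
    for q in (1000, 100):
        print(q, simulate([1500], num_nodes=3, quantum=q, end_tick=5000),
              dump_tick(1500, q))
    # 1000 [2000] 2000
    # 100 [1600] 1600
```

For an m5 checkpoint at tick 1500, shrinking the quantum from 1000 to 100 ticks moves the common dump tick from 2000 to 1600, which is the effect the proposed m5_qset instruction is meant to achieve; the price is ten times as many barrier rounds over the same interval.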
> >> >>>>>>>>>Also, we have tested our mechanism with several benchmarks and > >> >>it > >> >>>>>>>>works. > >> >>>>>>>>As > >> >>>>>>>>>Steve suggested, I'll look into Curtis's patch and try to > >>review > >> >>>>it > >> >>>>>>as > >> >>>>>>>>>well. > >> >>>>>>>>>But as Nilay also mentioned earlier, there is some code > >> >>missing > >> >>>>in > >> >>>>>>>>>Curtis's patch. I prefer to first run multi-gem5 before > >>starting > >> >>>>to > >> >>>>>>>>review > >> >>>>>>>>>it. > >> >>>>>>>>> > >> >>>>>>>>>Thank you, > >> >>>>>>>>>Mohammad > >> >>>>>>>>> > >> >>>>>>>>>On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson < > >> >>>>>>>>[email protected]> > >> >>>>>>>>>wrote: > >> >>>>>>>>> > >> >>>>>>>>>>Hi Steve, > >> >>>>>>>>>> > >> >>>>>>>>>>Apologies for the confusion. We are on the same page. My point > >> >>is > >> >>>>>>>>that > >> >>>>>>>>we > >> >>>>>>>>>>cannot simply take a little bit of patch A and a little bit of > >> >>>>>>>>patch B. > >> >>>>>>>>>>This change involves a lot of code, and we need to approach > >> >>this > >> >>>>in > >> >>>>>>>>a > >> >>>>>>>>>>structured fashion. My proposal is to do it bottom up, and > >> >>start > >> >>>>by > >> >>>>>>>>>>getting the basic support in place. Since > >> >>>>>>>>>http://reviews.gem5.org/r/2826/ > >> >>>>>>>>>>has already been on the review board for a few months, I am > >> >>>>merely > >> >>>>>>>>>>suggesting that it would be a good start to relate the > >> >>newly > >> >>>>>>>>posted > >> >>>>>>>>>>patches to what is already there.
> >> >>>>>>>>>> > >> >>>>>>>>>>Andreas > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>>On 24/06/2015 13:11, "gem5-dev on behalf of Steve Reinhardt" > >> >>>>>>>>>><[email protected] on behalf of [email protected]> > >> >>wrote: > >> >>>>>>>>>> > >> >>>>>>>>>>>Hi Andreas, > >> >>>>>>>>>>> > >> >>>>>>>>>>>I'm a little confused by your email---you say you're > >> >>>>fundamentally > >> >>>>>>>>>opposed > >> >>>>>>>>>>>to looking at both patches and picking the best features, > >>then > >> >>>>you > >> >>>>>>>>point > >> >>>>>>>>>>>out that the patches Curtis posted have the feature of better > >> >>>>>>>>>>>checkpointing > >> >>>>>>>>>>>support so we should pick that :). > >> >>>>>>>>>>> > >> >>>>>>>>>>>Obviously we can't just pick patch A from Mohammad's set and > >> >>>>patch > >> >>>>>>>>B > >> >>>>>>>>>from > >> >>>>>>>>>>>Curtis's set and expect them to work together, but I think > >> >>that > >> >>>>>>>>having > >> >>>>>>>>>>>both > >> >>>>>>>>>>>sets of patches available and comparing and contrasting the > >> >>two > >> >>>>>>>>>>>implementations should enable us to get to a single > >> >>>>implementation > >> >>>>>>>>>that's > >> >>>>>>>>>>>the best of both. Someone will have to make the effort of > >> >>>>>>>>integrating > >> >>>>>>>>>the > >> >>>>>>>>>>>better ideas from one set into the other set to create a new > >> >>>>>>>>unified > >> >>>>>>>>set > >> >>>>>>>>>>>of > >> >>>>>>>>>>>patches; (or maybe we commit one set and then integrate the > >> >>>>best of > >> >>>>>>>>the > >> >>>>>>>>>>>other set as patches on top of that), but the first step is > >>to > >> >>>>>>>>identify > >> >>>>>>>>>>>what "the best of both" is. Having Mohammad look at Curtis's > >> >>>>>>>>patches, > >> >>>>>>>>>and > >> >>>>>>>>>>>Curtis (or someone else from ARM) closely examine Mohammad's > >> >>>>>>>>patches > >> >>>>>>>>>would > >> >>>>>>>>>>>be a great start. 
I intend to review them both, though > >> >>>>>>>>unfortunately > >> >>>>>>>>my > >> >>>>>>>>>>>time has been scarce lately---I'm hoping to squeeze that in > >> >>>>later > >> >>>>>>>>this > >> >>>>>>>>>>>week. > >> >>>>>>>>>>> > >> >>>>>>>>>>>Once we've had a few people look at both, we can discuss the > >> >>>>pros > >> >>>>>>>>and > >> >>>>>>>>>cons > >> >>>>>>>>>>>of each, then discuss the strategy for getting the best > >> >>features > >> >>>>>>>>in. > >> >>>>>>>>So > >> >>>>>>>>>>>far I've heard that Mohammad's patches have a better network > >> >>>>model > >> >>>>>>>>but > >> >>>>>>>>>the > >> >>>>>>>>>>>ARM patches have better checkpointing support; that seems > >> >>like a > >> >>>>>>>>good > >> >>>>>>>>>>>start. > >> >>>>>>>>>>> > >> >>>>>>>>>>>Steve > >> >>>>>>>>>>> > >> >>>>>>>>>>>On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson < > >> >>>>>>>>>[email protected] > >> >>>>>>>>>>> > >> >>>>>>>>>>>wrote: > >> >>>>>>>>>>> > >> >>>>>>>>>>>>Hi all, > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Great work. However, I fundamentally do not believe in the > >> >>>>>>>>approach > >> >>>>>>>>of > >> >>>>>>>>>>>>'letting reviewers pick the best features'. There is no way > >> >>we > >> >>>>>>>>would > >> >>>>>>>>>>>>ever > >> >>>>>>>>>>>>get something working out of it. We need to get _one_ > >>working > >> >>>>>>>>solution > >> >>>>>>>>>>>>here, and figure out how to best get there. I would propose > >> >>to > >> >>>>>>>>do it > >> >>>>>>>>>>>>bottom up, starting with the basic multi-simulator instance > >> >>>>>>>>support, > >> >>>>>>>>>>>>checkpointing support, and then move on to the network > >> >>between > >> >>>>>>>>the > >> >>>>>>>>>>>>simulator instances. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Thus, I propose we go with the low-level plumbing and > >> >>>>checkpoint > >> >>>>>>>>>support > >> >>>>>>>>>>>>from what Curtis has posted.
I believe proper checkpointing > >> >>>>>>>>support > >> >>>>>>>>to > >> >>>>>>>>>>>>be > >> >>>>>>>>>>>>the most challenging, and from what I can tell this is far > >> >>more > >> >>>>>>>>>limited > >> >>>>>>>>>>>>in > >> >>>>>>>>>>>>what you just posted, Mohammad. Could you perhaps review > >> >>Curtis's > >> >>>>>>>>patches > >> >>>>>>>>>>>>based on your insights, and we can try and get these patches > >> >>in > >> >>>>>>>>shape > >> >>>>>>>>>>>>and > >> >>>>>>>>>>>>committed asap. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Once we have the baseline functionality in place, then we > >>can > >> >>>>>>>>start > >> >>>>>>>>>>>>looking at the more elaborate network models. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Does this sound reasonable? > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Thanks, > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>Andreas > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad Alian" > >> >>>>>>>>>>>><[email protected] on behalf of [email protected]> > >> >>wrote: > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>>Hello All, > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>I have submitted a chain of patches which enables gem5 to > >> >>>>>>>>simulate > >> >>>>>>>>a > >> >>>>>>>>>>>>>cluster on multiple physical hosts: > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2909/ > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2910/ > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2912/ > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2913/ > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2914/ > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>and a patch that contains run scripts for a simple > >> >>experiment: > >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2915/ > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>We have run several benchmarks using this infrastructure, > >> >>>>>>>>including > >> >>>>>>>>>NAS > >> >>>>>>>>>>>>>parallel benchmarks (MPI) and DCBench-hadoop > >> >>>>>>>>>>>>>(http://prof.ict.ac.cn/DCBench/), > >> >>>>>>>>>>>>>and would be happy to share
scripts/diskimages. > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>We call this *pd-gem5*. *pd-gem5* functionality is more or > >> >>>>less > >> >>>>>>>>the > >> >>>>>>>>>>>>same > >> >>>>>>>>>>>>>as > >> >>>>>>>>>>>>>Curtis's patch for *multi-gem5*. However, I feel the *pd-gem5* > >> >>>>>>>>network > >> >>>>>>>>>>>>model > >> >>>>>>>>>>>>>is > >> >>>>>>>>>>>>>more thorough; it also enables modeling different network > >> >>>>>>>>topologies. > >> >>>>>>>>>>>>>Having both sets of changes together lets reviewers pick > >> >>the best > >> >>>>>>>>>features > >> >>>>>>>>>>>>>from both works. > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>>Thank you, > >> >>>>>>>>>>>>>Mohammad Alian > >> >>>>>>>>>>>>>_______________________________________________ > >> >>>>>>>>>>>>>gem5-dev mailing list > >> >>>>>>>>>>>>>[email protected] > >> >>>>>>>>>>>>>http://m5sim.org/mailman/listinfo/gem5-dev > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >>>>>>>>>>>>-- IMPORTANT NOTICE: The contents of this email and any > >> >>>>>>>>attachments > >> >>>>>>>>>are > >> >>>>>>>>>>>>confidential and may also be privileged. If you are not the > >> >>>>>>>>intended > >> >>>>>>>>>>>>recipient, please notify the sender immediately and do not > >> >>>>>>>>disclose > >> >>>>>>>>>the > >> >>>>>>>>>>>>contents to any other person, use it for any purpose, or > >> >>store > >> >>>>or > >> >>>>>>>>copy > >> >>>>>>>>>>>>the > >> >>>>>>>>>>>>information in any medium. Thank you.
> >> >>>>>>>>>>>> > >> >>>>>>>>>>>>ARM Limited, Registered office 110 Fulbourn Road, Cambridge > >> >>CB1 > >> >>>>>>>>9NJ, > >> >>>>>>>>>>>>Registered in England & Wales, Company No: 2557590 > >> >>>>>>>>>>>>ARM Holdings plc, Registered office 110 Fulbourn Road, > >> >>>>Cambridge > >> >>>>>>>>CB1 > >> >>>>>>>>>>>>9NJ, > >> >>>>>>>>>>>>Registered in England & Wales, Company No: 2548782
