Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on multiple physical hosts

Mohammad Alian Tue, 07 Jul 2015 08:12:15 -0700

Then you are assuming that the checkpoint is taken with a quantum size
smaller than the link latency, which contradicts your initial motivation
for unsync checkpoints:
(I copied this sentence from earlier messages in the thread as a reminder)
"Shortening the quantum can help, but usually the snapshot is being taken
while 'fast-forwarding', i.e. simulating as fast as possible, which would
motivate a longer quantum."


What if somebody wants to relax synchronization and take a checkpoint?
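
A minimal sketch of the timing argument behind this question, in Python.
It assumes periodic syncs at multiples of the quantum and a receive tick
equal to the send tick plus the link latency. The function names are
hypothetical and are not part of gem5, pd-gem5, or multi-gem5.

```python
def next_sync_tick(tick, quantum):
    """Return the first periodic sync boundary strictly after `tick`.

    Assumption: sync barriers happen at multiples of `quantum`.
    """
    return (tick // quantum + 1) * quantum


def recv_in_current_quantum(send_tick, link_latency, quantum):
    """Return True if the receive tick falls before the next sync boundary.

    Assumption: receive tick = send tick + simulated link latency.
    """
    recv_tick = send_tick + link_latency
    return recv_tick < next_sync_tick(send_tick, quantum)


# Quantum equal to the link latency: the receive tick is never earlier
# than the next sync boundary, wherever in the quantum the packet is sent.
print(recv_in_current_quantum(0, 1000, 1000))
print(recv_in_current_quantum(999, 1000, 1000))

# Relaxed synchronization (quantum longer than the link latency): a packet
# sent early in the quantum is due before the next sync boundary.
print(recv_in_current_quantum(0, 1000, 5000))
```

Under these assumptions, a quantum no longer than the link latency keeps
every receive tick at or after the next sync boundary. With a longer
quantum the receive tick can land inside the current quantum, which is
the case the question above is about.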

On Tue, Jul 7, 2015 at 7:38 AM, Gabor Dozsa <[email protected]> wrote:

>
> Hi Mohammad and all,
>
> gem5 processes may restore at a different tick from a checkpoint but the
> next periodic sync will happen at the same tick in all gem5 processes. A
> receive tick of a packet cannot fall into the current quantum so every
> packet can get scheduled for receive properly even if a checkpoint/restore
> happens during a quantum.
>
> Regarding your multi-threaded dual config, my understanding is that
> EtherLink is not prepared to work with multi threading as it lacks thread
> safety. The multiple event queues/threads config only works if the systems
> are independent.
>
> One possible way to fix that is to provide a "multi-thread" based
> implementation for MultiIface ;-)
>
> - Gabor
>
> On 7/7/15, 6:29 AM, "Mohammad Alian" <[email protected]> wrote:
>
> >Gabor- My concern about unsync checkpoint is that when you restore from an
> >unsync checkpoint, you'll have gem5 processes that are each running at a
> >different tick. Then how do you handle accurate delivery of packets between
> >these gem5 processes? It will also make it harder to integrate
> >multi/pd-gem5 with current multi-threaded gem5. The problem with sync
> >checkpoint is that you cannot take the checkpoint exactly at the ROI, but I
> >think unsync checkpoint introduces some other problems. Considering the
> >necessary warmup period before starting stat collection, I think we don't
> >need to exactly pinpoint the ROI. Please correct me if I'm wrong.
> >
> >I'm trying to run a multi-threaded experiment with pd-gem5, but I got an
> >error when I tried to partition dual mode simulation on two threads. I
> >posted that in gem5 users mailing list. Please help me on that if you can.
> >
> >Thank you,
> >Mohammad
> >
> >On Mon, Jul 6, 2015 at 11:45 AM, Gabor Dozsa <[email protected]> wrote:
> >
> >> Thank you Steve for the detailed elaboration on the issues.
> >>
> >>
> >> Regarding the “unsynchronized checkpoints”, the terminology might be a bit
> >> confusing. In fact, we always need to do a global synchronization among
> >> the gem5 processes before taking a distributed checkpoint (in order to
> >> avoid in-flight packets). The global synchronization here means that each
> >> gem5 has to suspend the simulation and wait until every in-flight packet
> >> arrives (and is stored) at the destination gem5 process. If that global
> >> synchronization step happens at the same simulated tick in each gem5 then
> >> we call the checkpoint “synchronous”; otherwise it is an “asynchronous”
> >> checkpoint.
> >>
> >> In the MPI application example I mentioned before the checkpoint should
> >>be
> >> triggered as soon as the “slowest” MPI process reaches the
> >>MPI_barrier().
> >> The problem is that the “slowest” MPI process usually does not reach the
> >> MPI_barrier() right at the end of the current quantum. If we let the
> >> simulation continue until the quantum completes (to ensure that the
> >> checkpoint is taken at the same simulated tick in each gem5) then the
> >>MPI
> >> processes will complete the MPI_barrier and start executing the ROI code
> >> already.
> >>
> >> Regarding the integration of multi-threaded/multi-host simulation,
> >> multi-gem5 does not support fine grain simulation of hierarchical
> >>switches
> >> (or any other network topologies except a single crossbar) or multiple
> >> synchronization domains currently.
> >>
> >> However, I'm a bit confused about your statement that you don’t see
> >>value
> >> in ever building a shared-memory transport for MultiIface. MultiIface in
> >> my view is just an abstract interface for "multi-(ether)-link" objects
> >> which are link objects for connecting multiple (i.e. more than two)
> >> systems. It aims to encapsulate the API necessary for any Link object
> >> in any multi-system configuration - provided that we partition the
> >> systems across network links during run time.
> >>
> >> An orthogonal issue is whether we want to include a simple crossbar switch
> >> model in a MultiIface implementation or we want to provide a 'standalone'
> >> fine grain model for the switch (e.g. the pd-gem5 approach).
> >>
> >> Thanks,
> >> - Gabor
> >>
> >>
> >>
> >> On 7/3/15, 7:33 PM, "Steve Reinhardt" <[email protected]> wrote:
> >>
> >> >Thanks Mohammad & Gabor for the responses.
> >> >
> >> >I think there's still some misunderstanding on what I mean by the
> >> >integration of multi-threaded and multi-host simulation based on
> >>Gabor's
> >> >response above and Andreas's response in the other thread.
> >> >
> >> >The primary example scenario I'm proposing is as Mohammad described:
> >> >within
> >> >each host node, we're simulating an entire rack + top-of-rack switch
> >>in a
> >> >single gem5 process, with separate event queues/threads being used to
> >> >parallelize across nodes within the rack. The switch may or may not be
> >>on
> >> >its own thread as well.  The synchronization among the threads only
> >>needs
> >> >to be at the granularity of the intra-rack network latency.
> >> >
> >> >Now we want to expand this by using pd-gem5 or multi-gem5 to
> >>parallelize
> >> >multiple of these rack-level simulations across hosts, so we can
> >>simulate
> >> >a
> >> >whole row of a datacenter.  Only the uplinks from the TOR switches
> >>would
> >> >need to go over sockets between processes, and the switch being
> >>modeled by
> >> >pd-gem5 or multi-gem5 would be the end-of-row switch. The
> >>synchronization
> >> >delay among the multiple gem5 processes would be based on the
> >>inter-rack
> >> >latency.
> >> >
> >> >So the basic question is: Is this feasible with pd-gem5 / multi-gem5,
> >>and
> >> >if not, how much work would it take to make it so?
> >> >
> >> >However, my larger point is that I still don't see value in ever
> >>building
> >> >a
> >> >shared-memory transport for MultiIface. For this model, there is
> >>clearly
> >> >no
> >> >need for it. Things get more complicated if we want to do something
> >>like
> >> >have N nodes connected to a single switch and split that over two hosts
> >> >(with N/2 nodes simulated on each), but even in that case, I think
> >>it's a
> >> >better idea to make the switch model deal with having half of its links
> >> >internal and half external (since we already want the same model to
> >>work
> >> >in
> >> >both the all-internal and all-external cases). Not that I'm worried
> >>that
> >> >someone is about to go off and build this shared-memory transport, but
> >>I
> >> >think it's important to reach an understanding here, since it's
> >> >fundamental
> >> >to defining the strategic relationship between these capabilities going
> >> >forward.
> >> >
> >> >Stepping back a little further, it would be nice to have a model that
> >>is
> >> >as
> >> >generic as the multi-threading model, where it's really just a matter
> >>of
> >> >taking a simulation, partitioning the components among the threads, and
> >> >setting the synchronization quantum, and it works. Of course, even with
> >> >the
> >> >multi-threaded model, if you don't choose your partitioning and your
> >> >quantum wisely, you're not going to get much speedup or a deterministic
> >> >simulation, but the fundamental implementation is oblivious to that.
> >>I'm
> >> >not saying we really need to go all the way to this extreme---it's
> >>pretty
> >> >reasonable to assume that no one in the near future will want to
> >>partition
> >> >across hosts anywhere other than on a simulated network link---but I
> >>think
> >> >we should keep this ideal in mind as a guiding principle as we choose
> >>how
> >> >to go forward from here.
> >> >
> >> >This ties in to my point #4, which is that if we're really building a
> >> >mechanism to partition a simulation across multiple hosts, then you
> >>should
> >> >be able to run the same simulation in a single gem5 process and get the
> >> >same results. I think this is the strength of pd-gem5; correspondingly
> >>the
> >> >main weakness of multi-gem5 is that it architecturally feels more like
> >> >tying together a set of mostly independent gem5 simulations than like
> >> >partitioning a single gem5 simulation.  (Of course, they both end up at
> >> >roughly the same point in the middle.)
> >> >
> >> >On the flip side, multi-gem5 has some clear advantages in terms of the
> >> >better separation of the communication layer (and I can imagine it
> >>being
> >> >very useful to port to MPI and perhaps some RDMA API for InfiniBand
> >> >clusters). Also I think the integrated sockets for communication and
> >> >synchronization are the superior design; while the separate sockets
> >>used
> >> >by
> >> >pd-gem5 may only very rarely cause problems, I agree with Andreas that
> >> >that's not good enough, and I don't see any real advantage either---if
> >>you
> >> >have to flush the data sockets (or wait for them to drain) before
> >> >synchronizing, then you might as well just have the synchronization
> >> >messages queue up behind the data messages.
> >> >
> >> >Regarding unsynchronized checkpoints: Thanks for the example, but I'm
> >> >still
> >> >a little confused. If all the processes are about to execute an
> >> >MPI_Barrier(), doesn't that mean they'll all be synchronized shortly
> >> >anyway? So what's the harm in waiting until they're synchronized and
> >> >then checkpointing?
> >> >
> >> >Regarding the simulation of non-Ethernet networks: I agree that the
> >> >biggest
> >> >obstacle to this is the lack of generality of the current gem5 network
> >> >components. I tried to take a step toward supporting other link types
> >>two
> >> >years ago (see http://reviews.gem5.org/r/1922) but someone shot me down
> >> >;).
> >> >We shouldn't try and fix that here, but we should also consciously try
> >>not
> >> >to make it any worse...
> >> >
> >> >Thanks for reading all the way to the end!
> >> >
> >> >Steve
> >> >
> >> >
> >> >On Fri, Jul 3, 2015 at 7:11 AM Gabor Dozsa <[email protected]> wrote:
> >> >
> >> >>Hi all,
> >> >>
> >> >>Thank you Steve for the thorough review.
> >> >>
> >> >>First, let me elaborate a bit on Andreas’s 3rd point about
> >> >>non-synchronous checkpoints. Let’s assume that we aim to simulate MPI
> >> >>applications (HPC workloads). The ROI in an MPI application typically
> >> >>starts with a global MPI_Barrier() call. We want to take the checkpoint
> >> >>when *every* gem5 process has reached that MPI_Barrier() in the
> >> >>simulated code but that may not happen at the same tick in each gem5
> >> >>(due to load imbalance among the simulated nodes). That’s why
> >> >>multi-gem5 implements the non-synchronous checkpoint support.
> >> >>
> >> >>My answers to your questions are as follows.
> >> >>
> >> >>1. The only change necessary to use multi-gem5 with a non-Ethernet
> >> >>(simulated) network is to replace the Ethernet packet type with another
> >> >>packet type in MultiIface.
> >> >>In fact, the first implementation of MultiIface was a template
> >> >>that took EthPacketData as parameter because I planned to support
> >> >>different network types. When I realized that currently only Ethernet is
> >> >>supported by gem5 I dropped the template param to keep the
> >> >>implementation simpler. I have also realized in the meantime that the
> >> >>right approach would probably be to create a pure virtual 'base' class
> >> >>for network packets from which Ethernet (and other types of) packets
> >> >>could be derived. Then MultiIface could simply use that base class to
> >> >>provide support for different network types. The interface provided by
> >> >>the base packet class could be very simple. Besides the total size() of
> >> >>the packet, multi-gem5 only needs a method to 'extract' the
> >> >>source/destination address. Those addresses are used in MultiIface as
> >> >>opaque byte arrays so they are quite network type agnostic already.
> >> >>
> >> >>2. That’s right, we have designed the MultiIface/TCPIface split with
> >> >>different underlying messaging systems in mind.
> >> >>
> >> >>3. Multi-gem5 can work together with multi-threaded/multi-event-queue
> >> >>gem5
> >> >>configs. The current TCPIface/tcp_server components would still use
> >> >>sockets to send around the packets. So it is possible to put together
> >>a
> >> >>multi-gem5 simulation where each gem5 process has multiple event
> >>queues
> >> >>(and an independent simulation thread per event queue) but all the
> >> >>simulated Ethernet links would use sockets to forward every Ethernet
> >> >>packet to the tcp_server.
> >> >>
> >> >>If someone wanted to run only a single gem5 process to simulate an
> >> >>entire cluster (using one thread/event-queue per cluster node) then the
> >> >>current multi-gem5 implementation using sockets/tcp_server is not
> >> >>optimal. In that case, a better solution would be to provide a shared
> >> >>memory based implementation of the MultiIface virtual communication
> >> >>methods sendRaw()/recvRaw()/syncRaw() (i.e. a shared memory equivalent
> >> >>of TCPIface). In that implementation, the entire discrete tcp_server
> >> >>component could be replaced with a shared data structure.
> >> >>
> >> >>4. You are right, the current implementation does not make it possible
> >> >>to
> >> >>construct an equivalent single-process simulation model for a
> >>multi-gem5
> >> >>run. However, a possible solution is a shared memory based
> >> >>implementation
> >> >>of the MultiIface virtual communication methods just as I described in
> >> >>the
> >> >>previous paragraph. The same implementation could then work with both
> >> >>multi-threaded/multi-event-queues and single-thread/single-event-queue
> >> >>gem5 configs.
> >> >>
> >> >>Thanks,
> >> >>- Gabor
> >> >>
> >> >>On 7/2/15, 7:20 PM, "Steve Reinhardt" <[email protected]> wrote:
> >> >>
> >> >>>Hi everyone,
> >> >>>
> >> >>>Sorry for taking so long to engage. This is a great development and I
> >> >>>think
> >> >>>both these patches are terrific contributions. Thanks to Mohammad,
> >> >>Gabor,
> >> >>>and everyone else involved.
> >> >>>
> >> >>>I agree with Andreas that we should start with some top-level goals &
> >> >>>assumptions, agree on those, and then we can sort out the detailed
> >> >>issues
> >> >>>based on a consistent view.
> >> >>>
> >> >>>I definitely agree with Andreas's first two points. The third one
> >> >>seems a
> >> >>>little surprising; I'd like to hear more about the motivation before
> >> >>>expressing an opinion. I can see where non-synchronous checkpointing
> >> >>could
> >> >>>be useful, but it's also clear from the associated patch that it's
> >>not
> >> >>>trivial to implement either. How much would be lost by requiring a
> >> >>>synchronization before a checkpoint?
> >> >>>
> >> >>>From my personal perspective, I would like to see whatever we do here
> >> >>be a
> >> >>>first step toward a more general distributed simulation platform.
> >>Both
> >> >>of
> >> >>>these patches seem pretty Ethernet-centric in different ways. This is
> >> >>not
> >> >>>terrible; part of the problem is that gem5's current internal
> >> >>networking
> >> >>>support is already overly Ethernet-centric IMO. But it would be nice
> >>to
> >> >>>avoid baking that in even further. Rather than assume I have
> >>understood
> >> >>>all
> >> >>>the code completely, I'll phrase things in the form of questions, and
> >> >>>people can comment on how those questions would be answered in the
> >> >>context
> >> >>>of the two different approaches.
> >> >>>
> >> >>>1. How much effort would be required to simulate a non-Ethernet
> >> >>network?
> >> >>>My
> >> >>>impression is that pd-gem5 has a leg up here, since a gem5 switch
> >>model
> >> >>>for
> >> >>>a non-Ethernet network (which you'd have to write anyway if you were
> >> >>>simulating a different network) could be used in place of the current
> >> >>>Ethernet switch, where for multi-gem5 I think that the
> >> >>>util/multi//tcp_server.cc code would have to be modified (i.e.,
> >> >>there'd be
> >> >>>additional work above and beyond what you'd need to get the network
> >> >>>modeled
> >> >>>in base gem5).
> >> >>>
> >> >>>2. How much effort is required to run on a non-Ethernet network (or
> >> >>>equivalently using a non-sockets API)?  The MultiIface/TCPIface split
> >> >>in
> >> >>>the multi-gem5 code looks like it addresses this nicely, but pd-gem5
> >> >>seems
> >> >>>pretty tied to an Ethernet host fabric.
> >> >>>
> >> >>>3. Do both of these patches work with the existing multithreaded
> >> >>>multiple-event-queue simulation? I think multi-gem5 does (though it
> >> >>would
> >> >>>be nice to have a confirmation), but it's not clear about pd-gem5. I
> >> >>don't
> >> >>>see a benefit to having multiple gem5 processes on a single host vs.
> >>a
> >> >>>single multithreaded gem5 process using the existing support. I think
> >> >>this
> >> >>>could be particularly valuable with a hierarchical network; e.g.,
> >> >>maybe I
> >> >>>would want to model a rack in multithreaded mode on a single
> >>multicore
> >> >>>server, then use pd-gem5 or multi-gem5 to build up a simulation of
> >> >>>multiple
> >> >>>racks. Would this work out of the box with either of these patches,
> >> >>and if
> >> >>>not, what would need to be done?
> >> >>>
> >> >>>4. Is it possible to construct a single-process simulation model
> >>that's
> >> >>>identical to the distributed simulation? It would be very valuable
> >>for
> >> >>>verification to be able to take a single simulation run and do it
> >>both
> >> >>>within a single process and also across multiple processes and verify
> >> >>that
> >> >>>identical results are achieved. This seems like a big drawback to the
> >> >>>multi-gem5 tcp_server approach, IMO.
> >> >>>
> >> >>>I'm definitely not saying that all these issues need to be resolved
> >> >>before
> >> >>>anything gets committed, but if we can agree that these are valid
> >> >>goals,
> >> >>>then we can evaluate detailed issues based on whether they move us
> >> >>toward
> >> >>>or away from those goals.
> >> >>>
> >> >>>Thanks,
> >> >>>
> >> >>>Steve
> >> >>>
> >> >>>
> >> >>>On Thu, Jul 2, 2015 at 8:34 AM Andreas Hansson
> >> >><[email protected]>
> >> >>>wrote:
> >> >>>
> >> >>>>Hi all,
> >> >>>>
> >> >>>>I think we need to up-level this a bit. From our perspective (and I
> >> >>>>suspect in general):
> >> >>>>
> >> >>>>1. Robustness is important. Having a design that _may_ break, however
> >> >>>>unlikely, is simply not an option.
> >> >>>>
> >> >>>>2. Performance and scaling are important. We can compare actual
> >>numbers
> >> >>>>here, and I am fairly sure the two solutions are on par. Let’s
> >> >>quantify
> >> >>>>that though.
> >> >>>>
> >> >>>>3. Checkpointing must not rely on synchronicity. It is vital for
> >> >>several
> >> >>>>workloads that we can checkpoint the various gem5 instances at
> >> >>different
> >> >>>>Ticks (due to the way the workloads are constructed).
> >> >>>>
> >> >>>>Andreas
> >> >>>>
> >> >>>>On 01/07/2015 21:41, "gem5-dev on behalf of Mohammad Alian"
> >> >>>><[email protected] on behalf of [email protected]> wrote:
> >> >>>>
> >> >>>>>Thanks Gabor for the reply.
> >> >>>>>
> >> >>>>>I feel this conversation is useful as we can find out pros/cons of
> >> >>each
> >> >>>>>design.
> >> >>>>>Please find my response in-lined below.
> >> >>>>>
> >> >>>>>Thank you,
> >> >>>>>Mohammad
> >> >>>>>
> >> >>>>>On Wed, Jul 1, 2015 at 6:44 AM, Gabor Dozsa <[email protected]>
> >> >>>>wrote:
> >> >>>>>
> >> >>>>>>Hi All,
> >> >>>>>>
> >> >>>>>>Sorry for the missing indentation in my previous e-mail! (This was my
> >> >>>>>>first e-mail to the dev-list so I could not simply use "reply".)
> >> >>>>>>Below is the same message, hopefully in more readable form.
> >> >>>>>>
> >> >>>>>>====================================
> >> >>>>>>
> >> >>>>>>Hi  All,
> >> >>>>>>
> >> >>>>>>Thank you Mohammad for your elaboration on the issues!
> >> >>>>>>
> >> >>>>>>I have written most of the multi-gem5 patch so let me add some more
> >> >>>>>>clarifications and answer your concerns. My comments are inline
> >> >>>>>>below.
> >> >>>>>>
> >> >>>>>>Thanks,
> >> >>>>>>- Gabor
> >> >>>>>>
> >> >>>>>>On 6/27/15, 10:20 AM, "Mohammad Alian" <[email protected]> wrote:
> >> >>>>>>
> >> >>>>>>>Hi All,
> >> >>>>>>>
> >> >>>>>>>Curtis-Thank you for listing some of the differences. I was waiting
> >> >>>>>>>for the completed multi-gem5 patch before I send my review. Please
> >> >>>>>>>see my inline response below. I've addressed the concerns that
> >> >>>>>>>you've raised. Also, I've added a bit more to the comparison.
> >> >>>>>>>
> >> >>>>>>>*  Synchronization.
> >> >>>>>>>
> >> >>>>>>>pd-gem5 implements this in Python (not a problem in itself;
> >> >>>>>>aesthetically
> >> >>>>>>>
> >> >>>>>>>this is nice, but...).  The issue is that pd-gem5's data packets
> >> >>and
> >> >>>>>>>
> >> >>>>>>>barrier messages travel over different sockets.  Since pd-gem5
> >> >>could
> >> >>>>>>see
> >> >>>>>>>
> >> >>>>>>>data packets passing synchronization barriers, it could create an
> >> >>>>>>>
> >> >>>>>>>inconsistent checkpoint.
> >> >>>>>>>
> >> >>>>>>>multi-gem5's synchronization is implemented in C++ using sync
> >> >>>>events,
> >> >>>>>>but
> >> >>>>>>>
> >> >>>>>>>more importantly, the messages queue up in the same stream and so
> >> >>>>>>cannot
> >> >>>>>>>
> >> >>>>>>>have the issue just described.  (Event ordering is often crucial
> >> >>in
> >> >>>>>>>
> >> >>>>>>>snapshot protocols.) Therefore we feel that multi-gem5 is a more
> >> >>>>robust
> >> >>>>>>>
> >> >>>>>>>solution in this respect.
> >> >>>>>>>
> >> >>>>>>>Each packet in pd-gem5 has a time-stamp. So even if data packets
> >> >>>>>>>pass synchronization barriers (in other words, data packets arrive
> >> >>>>>>>early at the destination node), the destination node processes
> >> >>>>>>>packets based on their timestamp. Actually, allowing data packets to
> >> >>>>>>>pass sync barriers is a nice feature that can reduce the likelihood
> >> >>>>>>>of late packet reception. Ordering of data messages that flow over
> >> >>>>>>>pd-gem5 nodes is also preserved in the pd-gem5 implementation.
> >> >>>>>>
> >> >>>>>>This seems to be a misunderstanding. Maybe the wording was not
> >> >>>>>>precise before. The problem is not a data packet "passing" a sync
> >> >>>>>>barrier but the other way around, a sync barrier that can pass a data
> >> >>>>>>packet (e.g. while the data packet is waiting in the host operating
> >> >>>>>>system socket layer). If that happens, the packet will arrive later
> >> >>>>>>than it was supposed to and it may miss the computed receive tick.
> >> >>>>>>
> >> >>>>>>For instance, let’s assume that the quantum coincides with the
> >> >>>>simulated
> >> >>>>>>Ether link delay. (This is the optimal choice of quantum to
> >> >>minimize
> >> >>>>the
> >> >>>>>>number of sync barriers.)  If a data packet is sent right at the
> >> >>>>>>beginning
> >> >>>>>>of a quantum then this packet must arrive at the destination gem5
> >> >>>>>>process
> >> >>>>>>within the same quantum in order not to miss its receive tick at
> >> >>the
> >> >>>>>>very
> >> >>>>>>beginning of the next quantum. If the sync barrier can pass the
> >> >>data
> >> >>>>>>packet
> >> >>>>>>then the data packet may arrive only during the next quantum (or in
> >> >>>>>>extreme conditions even later than that) so when it arrives the
> >> >>>>>>receiver gem5 may already have passed the receive tick.
> >> >>>>>>
> >> >>>>>This argument makes more sense than the previous one. Note that gem5
> >> >>>>>is a cycle accurate simulator and it runs orders of magnitude slower
> >> >>>>>than real hardware. So it's almost impossible that the flight time of
> >> >>>>>a packet through the real network turns out to be more than the
> >> >>>>>simulation time of one quantum. We ran a set of experiments just for
> >> >>>>>this purpose: with quantum size equal to etherlink delay, we never got
> >> >>>>>any late arrival violation (what you described) for the full NAS
> >> >>>>>benchmark suite (please refer to the paper).
> >> >>>>>
> >> >>>>>multi-gem5 is optimized for a case that almost never happens, and is
> >> >>>>>sacrificing speedup for no gain.
> >> >>>>>
> >> >>>>>
> >> >>>>>>Time-stamping does help with this issue. Also, if a data packet is
> >> >>>>>>waiting
> >> >>>>>>in the host operating system socket layer when the simulation
> >> >>thread
> >> >>>>>>exits
> >> >>>>>>to python to complete the next sync barrier  then the packet will
> >> >>>>not go
> >> >>>>>>into the checkpoint that may follow that sync barrier.
> >> >>>>>>
> >> >>>>>That's a good point. The current pd-gem5 checkpointing mechanism might
> >> >>>>>miss packets that have been sent during the previous quantum and are
> >> >>>>>waiting in the OS socket buffer. I should add some code inside the
> >> >>>>>ethertap serialization function to drain the ethertap socket before
> >> >>>>>writing the checkpoint. I will update the pd-gem5 patch accordingly.
> >> >>>>>
> >> >>>>>>
> >> >>>>>>>What you mentioned as an advantage for multi-gem5 is actually a key
> >> >>>>>>>disadvantage: buffering sync messages behind data packets can add
> >> >>>>>>>to the synchronization overhead and slow down simulation
> >> >>>>>>>significantly.
> >> >>>>>>
> >> >>>>>>The purpose of sync messages is to make sure that the data packets
> >> >>>>>>arrive
> >> >>>>>>in time (in terms of simulated time) at the destination so they
> >>can
> >> >>>>be
> >> >>>>>>scheduled for being received at the proper computed tick.  Sync
> >> >>>>messages
> >> >>>>>>also make sure that no data packets are in flight when a sync
> >> >>barrier
> >> >>>>>>completes before we take a checkpoint.  They definitely add
> >> >>overhead
> >> >>>>for
> >> >>>>>>the simulation but they are necessary for the correctness of the
> >> >>>>>>simulation.
> >> >>>>>>
> >> >>>>>>The receive thread in multi-gem5 reads out packets from the socket
> >> >>in
> >> >>>>>>parallel with the simulation thread so packets normally will not
> >>be
> >> >>>>>>"queueing up" before a sync barrier message.  There is definitely
> >> >>>>room
> >> >>>>>>for improvements in the current implementation for reducing the
> >> >>>>>>synchronization overhead but that is likely true for pd-gem5, too.
> >> >>>>>>The important thing here is that the solution must provide
> >> >>>>correctness
> >> >>>>>>(robustness) first.
> >> >>>>>>
> >> >>>>>pd-gem5 provides correctness. Please read my previous comment. The
> >> >>>>>whole purpose of multi/pd-gem5 is to parallelize simulation with
> >> >>>>>minimal overhead and gain speedup. If you fail to do so, nobody will
> >> >>>>>use your tool.
> >> >>>>>
> >> >>>>>
> >> >>>>>>>Also, multi-gem5 sends huge messages (multiHeaderPkt) through the
> >> >>>>>>>network to perform each synchronization point, which increases
> >> >>>>>>>synchronization overhead further. In pd-gem5, we choose to send just
> >> >>>>>>>one character as the sync message through a separate socket to
> >> >>>>>>>reduce synchronization overhead.
> >> >>>>>>
> >> >>>>>>The TCP/IP message size is unlikely the bottleneck here.
> >>Multi-gem5
> >> >>>>will
> >> >>>>>>send ~50 bytes more in a sync barrier message than pd-gem5 but
> >>that
> >> >>>>>>bigger
> >> >>>>>>sync message still fits into a single ethernet frame on the wire.
> >> >>The
> >> >>>>>>end-to-end latency overhead that is caused by 50 bytes extra payload
> >> >>>>>>for a small single frame TCP/IP message is likely to fall into the
> >> >>>>>>"noise" category if one tries to measure it in a real cluster.
> >> >>>>>>
> >> >>>>>You should prove your hypothesis experimentally. Each gem5 process
> >> >>>>>sends/receives sync messages at the end of every quantum. Say you are
> >> >>>>>simulating an "N" node computer cluster with "M" different
> >> >>>>>configurations. Then you will have N*M gem5 processes that
> >> >>>>>send/receive these 50 Bytes (I think it's more) of extra data at the
> >> >>>>>same time over the network ...
> >> >>>>>
> >> >>>>>Furthermore, multi-gem5 sends a header before each data message. By
> >> >>>>>comparison, pd-gem5 just adds 12 Bytes (each time-stamp is the 12
> >> >>>>>least significant digits of the Tick) to each data packet. I don't
> >> >>>>>know exactly how large these "MultiHeaderPkt" are, but they have two
> >> >>>>>Tick fields that are each 64 Bytes! Also, header packets are separate
> >> >>>>>TCP packets, so you pay for sending two separate packets for each data
> >> >>>>>packet. And worse, you serialize all of these with sync messages.
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>*  Packet handling.
> >> >>>>>>>
> >> >>>>>>>pd-gem5 uses EtherTap for data packets but changed the polling
> >> >>>>>>mechanism
> >> >>>>>>>
> >> >>>>>>>to go through the main event queue.  Since this rate is actually
> >> >>>>linked
> >> >>>>>>>
> >> >>>>>>>with simulator progress, it cannot guarantee that the packets are
> >> >>>>>>>serviced
> >> >>>>>>>
> >> >>>>>>>at regular intervals of real time.  This can lead to packets
> >> >>>>queueing
> >> >>>>>>up
> >> >>>>>>>
> >> >>>>>>>which would contribute to the synchronization issues mentioned
> >> >>>>above.
> >> >>>>>>>
> >> >>>>>>>multi-gem5 uses plain sockets with separate receive threads and
> >>so
> >> >>>>does
> >> >>>>>>>not
> >> >>>>>>>
> >> >>>>>>>have this issue.
> >> >>>>>>>
> >> >>>>>>>I think again you are pointing to your first concern that I've
> >> >>>>>>>explained above. Packets that have queued up in the EtherTap socket
> >> >>>>>>>will be processed and delivered to the simulation environment at the
> >> >>>>>>>beginning of the next simulation quantum.
> >> >>>>>>>
> >> >>>>>>>Please notice that multi-gem5 introduces a new SimObject to
> >> >>>>>>>interface the simulation environment to the real world, which is
> >> >>>>>>>redundant. This functionality is already provided by EtherTap.
> >> >>>>>>
> >> >>>>>>Except that the EtherTap solution does not provide a correct
> >> >>(robust)
> >> >>>>>>solution for the synchronization problem.
> >> >>>>>>
> >> >>>>>Please read my first/second comments.
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>* Checkpoint accuracy.
> >> >>>>>>>
> >> >>>>>>>A user would like to have a checkpoint at precisely the time the
> >> >>>>>>>
> >> >>>>>>>'m5 checkpoint' operation is executed so as to not miss any of
> >>the
> >> >>>>>>>
> >> >>>>>>>area of interest in his application.
> >> >>>>>>>
> >> >>>>>>>pd-gem5 requires that simulation finish the current quantum
> >> >>>>>>>
> >> >>>>>>>before checkpointing, so it cannot provide this.
> >> >>>>>>>
> >> >>>>>>>(Shortening the quantum can help, but usually the snapshot is
> >> >>being
> >> >>>>>>taken
> >> >>>>>>>
> >> >>>>>>>while 'fast-forwarding', i.e. simulating as fast as possible,
> >> >>which
> >> >>>>>>would
> >> >>>>>>>
> >> >>>>>>>motivate a longer quantum.)
> >> >>>>>>>
> >> >>>>>>>multi-gem5 can enter the drain cycle immediately upon receiving a
> >> >>>>>>>
> >> >>>>>>>checkpoint request.  We find this accuracy highly desirable.
> >> >>>>>>>
> >> >>>>>>>It's true that with a large quantum size there would be some
> >> >>>>>>>discrepancy between the m5_ckpt instruction tick and the actual dump
> >> >>>>>>>tick. Based on the multi-gem5 code, my understanding is that you send
> >> >>>>>>>an async checkpoint message as soon as one of the gem5 processes
> >> >>>>>>>encounters the m5_ckpt instruction. But I'm not sure how you fix the
> >> >>>>>>>aforementioned issue, because you have to sync all gem5 processes
> >> >>>>>>>before you start dumping the checkpoint, which necessitates a global
> >> >>>>>>>synchronization beforehand.
> >> >>>>>>
> >> >>>>>>In multi-gem5, the gem5 process that encounters the m5_ckpt instruction
> >> >>>>>>sends out an async checkpoint notification to the peer gem5 processes
> >> >>>>>>and then starts draining immediately (at the same tick). So the
> >> >>>>>>checkpoint is taken at the exact tick from the initiator process's
> >> >>>>>>point of view. The global synchronisation with the peer processes takes
> >> >>>>>>place while the initiator process is still waiting at the same tick
> >> >>>>>>(i.e. the simulation thread is suspended). However, the receiver thread
> >> >>>>>>continues reading from the socket while waiting for the global sync to
> >> >>>>>>complete, to make sure that in-flight data packets from peer gem5
> >> >>>>>>processes are stored properly and saved into the checkpoint.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>So you mean multi-gem5 ends up with gem5 processes at different ticks
> >> >>>>>after a checkpoint? In pd-gem5 we make sure that all gem5 processes
> >> >>>>>start dumping the checkpoint at the same tick. Are you sure that it is
> >> >>>>>correct to have each gem5 process dump its checkpoint at different
> >> >>>>>ticks?
> >> >>>>>
> >> >>>>>I don't think this is a correct checkpointing design. However, if you
> >> >>>>>feel it is correct, I can change a couple of lines in "Simulation.py"
> >> >>>>>and the barrier scripts to implement the same functionality in pd-gem5.
> >> >>>>>One thing that you are obsessed with is making sure that there are no
> >> >>>>>in-flight packets when we start dumping the checkpoint, and you have all
> >> >>>>>these complex mechanisms in place to ensure that! I think you can be
> >> >>>>>99.99999% sure that there is no in-flight packet by waiting for 1 second
> >> >>>>>after all gem5 processes have finished their quantum simulation and then
> >> >>>>>dumping the checkpoint. Do you really think that delivering a TCP packet
> >> >>>>>would take more than 1 second in today's systems!?
> >> >>>>>Always go for simple solutions ...
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>By the way, we have a fix for this issue by introducing a new m5
> >> >>>>pseudo
> >> >>>>>>>instruction.
> >> >>>>>>
> >> >>>>>>I fail to see how a new pseudo instruction can solve the problem
> >>of
> >> >>>>>>completing the full quantum in pd-gem5 before a checkpoint can be
> >> >>>>taken.
> >> >>>>>>Could you please elaborate on that?
> >> >>>>>>
> >> >>>>>>Since we take checkpoints while fast-forwarding, and it is likely that
> >> >>>>>we relax synchronization for speedup, a new pseudo instruction that
> >> >>>>>sets the quantum size (m5_qset) can be helpful. One can insert m5_qset
> >> >>>>>in the benchmark source code before entering the ROI that contains
> >> >>>>>m5_ckpt, to decrease the quantum size beforehand and reduce the
> >> >>>>>discrepancy between the m5_ckpt tick and the actual checkpoint tick.
> >> >>>>>This is not included in the pd-gem5 patch right now.
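
A rough sketch of the discrepancy discussed above, not taken from either patch. It assumes quanta end at integer multiples of the quantum size and that a checkpoint requested inside a quantum is dumped at the end of that quantum. The function names are hypothetical; only the idea of shrinking the quantum before the ROI comes from the thread.

```python
def dump_tick(ckpt_request_tick: int, quantum: int) -> int:
    """Tick at which the checkpoint is dumped when the simulation must
    finish the current quantum first (assumed model, see lead-in)."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    # A request in [k*quantum, (k+1)*quantum) is served at (k+1)*quantum.
    return (ckpt_request_tick // quantum + 1) * quantum


def discrepancy(ckpt_request_tick: int, quantum: int) -> int:
    """Ticks simulated between the m5_ckpt request and the actual dump."""
    return dump_tick(ckpt_request_tick, quantum) - ckpt_request_tick


# Relaxed synchronization while fast-forwarding: large quantum.
relaxed = discrepancy(ckpt_request_tick=1_250, quantum=1_000)
# Quantum reduced before the ROI, as an m5_qset-style call would do.
tight = discrepancy(ckpt_request_tick=1_250, quantum=100)
```

Under these assumptions the discrepancy is bounded by the quantum size, which is why reducing the quantum just before the checkpoint narrows the gap.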
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>* Implementation of network topology.
> >> >>>>>>>
> >> >>>>>>>pd-gem5 uses a separate gem5 process to act as a switch whereas
> >> >>>>>>multi-gem5
> >> >>>>>>>
> >> >>>>>>>uses a standalone packet relay process.
> >> >>>>>>>
> >> >>>>>>>We haven't measured the overhead of pd-gem5's simulated switch
> >> >>yet,
> >> >>>>but
> >> >>>>>>>
> >> >>>>>>>we're confident that our approach is at least as fast and more
> >> >>>>>>scalable.
> >> >>>>>>>
> >> >>>>>>>pd-gem5 has the flexibility to simulate a switch box alongside one of
> >> >>>>>>>the other gem5 processes. However, that might make that gem5 process
> >> >>>>>>>the simulation bottleneck. One of the advantages of pd-gem5 over
> >> >>>>>>>multi-gem5 is that we use gem5 to simulate a switch box, which allows
> >> >>>>>>>us to model any network topology by instantiating several Switch
> >> >>>>>>>simObjects and interconnecting them with EtherLink in an arbitrary
> >> >>>>>>>fashion. A standalone tcp server can only provide switch functionality
> >> >>>>>>>(forwarding packets to destinations) and model a star network
> >> >>>>>>>topology. Furthermore, it cannot model various network timings such as
> >> >>>>>>>queueing delay, congestion, and routing latency. It also has some
> >> >>>>>>>accuracy issues that I will point out next.
> >> >>>>>>
> >> >>>>>>I agree with the complex topology argument. We already mentioned that
> >> >>>>>>before as an advantage of pd-gem5 from the point of view of future
> >> >>>>>>extensions. However, I do not agree that multi-gem5 cannot model
> >> >>>>>>queueing delays and congestion. For a simple crossbar switch, it can
> >> >>>>>>model queueing delays and congestion, but the receive queues are
> >> >>>>>>distributed among the gem5 processes.
> >> >>>>>>
> >> >>>>>>It's true that you can model the queueing delay of a simple crossbar by
> >> >>>>>distributing queues across gem5 processes (end points). But to be able
> >> >>>>>to do so, you have to ensure the ordering of the packets that you
> >> >>>>>enqueue in the distributed queues. That is almost impossible without a
> >> >>>>>synchronized switch box. You would need a reorder queue that reorders
> >> >>>>>packets dynamically and updates the timing parameters of each packet as
> >> >>>>>well. I don't know how much progress you have made on an ordering scheme
> >> >>>>>in multi-gem5, but you may already have realized how complex and
> >> >>>>>error-prone it can be. This argument is also related to my next
> >> >>>>>argument, "Broken network timing".
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>* Broken network timing:
> >> >>>>>>>
> >> >>>>>>>Forwarding packets between gem5 processes using a standalone tcp server
> >> >>>>>>>can cause reordering between packets that have different sources but
> >> >>>>>>>the same destination. This causes inaccurate network timing and, worst
> >> >>>>>>>of all, non-deterministic simulation. pd-gem5 resolves this by
> >> >>>>>>>reordering packets at the Switch process and then sending them to their
> >> >>>>>>>destinations (this is possible because the switch is synchronized with
> >> >>>>>>>the rest of the nodes).
> >> >>>>>>
> >> >>>>>>In multi-gem5, there is always a HeaderPkt that contains some meta
> >> >>>>>>information for each data packet. The meta information includes the
> >> >>>>>>send tick and the sender rank (i.e. a unique ID of the sender gem5
> >> >>>>>>process). We use that information to define a well-defined ordering of
> >> >>>>>>packets even if packets arrive at the same receiver from different
> >> >>>>>>senders. This packet ordering scheme is still being tested, so the
> >> >>>>>>corresponding patch is not on the RB yet.
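
To illustrate the ordering idea described above, here is a minimal sketch; it is not the multi-gem5 implementation, which the thread says is not yet posted. It assumes each packet carries a send tick and a sender rank, and that sorting on the pair (send tick, sender rank) is the tie-breaking rule. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderPkt:
    """Hypothetical stand-in for the per-packet meta information."""
    send_tick: int     # simulated tick at which the packet was sent
    sender_rank: int   # unique ID of the sending gem5 process
    payload: bytes = b""


def deterministic_order(arrivals):
    """Order packets by (send_tick, sender_rank), ignoring the host-side
    arrival order. Python's sort is stable, so packets from one sender
    with the same tick keep their in-stream order."""
    return sorted(arrivals, key=lambda p: (p.send_tick, p.sender_rank))


a = HeaderPkt(send_tick=100, sender_rank=2, payload=b"a")
b = HeaderPkt(send_tick=100, sender_rank=1, payload=b"b")
c = HeaderPkt(send_tick=90, sender_rank=3, payload=b"c")

# Two different host-side arrival orders yield the same simulated order.
run1 = deterministic_order([a, b, c])
run2 = deterministic_order([c, a, b])
```

The point of the sketch is only that the resulting order depends on simulated quantities, not on when the host network happened to deliver each packet.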
> >> >>>>>>
> >> >>>>>>Please read my previous comment. The most important part of
> >> >>>>>>multi/pd-gem5
> >> >>>>>extension is ensuring accurate and deterministic simulation.
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>* Amount of changes
> >> >>>>>>>
> >> >>>>>>>pd-gem5 introduces different modes in etherlink just to provide
> >> >>>>>>>accurate timing for each component in the network subsystem (NIC, link,
> >> >>>>>>>switch) as well as the capability of modeling different network
> >> >>>>>>>topologies (mesh, ring, fat tree, etc). To enable a simple
> >> >>>>>>>functionality, like what multi-gem5 provides, the amount of changes in
> >> >>>>>>>gem5 can be limited to time-stamping packets and providing
> >> >>>>>>>synchronization through python scripts. However, multi-gem5
> >> >>>>>>>re-implements functionality that is already in gem5.
> >> >>>>>>
> >> >>>>>>This argument holds only if both implementations are correct
> >> >>>>(robust).
> >> >>>>>>It
> >> >>>>>>still seems to me that pd-gem5 does not provide correctness for
> >>the
> >> >>>>>>synchronization/checkpointing parts.
> >> >>>>>>
> >> >>>>>>Again, please read my first comment for correctness of pd-gem5.
> >> >>>>>
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>>* Integrating with gem5 mainstream:
> >> >>>>>>>
> >> >>>>>>>pd-gem5 launch script is written in python which is suited for
> >> >>>>>>integration
> >> >>>>>>>with gem5 python scripts. However multi-gem5 uses bash script.
> >> >>Also,
> >> >>>>>>all
> >> >>>>>>>source files in pd-gem5 are already parts of gem5 mainstream.
> >> >>>>However
> >> >>>>>>>multi-gem5 has tcp_server.cc/hh that is a standalone process and
> >> >>>>cannot
> >> >>>>>>be
> >> >>>>>>>part of gem5.
> >> >>>>>>
> >> >>>>>>The multi-gem5 launch script is simple enough to rely only on the
> >> >>>>>>shell. It can obviously be easily re-written in python if that added
> >> >>>>>>any value. The tcp_server component is only a utility (like the "m5"
> >> >>>>>>utility that is also part of gem5).
> >> >>>>>>
> >> >>>>>>The thing is that users are likely to want to add some functionality
> >> >>>>>to the run-script of multi/pd-gem5. E.g. the pd-gem5 run-script
> >> >>>>>supports launching simulations using simulation pool management
> >> >>>>>software (http://research.cs.wisc.edu/htcondor/). Using python enables
> >> >>>>>users to easily add this kind of support.
> >> >>>>>
> >> >>>>>
> >> >>>>>>
> >> >>>>>>Cheers,
> >> >>>>>>- Gabor
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>>On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham
> >> >>>><[email protected]>
> >> >>>>>>>wrote:
> >> >>>>>>>
> >> >>>>>>>>Hello everyone,
> >> >>>>>>>>We have taken a look at how pd-gem5 compares with multi-gem5.
> >> >>>>While
> >> >>>>>>>>intending
> >> >>>>>>>>to deliver the same functionality, there are some crucial
> >> >>>>differences:
> >> >>>>>>>>
> >> >>>>>>>>*  Synchronization.
> >> >>>>>>>>
> >> >>>>>>>>    pd-gem5 implements this in Python (not a problem in itself;
> >> >>>>>>>>aesthetically
> >> >>>>>>>>    this is nice, but...).  The issue is that pd-gem5's data
> >> >>>>packets
> >> >>>>>>and
> >> >>>>>>>>    barrier messages travel over different sockets.  Since
> >> >>pd-gem5
> >> >>>>>>could
> >> >>>>>>>>see
> >> >>>>>>>>    data packets passing synchronization barriers, it could
> >> >>create
> >> >>>>an
> >> >>>>>>>>    inconsistent checkpoint.
> >> >>>>>>>>
> >> >>>>>>>>    multi-gem5's synchronization is implemented in C++ using
> >>sync
> >> >>>>>>events,
> >> >>>>>>>>but
> >> >>>>>>>>    more importantly, the messages queue up in the same stream
> >> >>and
> >> >>>>so
> >> >>>>>>>>cannot
> >> >>>>>>>>    have the issue just described.  (Event ordering is often
> >> >>>>crucial
> >> >>>>>>in
> >> >>>>>>>>    snapshot protocols.) Therefore we feel that multi-gem5 is a
> >> >>>>more
> >> >>>>>>>>robust
> >> >>>>>>>>    solution in this respect.
> >> >>>>>>>>
> >> >>>>>>>>*  Packet handling.
> >> >>>>>>>>
> >> >>>>>>>>    pd-gem5 uses EtherTap for data packets but changed the
> >> >>polling
> >> >>>>>>>>mechanism
> >> >>>>>>>>    to go through the main event queue.  Since this rate is
> >> >>>>actually
> >> >>>>>>>>linked
> >> >>>>>>>>    with simulator progress, it cannot guarantee that the
> >>packets
> >> >>>>are
> >> >>>>>>>>serviced
> >> >>>>>>>>    at regular intervals of real time.  This can lead to packets
> >> >>>>>>>>queueing up
> >> >>>>>>>>    which would contribute to the synchronization issues
> >> >>mentioned
> >> >>>>>>above.
> >> >>>>>>>>
> >> >>>>>>>>    multi-gem5 uses plain sockets with separate receive threads
> >> >>>>and so
> >> >>>>>>>>does
> >> >>>>>>>>not
> >> >>>>>>>>    have this issue.
> >> >>>>>>>>
> >> >>>>>>>>* Checkpoint accuracy.
> >> >>>>>>>>
> >> >>>>>>>>   A user would like to have a checkpoint at precisely the time
> >> >>the
> >> >>>>>>>>   'm5 checkpoint' operation is executed so as to not miss any
> >>of
> >> >>>>the
> >> >>>>>>>>   area of interest in his application.
> >> >>>>>>>>
> >> >>>>>>>>   pd-gem5 requires that simulation finish the current quantum
> >> >>>>>>>>   before checkpointing, so it cannot provide this.
> >> >>>>>>>>
> >> >>>>>>>>   (Shortening the quantum can help, but usually the snapshot is
> >> >>>>being
> >> >>>>>>>>taken
> >> >>>>>>>>   while 'fast-forwarding', i.e. simulating as fast as possible,
> >> >>>>which
> >> >>>>>>>>would
> >> >>>>>>>>   motivate a longer quantum.)
> >> >>>>>>>>
> >> >>>>>>>>   multi-gem5 can enter the drain cycle immediately upon
> >> >>receiving
> >> >>>>a
> >> >>>>>>>>   checkpoint request.  We find this accuracy highly desirable.
> >> >>>>>>>>
> >> >>>>>>>>* Implementation of network topology.
> >> >>>>>>>>
> >> >>>>>>>>   pd-gem5 uses a separate gem5 process to act as a switch
> >> >>whereas
> >> >>>>>>>>multi-gem5
> >> >>>>>>>>   uses a standalone packet relay process.
> >> >>>>>>>>
> >> >>>>>>>>   We haven't measured the overhead of pd-gem5's simulated
> >>switch
> >> >>>>yet,
> >> >>>>>>>>but
> >> >>>>>>>>   we're confident that our approach is at least as fast and
> >>more
> >> >>>>>>>>scalable.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>Thanks,
> >> >>>>>>>>Curtis
> >> >>>>>>>>________________________________________
> >> >>>>>>>>From: gem5-dev [[email protected]] On Behalf Of
> Mohammad
> >> >>>>>>Alian [
> >> >>>>>>>>[email protected]]
> >> >>>>>>>>Sent: Friday, June 26, 2015 7:37 PM
> >> >>>>>>>>To: gem5 Developer List
> >> >>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a
> >> >>parallel/distributed
> >> >>>>>>>>system
> >> >>>>>>>>on multiple physical hosts
> >> >>>>>>>>
> >> >>>>>>>>Hi Anthony,
> >> >>>>>>>>
> >> >>>>>>>>I think that would be a good option, then I can add pd-gem5
> >> >>>>>>>>functionality
> >> >>>>>>>>on top of that. Right now I've simplified your implementation.
> >> >>>>Also, I
> >> >>>>>>>>think I had found some bugs in your patch that I cannot remember
> >> >>>>now.
> >> >>>>>>If
> >> >>>>>>>>you decided to ship EtherSwitch patch, let me know to give you a
> >> >>>>>>review
> >> >>>>>>>>on
> >> >>>>>>>>that.
> >> >>>>>>>>
> >> >>>>>>>>Thanks,
> >> >>>>>>>>Mohammad
> >> >>>>>>>>
> >> >>>>>>>>On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony <
> >> >>>>>>>>[email protected]> wrote:
> >> >>>>>>>>
> >> >>>>>>>>>Would it make sense for me to ship the EtherSwitch patch first,
> >> >>>>since
> >> >>>>>>>>it
> >> >>>>>>>>>has utility on its own, and then we can decide which of the
> >> >>>>>>>>"multi-gem5"
> >> >>>>>>>>>approaches is best, or if it's some combination of both?
> >> >>>>>>>>>
> >> >>>>>>>>>The only reason I never shipped it was because Steve raised an
> >> >>>>issue
> >> >>>>>>>>that
> >> >>>>>>>>>I didn't have a good alternative for, and didn't have the time
> >> >>to
> >> >>>>>>look
> >> >>>>>>>>into
> >> >>>>>>>>>one at that time.
> >> >>>>>>>>>________________________________________
> >> >>>>>>>>>From: gem5-dev [[email protected]] on behalf of
> >>Mohammad
> >> >>>>>>>>Alian [
> >> >>>>>>>>>[email protected]]
> >> >>>>>>>>>Sent: Wednesday, June 24, 2015 12:43 PM
> >> >>>>>>>>>To: gem5 Developer List
> >> >>>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a
> >> >>parallel/distributed
> >> >>>>>>>>system
> >> >>>>>>>>>on multiple physical hosts
> >> >>>>>>>>>
> >> >>>>>>>>>Hi Andreas,
> >> >>>>>>>>>
> >> >>>>>>>>>Thanks for the comment.
> >> >>>>>>>>>I think the checkpointing support in both works is the same.
> >> >>Here
> >> >>>>is
> >> >>>>>>>>how
> >> >>>>>>>>>checkpointing support is implemented in pd-gem5:
> >> >>>>>>>>>
> >> >>>>>>>>>Whenever one of the gem5 processes encounters an m5-checkpoint pseudo
> >> >>>>>>>>>instruction, it sends a "recv-ckpt" signal to the "barrier" process.
> >> >>>>>>>>>Then the "barrier" process sends a "take-ckpt" signal to all the
> >> >>>>>>>>>simulated nodes (including the node that encountered m5-checkpoint)
> >> >>>>>>>>>at the end of the current simulation quantum. On reception of the
> >> >>>>>>>>>"take-ckpt" signal, the gem5 processes start dumping checkpoints.
> >> >>>>>>>>>This makes each simulated node dump a checkpoint at the same
> >> >>>>>>>>>simulated time point while ensuring there are no in-flight packets.
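
The handshake described above can be sketched as a small single-process model. This is an illustration only, assuming that quanta end at multiples of the quantum size and that the barrier decides once per quantum. The signal names "recv-ckpt" and "take-ckpt" come from the description; the `Barrier` class and `run` function are hypothetical and do not reflect the actual pd-gem5 scripts.

```python
class Barrier:
    """Hypothetical stand-in for the pd-gem5 "barrier" process."""

    def __init__(self) -> None:
        self.ckpt_pending = False

    def recv_ckpt(self) -> None:
        # A node hit m5-checkpoint somewhere inside the current quantum.
        self.ckpt_pending = True

    def end_of_quantum(self) -> str:
        # Called once every node has reached the end of the quantum.
        if self.ckpt_pending:
            self.ckpt_pending = False
            return "take-ckpt"
        return "continue"


def run(num_nodes: int, quantum: int, ckpt_tick: int, max_quanta: int) -> dict:
    """Return {node: tick at which that node dumps its checkpoint}."""
    barrier = Barrier()
    dumps = {}
    for q in range(max_quanta):
        start, end = q * quantum, (q + 1) * quantum
        if start <= ckpt_tick < end:
            barrier.recv_ckpt()  # "recv-ckpt" from the requesting node
        if barrier.end_of_quantum() == "take-ckpt":
            for node in range(num_nodes):
                dumps[node] = end  # every node dumps at the same tick
            break
    return dumps


dumps = run(num_nodes=3, quantum=1_000, ckpt_tick=1_250, max_quanta=5)
```

In this model all nodes dump at tick 2000 even though the request came at tick 1250, which is the end-of-quantum behaviour the description relies on.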
> >> >>>>>>>>>
> >> >>>>>>>>>I believe this is the same as the multi-gem5 patch's approach to
> >> >>>>>>>>>checkpoint support (based on the commit message of
> >> >>>>>>>>>http://reviews.gem5.org/r/2865/).
> >> >>>>>>>>>Also, we have tested our mechanism with several benchmarks and it
> >> >>>>>>>>>works. As Steve suggested, I'll look into Curtis's patch and try to
> >> >>>>>>>>>review it as well.
> >> >>>>>>>>>But as Nilay also mentioned earlier, some code is missing from
> >> >>>>>>>>>Curtis's patch. I prefer to first run multi-gem5 before starting to
> >> >>>>>>>>>review it.
> >> >>>>>>>>>
> >> >>>>>>>>>Thank you,
> >> >>>>>>>>>Mohammad
> >> >>>>>>>>>
> >> >>>>>>>>>On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson <
> >> >>>>>>>>[email protected]>
> >> >>>>>>>>>wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>>Hi Steve,
> >> >>>>>>>>>>
> >> >>>>>>>>>>Apologies for the confusion. We are on the same page. My point
> >> >>is
> >> >>>>>>>>that
> >> >>>>>>>>we
> >> >>>>>>>>>>cannot simply take a little bit of patch A and a little bit of
> >> >>>>>>>>patch B.
> >> >>>>>>>>>>This change involves a lot of code, and we need to approach
> >> >>this
> >> >>>>in
> >> >>>>>>>>a
> >> >>>>>>>>>>structured fashion. My proposal is to do it bottom up, and
> >> >>start
> >> >>>>by
> >> >>>>>>>>>>getting the basic support in place. Since
> >> >>>>>>>>>http://reviews.gem5.org/r/2826/
> >> >>>>>>>>>>has already been on the review board for a few months, I am merely
> >> >>>>>>>>>>suggesting that it would be a good start to relate the newly posted
> >> >>>>>>>>>>patches to what is already there.
> >> >>>>>>>>>>
> >> >>>>>>>>>>Andreas
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>On 24/06/2015 13:11, "gem5-dev on behalf of Steve Reinhardt"
> >> >>>>>>>>>><[email protected] on behalf of [email protected]>
> >> >>wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>>Hi Andreas,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>I'm a little confused by your email---you say you're
> >> >>>>fundamentally
> >> >>>>>>>>>opposed
> >> >>>>>>>>>>>to looking at both patches and picking the best features,
> >>then
> >> >>>>you
> >> >>>>>>>>point
> >> >>>>>>>>>>>out that the patches Curtis posted have the feature of better
> >> >>>>>>>>>>>checkpointing
> >> >>>>>>>>>>>support so we should pick that :).
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>Obviously we can't just pick patch A from Mohammad's set and
> >> >>>>patch
> >> >>>>>>>>B
> >> >>>>>>>>>from
> >> >>>>>>>>>>>Curtis's set and expect them to work together, but I think
> >> >>that
> >> >>>>>>>>having
> >> >>>>>>>>>>>both
> >> >>>>>>>>>>>sets of patches available and comparing and contrasting the
> >> >>two
> >> >>>>>>>>>>>implementations should enable us to get to a single
> >> >>>>implementation
> >> >>>>>>>>>that's
> >> >>>>>>>>>>>the best of both. Someone will have to make the effort of
> >> >>>>>>>>integrating
> >> >>>>>>>>>the
> >> >>>>>>>>>>>better ideas from one set into the other set to create a new
> >> >>>>>>>>unified
> >> >>>>>>>>set
> >> >>>>>>>>>>>of
> >> >>>>>>>>>>>patches; (or maybe we commit one set and then integrate the
> >> >>>>best of
> >> >>>>>>>>the
> >> >>>>>>>>>>>other set as patches on top of that), but the first step is
> >>to
> >> >>>>>>>>identify
> >> >>>>>>>>>>>what "the best of both" is.  Having Mohammad look at Curtis's
> >> >>>>>>>>patches,
> >> >>>>>>>>>and
> >> >>>>>>>>>>>Curtis (or someone else from ARM) closely examine Mohammad's
> >> >>>>>>>>patches
> >> >>>>>>>>>would
> >> >>>>>>>>>>>be a great start.  I intend to review them both, though
> >> >>>>>>>>unfortunately
> >> >>>>>>>>my
> >> >>>>>>>>>>>time has been scarce lately---I'm hoping to squeeze that in
> >> >>>>later
> >> >>>>>>>>this
> >> >>>>>>>>>>>week.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>Once we've had a few people look at both, we can discuss the
> >> >>>>pros
> >> >>>>>>>>and
> >> >>>>>>>>>cons
> >> >>>>>>>>>>>of each, then discuss the strategy for getting the best
> >> >>features
> >> >>>>>>>>in.
> >> >>>>>>>>So
> >> >>>>>>>>>>>far I've heard that Mohammad's patches have a better network
> >> >>>>model
> >> >>>>>>>>but
> >> >>>>>>>>>the
> >> >>>>>>>>>>>ARM patches have better checkpointing support; that seems
> >> >>like a
> >> >>>>>>>>good
> >> >>>>>>>>>>>start.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>Steve
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson <
> >> >>>>>>>>>[email protected]
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>>Hi all,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Great work. However, I fundamentally do not believe in the approach
> >> >>>>>>>>>>>>of 'letting reviewers pick the best features'. There is no way we
> >> >>>>>>>>>>>>would ever get something working out of it. We need to get _one_
> >> >>>>>>>>>>>>working solution here, and figure out how to best get there. I
> >> >>>>>>>>>>>>would propose to do it bottom up, starting with the basic
> >> >>>>>>>>>>>>multi-simulator instance support, checkpointing support, and then
> >> >>>>>>>>>>>>move on to the network between the simulator instances.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Thus, I propose we go with the low-level plumbing and checkpoint
> >> >>>>>>>>>>>>support from what Curtis has posted. I believe proper checkpointing
> >> >>>>>>>>>>>>support to be the most challenging, and from what I can tell this
> >> >>>>>>>>>>>>is far more limited in what you just posted, Mohammad. Could you
> >> >>>>>>>>>>>>perhaps review Curtis's patches based on your insights, and we can
> >> >>>>>>>>>>>>try and get these patches in shape and committed asap.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Once we have the baseline functionality in place, then we
> >>can
> >> >>>>>>>>start
> >> >>>>>>>>>>>>looking at the more elaborate network models.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Does this sound reasonable?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Thanks,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>Andreas
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad Alian"
> >> >>>>>>>>>>>><[email protected] on behalf of [email protected]>
> >> >>wrote:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>>Hello All,
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>I have submitted a chain of patches which enables gem5 to
> >> >>>>>>>>simulate
> >> >>>>>>>>a
> >> >>>>>>>>>>>>>cluster on multiple physical hosts:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2909/
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2910/
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2912/
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2913/
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2914/
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>and a patch that contains run scripts for a simple
> >> >>experiment:
> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2915/
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>We have run several benchmarks using this infrastructure,
> >> >>>>>>>>including
> >> >>>>>>>>>NAS
> >> >>>>>>>>>>>>>parallel benchmarks (MPI) and DCBench-hadoop
> >> >>>>>>>>>>>>>(http://prof.ict.ac.cn/DCBench/),
> >> >>>>>>>>>>>>>and would be happy to share scripts/diskimages.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>We call this *pd-gem5*. *pd-gem5* functionality is more or less the
> >> >>>>>>>>>>>>>same as Curtis's patch for *multi-gem5*. However, I feel the
> >> >>>>>>>>>>>>>*pd-gem5* network model is more thorough; it also enables modeling
> >> >>>>>>>>>>>>>different network topologies. Having both sets of changes together
> >> >>>>>>>>>>>>>lets reviewers pick the best features from both works.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>Thank you,
> >> >>>>>>>>>>>>>Mohammad Alian
> >> >>>>>>>>>>>>>_______________________________________________
> >> >>>>>>>>>>>>>gem5-dev mailing list
> >> >>>>>>>>>>>>>[email protected]
> >> >>>>>>>>>>>>>http://m5sim.org/mailman/listinfo/gem5-dev
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>-- IMPORTANT NOTICE: The contents of this email and any
> >> >>>>>>>>attachments
> >> >>>>>>>>>are
> >> >>>>>>>>>>>>confidential and may also be privileged. If you are not the
> >> >>>>>>>>intended
> >> >>>>>>>>>>>>recipient, please notify the sender immediately and do not
> >> >>>>>>>>disclose
> >> >>>>>>>>>the
> >> >>>>>>>>>>>>contents to any other person, use it for any purpose, or
> >> >>store
> >> >>>>or
> >> >>>>>>>>copy
> >> >>>>>>>>>>>>the
> >> >>>>>>>>>>>>information in any medium.  Thank you.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>ARM Limited, Registered office 110 Fulbourn Road, Cambridge
> >> >>CB1
> >> >>>>>>>>9NJ,
> >> >>>>>>>>>>>>Registered in England & Wales, Company No:  2557590
> >> >>>>>>>>>>>>ARM Holdings plc, Registered office 110 Fulbourn Road,
> >> >>>>Cambridge
> >> >>>>>>>>CB1
> >> >>>>>>>>>>>>9NJ,
> >> >>>>>>>>>>>>Registered in England & Wales, Company No:  2548782
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>
> >> >>>>
> >> >>>>-- IMPORTANT NOTICE: The contents of this email and any attachments
> >> >>are
> >> >>>>confidential and may also be privileged. If you are not the intended
> >> >>>>recipient, please notify the sender immediately and do not disclose
> >> >>the
> >> >>>>contents to any other person, use it for any purpose, or store or
> >>copy
> >> >>>>the
> >> >>>>information in any medium.  Thank you.
> >> >>>>
> >> >>>>ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
> >> >>>>Registered in England & Wales, Company No:  2557590
> >> >>>>ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
> >> >>>>9NJ,
> >> >>>>Registered in England & Wales, Company No:  2548782
> >> >>>>_______________________________________________
> >> >>>>gem5-dev mailing list
> >> >>>>[email protected]
> >> >>>>http://m5sim.org/mailman/listinfo/gem5-dev
> >> >>>>
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
