Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on multiple physical hosts

Mohammad Alian Tue, 07 Jul 2015 10:10:59 -0700

I think you didn't understand my point. I'll explain it with an example.

>> A receive
>> tick of a packet cannot fall into the current quantum so every packet
>>can
>> get scheduled for receive properly even if a checkpoint/restore happens
>> during a quantum.


This assumption is true when "quantum size <= link_latency". But
link_latency is not fixed, it's a parameter.
Assume you have 3 nodes, run with q=10, and take a checkpoint
@tick=11 on node0. Then assume these are the tick values of the nodes when you
take the unsync ckpt: node0:11, node1:20, node2:20. If you restore with a quantum
smaller than 10, then your statement above does not hold. So you cannot
restore from a checkpoint with a link_latency smaller than the value that you
took the checkpoint with!
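
To make the example concrete, here is a minimal Python sketch (not gem5 code). The function name late_deliveries is hypothetical. It assumes that each node sends one packet at its restore tick, that the receive tick is simply send tick + link_latency, and that the restored quantum is no larger than the restored link_latency. It lists the packets whose receive tick would already lie in the receiver's past.

```python
def late_deliveries(restore_ticks, link_latency):
    """Return (sender, receiver, recv_tick) for packets that arrive too late.

    restore_ticks maps a node name to the tick it restores at (hypothetical
    values). A packet sent at the sender's restore tick is assumed to have
    receive tick = send tick + link_latency.
    """
    late = []
    for src, send_tick in restore_ticks.items():
        recv_tick = send_tick + link_latency
        for dst, dst_tick in restore_ticks.items():
            # The receiver cannot schedule an event before its current tick.
            if dst != src and recv_tick < dst_tick:
                late.append((src, dst, recv_tick))
    return late


# Tick values from the unsync checkpoint in the example above.
ticks = {"node0": 11, "node1": 20, "node2": 20}

print(late_deliveries(ticks, 10))  # same link_latency as at checkpoint time
print(late_deliveries(ticks, 5))   # smaller link_latency at restore time
```

With link_latency=10 nothing is late, because node0's packet gets receive tick 21. With link_latency=5 the same packet gets receive tick 16, while node1 and node2 are already at tick 20.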

Mohammad

On Tue, Jul 7, 2015 at 11:05 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:

> Mohammad, I’m not sure what you mean by “taking a checkpoint with quantum
> size smaller than link latency”.
>
> In multi-gem5, the quantum size and the checkpoint are completely
> independent. The quantum is the number of ticks simulated between two
> consecutive periodic sync - that’s why every periodic sync happens at the
> same tick at each gem5 process. A checkpoint can be taken at any point
> within a quantum. After the checkpoint is taken, each gem5 process
> completes what remained from the current quantum and then enters the next
> periodic sync.
>
> When fast-forwarding, you can increase link latency to allow larger
> quantum and reduce periodic sync overhead. Does that make sense?
>
> - Gabor
>
> On 7/7/15, 4:11 PM, "Mohammad Alian" <al...@wisc.edu> wrote:
>
> >Then you are assuming taking checkpoint with quantum size smaller than
> >link
> >latency which contradicts your initial motivation for unsync checkpoint!:
> >(I copied this sentence from earlier messages in the thread as a reminder)
> >"Shortening the quantum can help, but usually the snapshot is being taken
> >while 'fast-forwarding', i.e. simulating as fast as possible, which would
> >motivate a longer quantum."
> >
> >What if somebody wants to relax synchronization and take a checkpoint?
> >
> >On Tue, Jul 7, 2015 at 7:38 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:
> >
> >>
> >> Hi Mohammad and all,
> >>
> >> gem5 processes may restore at a different tick from a checkpoint but the
> >> next periodic sync will happen at the same tick in all gem5. A receive
> >> tick of a packet cannot fall into the current quantum so every packet
> >>can
> >> get scheduled for receive properly even if a checkpoint/restore happens
> >> during a quantum.
> >>
> >> Regarding your multi-threaded dual config, my understanding is that
> >> EtherLink is not prepared to work with multi threading as it lacks
> >>thread
> >> safety. The multiple event queues/threads config only works if the
> >>systems
> >> are independent.
> >>
> >> One possible way to fix that is to provide a "multi-thread” based
> >> implementation for MultiIface ;-)
> >>
> >> - Gabor
> >>
> >> On 7/7/15, 6:29 AM, "Mohammad Alian" <al...@wisc.edu> wrote:
> >>
> >> >Gabor- My concern about unsync checkpoint is that when you restore
> >>from an
> >> >unsync checkpoint, you'll have gem5 processes that are each running at
> >> >a different tick. Then how do you handle accurate delivery of packets
> >> >between
> >> >these gem5 processes? It will also make it harder to integrate
> >> >multi/pd-gem5 with current multi-threaded gem5. The problem with sync
> >> >checkpoint is that you cannot take the checkpoint exactly at the ROI, but I
> >>think
> >> >unsync checkpoint introduces some other problems. Considering the
> >> >necessary
> >> >warmup period before starting stat collection, I think we don't need to
> >> >exactly pinpoint the ROI. Please correct me if I'm wrong.
> >> >
> >> >I'm trying to run a multi-threaded experiment with pd-gem5, but I got
> >>an
> >> >error when I tried to partition dual mode simulation on two threads. I
> >> >posted that in gem5 users mailing list. Please help me on that if you
> >>can.
> >> >
> >> >Thank you,
> >> >Mohammad
> >> >
> >> >On Mon, Jul 6, 2015 at 11:45 AM, Gabor Dozsa <gabor.do...@arm.com>
> >>wrote:
> >> >
> >> >> Thank you Steve for the detailed elaboration on the issues.
> >> >>
> >> >>
> >> >> Regarding the “unsynchronized checkpoints”, the terminology might be
> >>a
> >> >>bit
> >> >> confusing. In fact, we always need to do a global synchronization
> >>among
> >> >> the gem5 processes before taking a distributed checkpoint (in order
> >>to
> >> >> avoid in-flight packets). The global synchronization here means that
> >> >>each
> >> >> gem5 has to suspend the simulation and wait until every in-flight
> >> >>packet
> >> >> arrives (and is stored) at the destination gem5 process. If that
> >>global
> >> >> synchronization step happens at the same simulated tick in each gem5
> >> >>then
> >> >> we call the checkpoint “synchronous”; otherwise it is an
> >> >>“asynchronous”
> >> >> checkpoint.
> >> >>
> >> >> In the MPI application example I mentioned before the checkpoint
> >>should
> >> >>be
> >> >> triggered as soon as the “slowest” MPI process reaches the
> >> >>MPI_barrier().
> >> >> The problem is that the “slowest” MPI process usually does not reach
> >>the
> >> >> MPI_barrier() right at the end of the current quantum. If we let the
> >> >> simulation continue until the quantum completes (to ensure that the
> >> >> checkpoint is taken at the same simulated tick in each gem5) then the
> >> >>MPI
> >> >> processes will complete the MPI_barrier and start executing the ROI
> >>code
> >> >> already.
> >> >>
> >> >> Regarding the integration of multi-threaded/multi-host simulation,
> >> >> multi-gem5 does not support fine grain simulation of hierarchical
> >> >>switches
> >> >> (or any other network topologies except a single crossbar) or
> >>multiple
> >> >> synchronization domains currently.
> >> >>
> >> >> However, I'm a bit confused about your statement that you don’t see
> >> >>value
> >> >> in ever building a shared-memory transport for MultiIface.
> >>MultiIface in
> >> >> my view is just an abstract interface for “multi-(ether)-link"
> >>objects
> >> >> which are link objects for connecting multiple (i.e. more than two)
> >> >> systems. It aims to encapsulate the API necessary for any Link object
> >> >> in any multi-system configuration - provided that we partition the
> >> >> systems across network links during run time.
> >> >>
> >> >> An orthogonal issue is if we want to include a simple crossbar switch
> >> >> model in a MultiIface implementation or we want to provide a
> >> >>‘standalone'
> >> >> fine  grain model for the switch (e.g. the pd-gem5 approach).
> >> >>
> >> >> Thanks,
> >> >> - Gabor
> >> >>
> >> >>
> >> >>
> >> >> On 7/3/15, 7:33 PM, "Steve Reinhardt" <ste...@gmail.com> wrote:
> >> >>
> >> >> >Thanks Mohammad & Gabor for the responses.
> >> >> >
> >> >> >I think there's still some misunderstanding on what I mean by the
> >> >> >integration of multi-threaded and multi-host simulation based on
> >> >>Gabor's
> >> >> >response above and Andreas's response in the other thread.
> >> >> >
> >> >> >The primary example scenario I'm proposing is as Mohammad described:
> >> >> >within
> >> >> >each host node, we're simulating an entire rack + top-of-rack switch
> >> >>in a
> >> >> >single gem5 process, with separate event queues/threads being used
> >>to
> >> >> >parallelize across nodes within the rack. The switch may or may not
> >>be
> >> >>on
> >> >> >its own thread as well.  The synchronization among the threads only
> >> >>needs
> >> >> >to be at the granularity of the intra-rack network latency.
> >> >> >
> >> >> >Now we want to expand this by using pd-gem5 or multi-gem5 to
> >> >>parallelize
> >> >> >multiple of these rack-level simulations across hosts, so we can
> >> >>simulate
> >> >> >a
> >> >> >whole row of a datacenter.  Only the uplinks from the TOR switches
> >> >>would
> >> >> >need to go over sockets between processes, and the switch being
> >> >>modeled by
> >> >> >pd-gem5 or multi-gem5 would be the end-of-row switch. The
> >> >>synchronization
> >> >> >delay among the multiple gem5 processes would be based on the
> >> >>inter-rack
> >> >> >latency.
> >> >> >
> >> >> >So the basic question is: Is this feasible with pd-gem5 /
> >>multi-gem5,
> >> >>and
> >> >> >if not, how much work would it take to make it so?
> >> >> >
> >> >> >However, my larger point is that I still don't see value in ever
> >> >>building
> >> >> >a
> >> >> >shared-memory transport for MultiIface. For this model, there is
> >> >>clearly
> >> >> >no
> >> >> >need for it. Things get more complicated if we want to do something
> >> >>like
> >> >> >have N nodes connected to a single switch and split that over two
> >>hosts
> >> >> >(with N/2 nodes simulated on each), but even in that case, I think
> >> >>it's a
> >> >> >better idea to make the switch model deal with having half of its
> >>links
> >> >> >internal and half external (since we already want the same model to
> >> >>work
> >> >> >in
> >> >> >both the all-internal and all-external cases). Not that I'm worried
> >> >>that
> >> >> >someone is about to go off and build this shared-memory transport,
> >>but
> >> >>I
> >> >> >think it's important to reach an understanding here, since it's
> >> >> >fundamental
> >> >> >to defining the strategic relationship between these capabilities
> >>going
> >> >> >forward.
> >> >> >
> >> >> >Stepping back a little further, it would be nice to have a model
> >>that
> >> >>is
> >> >> >as
> >> >> >generic as the multi-threading model, where it's really just a
> >>matter
> >> >>of
> >> >> >taking a simulation, partitioning the components among the threads,
> >>and
> >> >> >setting the synchronization quantum, and it works. Of course, even
> >>with
> >> >> >the
> >> >> >multi-threaded model, if you don't choose your partitioning and your
> >> >> >quantum wisely, you're not going to get much speedup or a
> >>deterministic
> >> >> >simulation, but the fundamental implementation is oblivious to that.
> >> >>I'm
> >> >> >not saying we really need to go all the way to this extreme---it's
> >> >>pretty
> >> >> >reasonable to assume that no one in the near future will want to
> >> >>partition
> >> >> >across hosts anywhere other than on a simulated network link---but I
> >> >>think
> >> >> >we should keep this ideal in mind as a guiding principle as we
> >>choose
> >> >>how
> >> >> >to go forward from here.
> >> >> >
> >> >> >This ties in to my point #4, which is that if we're really building
> >>a
> >> >> >mechanism to partition a simulation across multiple hosts, then you
> >> >>should
> >> >> >be able to run the same simulation in a single gem5 process and get
> >>the
> >> >> >same results. I think this is the strength of pd-gem5;
> >>correspondingly
> >> >>the
> >> >> >main weakness of multi-gem5 is that it architecturally feels more
> >>like
> >> >> >tying together a set of mostly independent gem5 simulations than
> >>like
> >> >> >partitioning a single gem5 simulation.  (Of course, they both end
> >>up at
> >> >> >roughly the same point in the middle.)
> >> >> >
> >> >> >On the flip side, multi-gem5 has some clear advantages in terms of
> >>the
> >> >> >better separation of the communication layer (and I can imagine it
> >> >>being
> >> >> >very useful to port to MPI and perhaps some RDMA API for InfiniBand
> >> >> >clusters). Also I think the integrated sockets for communication and
> >> >> >synchronization are the superior design; while the separate sockets
> >> >>used
> >> >> >by
> >> >> >pd-gem5 may only very rarely cause problems, I agree with Andreas
> >>that
> >> >> >that's not good enough, and I don't see any real advantage
> >>either---if
> >> >>you
> >> >> >have to flush the data sockets (or wait for them to drain) before
> >> >> >synchronizing, then you might as well just have the synchronization
> >> >> >messages queue up behind the data messages.
> >> >> >
> >> >> >Regarding unsynchronized checkpoints: Thanks for the example, but
> >>I'm
> >> >> >still
> >> >> >a little confused. If all the processes are about to execute an
> >> >> >MPI_Barrier(), doesn't that mean they'll all be synchronized shortly
> >> >> >anyway? So what's the harm in waiting until they're synchronized
> >>and
> >> >> >then checkpointing?
> >> >> >
> >> >> >Regarding the simulation of non-Ethernet networks: I agree that the
> >> >> >biggest
> >> >> >obstacle to this is the lack of generality of the current gem5
> >>network
> >> >> >components. I tried to take a step toward supporting other link
> >>types
> >> >>two
> >> >> >years ago (see http://reviews.gem5.org/r/1922) but someone shot me
> >> down
> >> >> >;).
> >> >> >We shouldn't try and fix that here, but we should also consciously
> >>try
> >> >>not
> >> >> >to make it any worse...
> >> >> >
> >> >> >Thanks for reading all the way to the end!
> >> >> >
> >> >> >Steve
> >> >> >
> >> >> >
> >> >> >On Fri, Jul 3, 2015 at 7:11 AM Gabor Dozsa <gabor.do...@arm.com>
> >> wrote:
> >> >> >
> >> >> >>Hi all,
> >> >> >>
> >> >> >>Thank you Steve for the thorough review.
> >> >> >>
> >> >> >>First, let me elaborate a bit on Andreas’s 3rd point about
> >> >> >>non-synchronous
> >> >> >>checkpoints. Let’s assume that we aim to simulate MPI applications
> >> >>(HPC
> >> >> >>workloads). The ROI in an MPI application typically starts with
> >>a
> >> >> >>global MPI_Barrier() call. We want to take the checkpoint when
> >>*every*
> >> >> >>gem5 process has reached that MPI_Barrier() in the simulated code
> >>but
> >> >> >>that
> >> >> >>may not happen at the same tick in each gem5 (due to load imbalance
> >> >> >>among
> >> >> >>the simulated nodes). That’s why multi-gem5 implements the
> >> >> >>non-synchronous
> >> >> >>checkpoint support.
> >> >> >>
> >> >> >>My answers to your questions are as follows.
> >> >> >>
> >> >> >>1. The only change necessary to use multi-gem5 with a non Ethernet
> >> >> >>(simulated) network is to replace the Ethernet packet type with
> >> >>another
> >> >> >>packet type in MultiIface.
> >> >> >>In fact, the first implementation of MultiIface was a template
> >> >> >>that took EthPacketData as parameter because I planned to support
> >> >>different
> >> >> >>network types. When I realized that currently only Ethernet is
> >> >>supported
> >> >> >>by gem5 I dropped the template param to keep the implementation
> >> >> >>simpler. I
> >> >> >>have also realized in the meantime that the right approach would
> >> >> >>probably
> >> >> >>be to create a pure virtual ‘base' class for network packets from
> >> >>which
> >> >> >>Ethernet (and other types of) packets could be derived. Then
> >> >>MultiIface
> >> >> >>could simply use that base class to provide support for different
> >> >> >>network
> >> >> >>types. The interface provided by the base packet class could be
> >>very
> >> >> >>simple. Beside the total size() of the packet, multi-gem5 only
> >>needs a
> >> >> >>method to ‘extract' the source/destination address. Those addresses
> >> >>are
> >> >> >>used in MultiIface as opaque byte arrays so they are quite network
> >> >>type
> >> >> >>agnostic already.
> >> >> >>
> >> >> >>2. That’s right, we have designed the MultiIface/TCPIface split
> >>with
> >> >> >>different underlying messaging systems in mind.
> >> >> >>
> >> >> >>3. Multi-gem5 can work together with
> >>multi-threaded/multi-event-queue
> >> >> >>gem5
> >> >> >>configs. The current TCPIface/tcp_server components would still use
> >> >> >>sockets to send around the packets. So it is possible to put
> >>together
> >> >>a
> >> >> >>multi-gem5 simulation where each gem5 process has multiple event
> >> >>queues
> >> >> >>(and an independent simulation thread per event queue) but all the
> >> >> >>simulated Ethernet links would use sockets to forward every
> >>Ethernet
> >> >> >>packet to the tcp_server.
> >> >> >>
> >> >> >>If someone wanted to run only a single gem5 process to simulate an
> >> >> >>entire
> >> >> >>cluster (using one thread/event-queue per cluster node) then the
> >> >>current
> >> >> >>multi-gem5 implementation using sockets/tcp_server is not optimal.
> >>In
> >> >> >>that
> >> >> >>case,  a better solution would be to provide a shared memory based
> >> >> >>implementation of the MultiIface virtual communication methods
> >> >> >>sendRaw()/recvRaw()/syncRaw() (i.e. a shared memory equivalent of
> >> >> >>TCPIface). In that implementation, the entire discrete tcp_server
> >> >> >>component
> >> >> >>could be replaced with a shared data structure.
> >> >> >>
> >> >> >>4. You are right, the current implementation does not make it
> >>possible
> >> >> >>to
> >> >> >>construct an equivalent single-process simulation model for a
> >> >>multi-gem5
> >> >> >>run. However, a possible solution is a shared memory based
> >> >> >>implementation
> >> >> >>of the MultiIface virtual communication methods just as I
> >>described in
> >> >> >>the
> >> >> >>previous paragraph. The same implementation could then work with
> >>both
> >> >> >>multi-threaded/multi-event-queues and
> >>single-thread/single-event-queue
> >> >> >>gem5 configs.
> >> >> >>
> >> >> >>Thanks,
> >> >> >>- Gabor
> >> >> >>
> >> >> >>On 7/2/15, 7:20 PM, "Steve Reinhardt" <ste...@gmail.com> wrote:
> >> >> >>
> >> >> >>>Hi everyone,
> >> >> >>>
> >> >> >>>Sorry for taking so long to engage. This is a great development
> >>and I
> >> >> >>>think
> >> >> >>>both these patches are terrific contributions. Thanks to Mohammad,
> >> >> >>Gabor,
> >> >> >>>and everyone else involved.
> >> >> >>>
> >> >> >>>I agree with Andreas that we should start with some top-level
> >>goals &
> >> >> >>>assumptions, agree on those, and then we can sort out the detailed
> >> >> >>issues
> >> >> >>>based on a consistent view.
> >> >> >>>
> >> >> >>>I definitely agree with Andreas's first two points. The third one
> >> >> >>seems a
> >> >> >>>little surprising; I'd like to hear more about the motivation
> >>before
> >> >> >>>expressing an opinion. I can see where non-synchronous
> >>checkpointing
> >> >> >>could
> >> >> >>>be useful, but it's also clear from the associated patch that it's
> >> >>not
> >> >> >>>trivial to implement either. How much would be lost by requiring a
> >> >> >>>synchronization before a checkpoint?
> >> >> >>>
> >> >> >>>From my personal perspective, I would like to see whatever we do
> >>here
> >> >> >>be a
> >> >> >>>first step toward a more general distributed simulation platform.
> >> >>Both
> >> >> >>of
> >> >> >>>these patches seem pretty Ethernet-centric in different ways.
> >>This is
> >> >> >>not
> >> >> >>>terrible; part of the problem is that gem5's current internal
> >> >> >>networking
> >> >> >>>support is already overly Ethernet-centric IMO. But it would be
> >>nice
> >> >>to
> >> >> >>>avoid baking that in even further. Rather than assume I have
> >> >>understood
> >> >> >>>all
> >> >> >>>the code completely, I'll phrase things in the form of questions,
> >>and
> >> >> >>>people can comment on how those questions would be answered in the
> >> >> >>context
> >> >> >>>of the two different approaches.
> >> >> >>>
> >> >> >>>1. How much effort would be required to simulate a non-Ethernet
> >> >> >>network?
> >> >> >>>My
> >> >> >>>impression is that pd-gem5 has a leg up here, since a gem5 switch
> >> >>model
> >> >> >>>for
> >> >> >>>a non-Ethernet network (which you'd have to write anyway if you
> >>were
> >> >> >>>simulating a different network) could be used in place of the
> >>current
> >> >> >>>Ethernet switch, where for multi-gem5 I think that the
> >> >> >>>util/multi//tcp_server.cc code would have to be modified (i.e.,
> >> >> >>there'd be
> >> >> >>>additional work above and beyond what you'd need to get the
> >>network
> >> >> >>>modeled
> >> >> >>>in base gem5).
> >> >> >>>
> >> >> >>>2. How much effort is required to run on a non-Ethernet network
> >>(or
> >> >> >>>equivalently using a non-sockets API)?  The MultiIface/TCPIface
> >>split
> >> >> >>in
> >> >> >>>the multi-gem5 code looks like it addresses this nicely, but
> >>pd-gem5
> >> >> >>seems
> >> >> >>>pretty tied to an Ethernet host fabric.
> >> >> >>>
> >> >> >>>3. Do both of these patches work with the existing multithreaded
> >> >> >>>multiple-event-queue simulation? I think multi-gem5 does (though
> >>it
> >> >> >>would
> >> >> >>>be nice to have a confirmation), but it's not clear about
> >>pd-gem5. I
> >> >> >>don't
> >> >> >>>see a benefit to having multiple gem5 processes on a single host
> >>vs.
> >> >>a
> >> >> >>>single multithreaded gem5 process using the existing support. I
> >>think
> >> >> >>this
> >> >> >>>could be particularly valuable with a hierarchical network; e.g.,
> >> >> >>maybe I
> >> >> >>>would want to model a rack in multithreaded mode on a single
> >> >>multicore
> >> >> >>>server, then use pd-gem5 or multi-gem5 to build up a simulation of
> >> >> >>>multiple
> >> >> >>>racks. Would this work out of the box with either of these
> >>patches,
> >> >> >>and if
> >> >> >>>not, what would need to be done?
> >> >> >>>
> >> >> >>>4. Is it possible to construct a single-process simulation model
> >> >>that's
> >> >> >>>identical to the distributed simulation? It would be very valuable
> >> >>for
> >> >> >>>verification to be able to take a single simulation run and do it
> >> >>both
> >> >> >>>within a single process and also across multiple processes and
> >>verify
> >> >> >>that
> >> >> >>>identical results are achieved. This seems like a big drawback to
> >>the
> >> >> >>>multi-gem5 tcp_server approach, IMO.
> >> >> >>>
> >> >> >>>I'm definitely not saying that all these issues need to be
> >>resolved
> >> >> >>before
> >> >> >>>anything gets committed, but if we can agree that these are valid
> >> >> >>goals,
> >> >> >>>then we can evaluate detailed issues based on whether they move us
> >> >> >>toward
> >> >> >>>or away from those goals.
> >> >> >>>
> >> >> >>>Thanks,
> >> >> >>>
> >> >> >>>Steve
> >> >> >>>
> >> >> >>>
> >> >> >>>On Thu, Jul 2, 2015 at 8:34 AM Andreas Hansson
> >> >> >><andreas.hans...@arm.com>
> >> >> >>>wrote:
> >> >> >>>
> >> >> >>>>Hi all,
> >> >> >>>>
> >> >> >>>>I think we need to up-level this a bit. From our perspective
> >>(and I
> >> >> >>>>suspect in general):
> >> >> >>>>
> >> >> >>>>1. Robustness is important. Having a design that _may_ break,
> >> >>however
> >> >> >>>>unlikely, is simply not an option.
> >> >> >>>>
> >> >> >>>>2. Performance and scaling are important. We can compare actual
> >> >>numbers
> >> >> >>>>here, and I am fairly sure the two solutions are on par. Let’s
> >> >> >>quantify
> >> >> >>>>that though.
> >> >> >>>>
> >> >> >>>>3. Checkpointing must not rely on synchronicity. It is vital for
> >> >> >>several
> >> >> >>>>workloads that we can checkpoint the various gem5 instances at
> >> >> >>different
> >> >> >>>>Ticks (due to the way the workloads are constructed).
> >> >> >>>>
> >> >> >>>>Andreas
> >> >> >>>>
> >> >> >>>>On 01/07/2015 21:41, "gem5-dev on behalf of Mohammad Alian"
> >> >> >>>><gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:
> >> >> >>>>
> >> >> >>>>>Thanks Gabor for the reply.
> >> >> >>>>>
> >> >> >>>>>I feel this conversation is useful as we can find out pros/cons
> >>of
> >> >> >>each
> >> >> >>>>>design.
> >> >> >>>>>Please find my response in-lined below.
> >> >> >>>>>
> >> >> >>>>>Thank you,
> >> >> >>>>>Mohammad
> >> >> >>>>>
> >> >> >>>>>On Wed, Jul 1, 2015 at 6:44 AM, Gabor Dozsa
> >><gabor.do...@arm.com>
> >> >> >>>>wrote:
> >> >> >>>>>
> >> >> >>>>>>Hi All,
> >> >> >>>>>>
> >> >> >>>>>>Sorry for the missing indentation in my previous e-mail! (This
> >>was
> >> >> >>my
> >> >> >>>>>>first e-mail to the dev-list so I could not simply use
> >>“reply").
> >> >> >>>>Below
> >> >> >>>>>>is
> >> >> >>>>>>the same message, hopefully in more readable form.
> >> >> >>>>>>
> >> >> >>>>>>====================================
> >> >> >>>>>>
> >> >> >>>>>>Hi  All,
> >> >> >>>>>>
> >> >> >>>>>>Thank you Mohammad for your elaboration on the issues!
> >> >> >>>>>>
> >> >> >>>>>>I have written most of the multi-gem5 patch so let me add some
> >> >>more
> >> >> >>>>>>clarifications  and answer to your concerns. My comments are
> >> >>inline
> >> >> >>>>>>below.
> >> >> >>>>>>
> >> >> >>>>>>Thanks,
> >> >> >>>>>>- Gabor
> >> >> >>>>>>
> >> >> >>>>>>On 6/27/15, 10:20 AM, "Mohammad Alian" <al...@wisc.edu> wrote:
> >> >> >>>>>>
> >> >> >>>>>>>Hi All,
> >> >> >>>>>>>
> >> >> >>>>>>>Curtis-Thank you for listing some of the differences. I was
> >> >> >>waiting
> >> >> >>>>for
> >> >> >>>>>>>the
> >> >> >>>>>>>completed multi-gem5 patch before I send my review. Please
> >>see my
> >> >> >>>>>>inline
> >> >> >>>>>>>response below. I've addressed the concerns that you've
> >>raised.
> >> >> >>>>Also,
> >> >> >>>>>>I've
> >> >> >>>>>>>added a bit more to the comparison.
> >> >> >>>>>>>
> >> >> >>>>>>>-*  Synchronization.
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 implements this in Python (not a problem in itself;
> >> >> >>>>>>aesthetically
> >> >> >>>>>>>
> >> >> >>>>>>>this is nice, but...).  The issue is that pd-gem5's data
> >>packets
> >> >> >>and
> >> >> >>>>>>>
> >> >> >>>>>>>barrier messages travel over different sockets.  Since pd-gem5
> >> >> >>could
> >> >> >>>>>>see
> >> >> >>>>>>>
> >> >> >>>>>>>data packets passing synchronization barriers, it could
> >>create an
> >> >> >>>>>>>
> >> >> >>>>>>>inconsistent checkpoint.
> >> >> >>>>>>>
> >> >> >>>>>>>multi-gem5's synchronization is implemented in C++ using sync
> >> >> >>>>events,
> >> >> >>>>>>but
> >> >> >>>>>>>
> >> >> >>>>>>>more importantly, the messages queue up in the same stream
> >>and so
> >> >> >>>>>>cannot
> >> >> >>>>>>>
> >> >> >>>>>>>have the issue just described.  (Event ordering is often
> >>crucial
> >> >> >>in
> >> >> >>>>>>>
> >> >> >>>>>>>snapshot protocols.) Therefore we feel that multi-gem5 is a
> >>more
> >> >> >>>>robust
> >> >> >>>>>>>
> >> >> >>>>>>>solution in this respect.
> >> >> >>>>>>>
> >> >> >>>>>>>Each packet in pd-gem5 has a time-stamp. So even if data
> >>packets
> >> >> >>>>pass
> >> >> >>>>>>>synchronization barriers (in other words, data packets arrive
> >> >> >>early
> >> >> >>>>at
> >> >> >>>>>>the
> >> >> >>>>>>>destination node), the destination node processes packets based on
> >> >>their
> >> >> >>>>>>>timestamp. Actually allowing data packets to pass sync
> >>barriers
> >> >> >>is a
> >> >> >>>>>>nice
> >> >> >>>>>>>feature that can reduce the likelihood of late packet
> >>reception.
> >> >> >>>>>>Ordering
> >> >> >>>>>>>of data messages that flow over pd-gem5 nodes is also
> >>preserved
> >> >>in
> >> >> >>>>>>pd-gem5
> >> >> >>>>>>>implementation.
> >> >> >>>>>>
> >> >> >>>>>>This seems to be a misunderstanding. Maybe the wording was not
> >> >> >>>>precise
> >> >> >>>>>>before. The problem is not a data packet “passing” a sync
> >> >> >>barrier
> >> >> >>>>>>but the other way around, a sync barrier that can pass a data
> >> >> >>packet
> >> >> >>>>>>(e.g. while the data packet is waiting in the host operating
> >> >>system
> >> >> >>>>>>socket layer).  If that happens, the packet will arrive later
> >>than
> >> >> >>it
> >> >> >>>>>>was
> >> >> >>>>>>supposed to and it may miss the computed receive tick.
> >> >> >>>>>>
> >> >> >>>>>>For instance, let’s assume that the quantum coincides with the
> >> >> >>>>simulated
> >> >> >>>>>>Ether link delay. (This is the optimal choice of quantum to
> >> >> >>minimize
> >> >> >>>>the
> >> >> >>>>>>number of sync barriers.)  If a data packet is sent right at
> >>the
> >> >> >>>>>>beginning
> >> >> >>>>>>of a quantum then this packet must arrive at the destination
> >>gem5
> >> >> >>>>>>process
> >> >> >>>>>>within the same quantum in order not to miss its receive tick
> >>at
> >> >> >>the
> >> >> >>>>>>very
> >> >> >>>>>>beginning of the next quantum. If the sync barrier can pass the
> >> >> >>data
> >> >> >>>>>>packet
> >> >> >>>>>>then the data packet may arrive only during the next quantum
> >>(or
> >> >> >>in
> >> >> >>>>>>extreme conditions even later than that) so when it arrives the
> >> >> >>>>receiver
> >> >> >>>>>>gem5 may have already passed the receive tick.
> >> >> >>>>>>
> >> >> >>>>>>This argument makes more sense than the previous one. Note that
> >> >> >>gem5
> >> >> >>>>is
> >> >> >>>>>>a
> >> >> >>>>>cycle accurate simulator and it runs orders of magnitude slower
> >> >>that
> >> >> >>>>real
> >> >> >>>>>hardware. So it's almost impossible that the flight time of
> >>packet
> >> >> >>>>through
> >> >> >>>>>real network turns out to be more than the simulation time of one
> >>quantum.
> >> >>We
> >> >> >>>>ran
> >> >> >>>>>a
> >> >> >>>>>set of experiments just for this purpose: with quantum size
> >>equal
> >> >>to
> >> >> >>>>>etherlink delay, we never got any late arrival violation (what
> >>you
> >> >> >>>>>described) for the full NAS benchmark suite (please refer to the
> >> >>paper).
> >> >> >>>>>
> >> >> >>>>>multi-gem5 is optimized for a case that almost never happens!
> >>and
> >> >> >>>>>sacrificing speedup for no gain.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>Time-stamping does help with this issue. Also, if a data
> >>packet is
> >> >> >>>>>>waiting
> >> >> >>>>>>in the host operating system socket layer when the simulation
> >> >> >>thread
> >> >> >>>>>>exits
> >> >> >>>>>>to python to complete the next sync barrier  then the packet
> >>will
> >> >> >>>>not go
> >> >> >>>>>>into the checkpoint that may follow that sync barrier.
> >> >> >>>>>>
> >> >> >>>>>>That's a good point. Current pd-gem5 checkpointing mechanism
> >>might
> >> >> >>>>miss
> >> >> >>>>>packets that have been sent during previous quantum and are
> >>waiting
> >> >> >>in
> >> >> >>>>OS
> >> >> >>>>>socket buffer. I should add some code inside ethertap
> >>serialization
> >> >> >>>>>function to drain ethertap socket before writing checkpoint. I
> >>will
> >> >> >>>>update
> >> >> >>>>>pd-gem5 patch accordingly.
> >> >> >>>>>
> >> >> >>>>>>
> >> >> >>>>>>>What you mentioned as an advantage for multi-gem5 is actually
> >>a
> >> >> >>key
> >> >> >>>>>>>disadvantage: buffering sync messages behind data packets can
> >>add
> >> >> >>>>up to
> >> >> >>>>>>>the
> >> >> >>>>>>>synchronization overhead and slow down simulation
> >>significantly.
> >> >> >>>>>>
> >> >> >>>>>>The purpose of sync messages is to make sure that the data
> >>packets
> >> >> >>>>>>arrive
> >> >> >>>>>>in time (in terms of simulated time) at the destination so they
> >> >>can
> >> >> >>>>be
> >> >> >>>>>>scheduled for being received at the proper computed tick.  Sync
> >> >> >>>>messages
> >> >> >>>>>>also make sure that no data packets are in flight when a sync
> >> >> >>barrier
> >> >> >>>>>>completes before we take a checkpoint.  They definitely add
> >> >> >>overhead
> >> >> >>>>for
> >> >> >>>>>>the simulation but they are necessary for the correctness of
> >>the
> >> >> >>>>>>simulation.
> >> >> >>>>>>
> >> >> >>>>>>The receive thread in multi-gem5 reads out packets from the
> >>socket
> >> >> >>in
> >> >> >>>>>>parallel with the simulation thread so packets normally will
> >>not
> >> >>be
> >> >> >>>>>>"queueing up” before a sync barrier message.  There is
> >>definitely
> >> >> >>>>room
> >> >> >>>>>>for improvements in the current implementation for reducing the
> >> >> >>>>>>synchronization overhead but that is likely true for pd-gem5,
> >>too.
> >> >> >>>>>>The important thing here is that the solution must provide
> >> >> >>>>correctness
> >> >> >>>>>>(robustness) first.
> >> >> >>>>>>
> >> >> >>>>>>pd-gem5 provides correctness. Please read my previous comment.
> >>The
> >> >> >>>>whole
> >> >> >>>>>purpose of multi/pd-gem5 is to parallelize simulation with
> >>minimal
> >> >> >>>>>overhead
> >> >> >>>>>and gain speedup. If you fail to do so, nobody will use your
> >>tool.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>Also,
> >> >> >>>>>>>multi-gem5 sends huge sized messages (multiHeaderPkt) through
> >> >> >>>>network to
> >> >> >>>>>>>perform each synchronization point, which increases
> >> >> >>synchronization
> >> >> >>>>>>>overhead further. In pd-gem5, we choose to send just one
> >> >>character
> >> >> >>>>as
> >> >> >>>>>>sync
> >> >> >>>>>>>message through a separate socket to reduce synchronization
> >> >> >>>>overhead.
> >> >> >>>>>>
> >> >> >>>>>>The TCP/IP message size is unlikely the bottleneck here.
> >> >>Multi-gem5
> >> >> >>>>will
> >> >> >>>>>>send ~50 bytes more in a sync barrier message than pd-gem5 but
> >> >>that
> >> >> >>>>>>bigger
> >> >> >>>>>>sync message still fits into a single ethernet frame on the
> >>wire.
> >> >> >>The
> >> >> >>>>>>end-to-end latency overhead that is caused by 50 bytes extra
> >> >> >>payload
> >> >> >>>>for
> >> >> >>>>>>a small single frame TCP/IP message is likely to fall into the
> >> >> >>>>"noise"
> >> >> >>>>>>category if one tries to measure it in a real cluster.
> >> >> >>>>>>
> >> >> >>>>>>You should prove your hypothesis experimentally. Each gem5
> >>process
> >> >> >>>>>sends/receives sync messages at the end of every quantum. Say you
> >>are
> >> >> >>>>>simulating "N" node computer cluster with "M" different
> >> >> >>configuration.
> >> >> >>>>>Then
> >> >> >>>>>you will have N*M gem5 processes that send/receive these 50
> >>Bytes
> >> >>(I
> >> >> >>>>>think
> >> >> >>>>>it's more) extra data at the same time over network ...
> >> >> >>>>>
> >> >> >>>>>Furthermore, multi-gem5 sends a header before each data message.
> >> >> >>>>Comparing
> >> >> >>>>>with pd-gem5, pd-gem5 just add 12 Bytes (each time-stamp is 12
> >> >>least
> >> >> >>>>>significant digits of the Tick) to each data packet. I don't
> >>know
> >> >> >>>>exactly
> >> >> >>>>>how large are these "MultiHeaderPkt", but it just has two Tick
> >> >>field
> >> >> >>>>that
> >> >> >>>>>each is 64 Bytes! Also, header packets are separate TCP
> >>packets, so
> >> >> >>you
> >> >> >>>>>pay
> >> >> >>>>>for sending two separate packets for each data packet. And
> >>worst,
> >> >>you
> >> >> >>>>>serialize all of these with sync messages.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>*  Packet handling.
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 uses EtherTap for data packets but changed the polling
> >> >> >>>>>>mechanism
> >> >> >>>>>>>
> >> >> >>>>>>>to go through the main event queue.  Since this rate is
> >>actually
> >> >> >>>>linked
> >> >> >>>>>>>
> >> >> >>>>>>>with simulator progress, it cannot guarantee that the packets
> >>are
> >> >> >>>>>>>serviced
> >> >> >>>>>>>
> >> >> >>>>>>>at regular intervals of real time.  This can lead to packets
> >> >> >>>>queueing
> >> >> >>>>>>up
> >> >> >>>>>>>
> >> >> >>>>>>>which would contribute to the synchronization issues mentioned
> >> >> >>>>above.
> >> >> >>>>>>>
> >> >> >>>>>>>multi-gem5 uses plain sockets with separate receive threads
> >>and
> >> >>so
> >> >> >>>>does
> >> >> >>>>>>>not
> >> >> >>>>>>>
> >> >> >>>>>>>have this issue.
> >> >> >>>>>>>
> >> >> >>>>>>>I think again you are pointing to your first concern that I've
> >> >> >>>>>>explained
> >> >> >>>>>>>above. Packets that have queued up in EtherTap socket, will be
> >> >> >>>>>>processed
> >> >> >>>>>>>and delivered to simulation environment at the beginning of
> >>next
> >> >> >>>>>>>simulation
> >> >> >>>>>>>quantum.
> >> >> >>>>>>>
> >> >> >>>>>>>Please notice that multi-gem5 introduces a new simObjects to
> >> >> >>>>interface
> >> >> >>>>>>>simulation environment to real world which is redundant. This
> >> >> >>>>>>>functionality
> >> >> >>>>>>>is already there by EtherTap.
> >> >> >>>>>>
> >> >> >>>>>>Except that the EtherTap solution does not provide a correct
> >> >> >>(robust)
> >> >> >>>>>>solution for the synchronization problem.
> >> >> >>>>>>
> >> >> >>>>>>Please read my first/second comments.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>* Checkpoint accuracy.
> >> >> >>>>>>>
> >> >> >>>>>>>A user would like to have a checkpoint at precisely the time
> >>the
> >> >> >>>>>>>
> >> >> >>>>>>>'m5 checkpoint' operation is executed so as to not miss any of
> >> >>the
> >> >> >>>>>>>
> >> >> >>>>>>>area of interest in his application.
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 requires that simulation finish the current quantum
> >> >> >>>>>>>
> >> >> >>>>>>>before checkpointing, so it cannot provide this.
> >> >> >>>>>>>
> >> >> >>>>>>>(Shortening the quantum can help, but usually the snapshot is
> >> >> >>being
> >> >> >>>>>>taken
> >> >> >>>>>>>
> >> >> >>>>>>>while 'fast-forwarding', i.e. simulating as fast as possible,
> >> >> >>which
> >> >> >>>>>>would
> >> >> >>>>>>>
> >> >> >>>>>>>motivate a longer quantum.)
> >> >> >>>>>>>
> >> >> >>>>>>>multi-gem5 can enter the drain cycle immediately upon
> >>receiving a
> >> >> >>>>>>>
> >> >> >>>>>>>checkpoint request.  We find this accuracy highly desirable.
> >> >> >>>>>>>
> >> >> >>>>>>>It's true that if you have a large quantum size then there
> >>would
> >> >> >>be
> >> >> >>>>>>some
> >> >> >>>>>>>discrepancy between the m5_ckpt instruction tick and the
> >>actual
> >> >> >>dump
> >> >> >>>>>>tick.
> >> >> >>>>>>>Based on multi-gem5 code, my understanding is that you send
> >>async
> >> >> >>>>>>>checkpoint message as soon as one of the gem5 processes
> >>encounter
> >> >> >>>>>>m5_ckpt
> >> >> >>>>>>>instruction. But I'm not sure how you fix the aforementioned
> >> >> >>issue,
> >> >> >>>>>>>because
> >> >> >>>>>>>you have to sync all gem5 processes before you start dumping
> >> >> >>>>>>checkpoint,
> >> >> >>>>>>>which necessitate a global synchronization beforehand.
> >> >> >>>>>>
> >> >> >>>>>>In multi-gem5, the gem5 process that encounters the m5_ckpt
> >> >> >>>>instruction
> >> >> >>>>>>sends out an async checkpoint notification for the peer gem5
> >> >> >>>>processes
> >> >> >>>>>>and
> >> >> >>>>>>then it starts the draining immediately (at the same tick).  So
> >> >>the
> >> >> >>>>>>checkpoint will be taken at the exact tick from the initiator
> >> >> >>process
> >> >> >>>>>>point of view. The global synchronisation with the peer
> >>processes
> >> >> >>>>takes
> >> >> >>>>>>place while the initiator process is still waiting at the same
> >> >>tick
> >> >> >>>>(i.e
> >> >> >>>>>>the simulation thread is suspended). However,  the receiver
> >>thread
> >> >> >>>>>>continues reading out the socket - while waiting for the global
> >> >> >>sync
> >> >> >>>>to
> >> >> >>>>>>complete- to make sure that in-flight data packets from peer
> >>gem5
> >> >> >>>>>>processes
> >> >> >>>>>>are stored properly and saved into the checkpoint.
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>So you mean multi-gem5 ends up with having gem5 processes with
> >> >> >>>>different
> >> >> >>>>>ticks after checkpoint? In pd-gem5 we make sure that all gem5
> >> >> >>processes
> >> >> >>>>>start dumping checkpoint at the same tick. Are you sure that
> >>this
> >> >>is
> >> >> >>>>>correct to have each gem5 process dump checkpoint at different
> >> >> >>ticks???
> >> >> >>>>>
> >> >> >>>>>I don't think this a correct checkpointing design. However, if
> >>you
> >> >> >>>>feel it
> >> >> >>>>>is correct, I can change a couple of lines in "Simulation.py"
> >>and
> >> >> >>>>barrier
> >> >> >>>>>scripts to implement the same functionality in pd-gem5. One
> >>thing
> >> >> >>that
> >> >> >>>>you
> >> >> >>>>>are obsessed about is to make sure that there is no in-flight
> >> >>packets
> >> >> >>>>>while
> >> >> >>>>>we start dumping checkpoint, and you have all these complex
> >> >> >>mechanisms
> >> >> >>>>in
> >> >> >>>>>place to ensure that! I think you can 99.99999% make sure that
> >> >>there
> >> >> >>>>is no
> >> >> >>>>>in-flight packet by waiting for 1 second after all gem5
> >>processes
> >> >> >>>>finished
> >> >> >>>>>their quantum simulation and then dump checkpoint. Do you really
> >> >> >>think
> >> >> >>>>>that
> >> >> >>>>>delivering a tcp packet would take more than 1 second in today's
> >> >> >>>>systems!?
> >> >> >>>>>Always go for simple solutions ...
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>By the way, we have a fix for this issue by introducing a new
> >>m5
> >> >> >>>>pseudo
> >> >> >>>>>>>instruction.
> >> >> >>>>>>
> >> >> >>>>>>I fail to see how a new pseudo instruction can solve the
> >>problem
> >> >>of
> >> >> >>>>>>completing the full quantum in pd-gem5 before a checkpoint can
> >>be
> >> >> >>>>taken.
> >> >> >>>>>>Could you please elaborate on that?
> >> >> >>>>>>
> >> >> >>>>>>As we take checkpoint while fast-forwarding and it is likely
> >>that
> >> >> >>we
> >> >> >>>>>>relax
> >> >> >>>>>synchronization for speedup purpose, a new pseudo instruction
> >>that
> >> >> >>can
> >> >> >>>>set
> >> >> >>>>>quantum size (m5_qset) can be helpful. So, one can insert
> >>m5_qset
> >> >>in
> >> >> >>>>his
> >> >> >>>>>benchmark source code before entering ROI that contains m5_ckpt
> >>to
> >> >> >>>>>decrease
> >> >> >>>>>quantum size beforehand and reduce the discrepancy between
> >>m5_ckpt
> >> >> >>tick
> >> >> >>>>>and
> >> >> >>>>>actual checkpoint tick. This is not included in pd-gem5 patch
> >>right
> >> >> >>>>now.
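
To make the trade-off concrete, here is a minimal Python sketch of the discrepancy being discussed. It assumes, as described in this thread, that pd-gem5 dumps the checkpoint at the end of the quantum in which m5_ckpt was hit, and it further assumes that quantum boundaries fall on multiples of the quantum size. The function name `checkpoint_discrepancy` is hypothetical and is not part of either patch set.

```python
def checkpoint_discrepancy(request_tick: int, quantum: int) -> int:
    """Ticks between an m5_ckpt request and the end of its quantum.

    Assumes quantum boundaries at multiples of `quantum`, and that a
    request landing exactly on a boundary is served at that boundary.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    # Ceiling division: first boundary at or after the request tick.
    next_boundary = -(-request_tick // quantum) * quantum
    return next_boundary - request_tick


if __name__ == "__main__":
    # A long fast-forward quantum versus a shortened one.
    for q in (1000, 10):
        print(q, checkpoint_discrepancy(1234, q))
```

Under these assumptions the worst-case discrepancy is `quantum - 1` ticks, which is why lowering the quantum before the region of interest shrinks the gap.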
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>* Implementation of network topology.
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 uses a separate gem5 process to act as a switch
> >>whereas
> >> >> >>>>>>multi-gem5
> >> >> >>>>>>>
> >> >> >>>>>>>uses a standalone packet relay process.
> >> >> >>>>>>>
> >> >> >>>>>>>We haven't measured the overhead of pd-gem5's simulated switch
> >> >> >>yet,
> >> >> >>>>but
> >> >> >>>>>>>
> >> >> >>>>>>>we're confident that our approach is at least as fast and more
> >> >> >>>>>>scalable.
> >> >> >>>>>>>
> >> >> >>>>>>>There is this flexibility in pd-gem5 to simulate a switch box
> >> >> >>>>alongside
> >> >> >>>>>>>one
> >> >> >>>>>>>of the other gem5 processes. However, it might make that gem5
> >> >> >>>>process
> >> >> >>>>>>the
> >> >> >>>>>>>simulation bottleneck. One of the advantages of pd-gem5 over
> >> >> >>>>>>multi-gem5 is
> >> >> >>>>>>>that we use gem5 to simulate a switch box, which allows us to
> >> >> >>model
> >> >> >>>>any
> >> >> >>>>>>>network topology by instantiating several Switch simObjects
> >>and
> >> >> >>>>>>>interconnect them with EtherLink in an arbitrary fashion. A
> >> >> >>>>standalone
> >> >> >>>>>>tcp
> >> >> >>>>>>>server just can provide switch functionality (forwarding
> >>packets
> >> >> >>to
> >> >> >>>>>>>destinations) and model a star network topology. Furthermore,
> >>it
> >> >> >>>>cannot
> >> >> >>>>>>>model various network timings such as queueing delay,
> >>congestion,
> >> >> >>>>and
> >> >> >>>>>>>routing latency. Also it has some accuracy issues that I will
> >> >> >>point
> >> >> >>>>out
> >> >> >>>>>>>next.
> >> >> >>>>>>
> >> >> >>>>>>I agree with the complex topology argument. We already
> >>mentioned
> >> >> >>that
> >> >> >>>>>>before as an advantage for pd-gem5 from the point of view of
> >> >>future
> >> >> >>>>>>extensions. However, I do not agree that multi-gem5 cannot
> >>model
> >> >> >>>>>>queueing
> >> >> >>>>>>delays and congestions. For a simple crossbar switch, it can
> >>model
> >> >> >>>>>>queueing
> >> >> >>>>>>delays and congestions, but the receive queues are distributed
> >> >> >>among
> >> >> >>>>the
> >> >> >>>>>>gem5 processes.
> >> >> >>>>>>
> >> >> >>>>>>It's true that you can model queuing delay of a simple
> >>crossbar by
> >> >> >>>>>distributing queues across gem5 processes (end points). But to
> >>be
> >> >> >>able
> >> >> >>>>to
> >> >> >>>>>do so you have to ensure the ordering of packets that you
> >>enqueue
> >> >>in
> >> >> >>>>the
> >> >> >>>>>distributed queues. It is almost impossible without a
> >>synchronized
> >> >> >>>>switch
> >> >> >>>>>box. You should have a reorder queue that reorders packets
> >> >> >>dynamically
> >> >> >>>>and
> >> >> >>>>>updates timing parameter for each packet as well. I don't know
> >>how
> >> >> >>much
> >> >> >>>>>progress have you had to ensure ordering scheme in multi-gem5
> >>but
> >> >>you
> >> >> >>>>may
> >> >> >>>>>already have realized how complex and error-prone it can be.
> >>This
> >> >> >>>>argument
> >> >> >>>>>is also related to my next argument for "Broken network timing".
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>* Broken network timing:
> >> >> >>>>>>>
> >> >> >>>>>>>Forwarding packets between gem5 processes using a standalone
> >>tcp
> >> >> >>>>server
> >> >> >>>>>>>can
> >> >> >>>>>>>cause reordering between packets that have different source
> >>but
> >> >> >>same
> >> >> >>>>>>>destination. It causes inaccurate network timing and worst of
> >> >>all
> >> >> >>>>>>>non-deterministic simulation. pd-gem5 resolves this by
> >>reordering
> >> >> >>>>>>packets
> >> >> >>>>>>>at
> >> >> >>>>>>>Switch process and then send them to their destination (it's
> >> >> >>>>possible
> >> >> >>>>>>as
> >> >> >>>>>>>switch is synchronized with the rest of the nodes).
> >> >> >>>>>>
> >> >> >>>>>>In multi-gem5, there is always a HeaderPkt that contains some
> >>meta
> >> >> >>>>>>information for each data packet. The meta information includes
> >>the
> >> >> >>>>send
> >> >> >>>>>>tick and the sender rank (i.e. a  unique ID of the sender gem5
> >> >> >>>>process).
> >> >> >>>>>>We use that information to define a well-defined ordering of
> >> >> >>packets
> >> >> >>>>>>even
> >> >> >>>>>>if packets are arriving at the same receiver from different
> >> >> >>senders.
> >> >> >>>>>>This
> >> >> >>>>>>packet ordering scheme is still being tested so the
> >>corresponding
> >> >> >>>>patch
> >> >> >>>>>>is
> >> >> >>>>>>not on the RB yet.
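
The ordering rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not multi-gem5's actual code: the names `Packet` and `order_packets` are hypothetical, and the only facts taken from the thread are that each data packet carries a send tick and a sender rank, and that these two values define the order.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    send_tick: int     # simulated tick at which the packet was sent
    sender_rank: int   # unique ID of the sending gem5 process
    payload: bytes = b""


def order_packets(arrivals: Iterable[Packet]) -> List[Packet]:
    """Return packets in a deterministic order.

    Sorting by (send_tick, sender_rank) makes the result independent
    of the host-side order in which the sockets delivered the packets.
    Python's sort is stable, so packets with an identical key keep
    their arrival order.
    """
    return sorted(arrivals, key=lambda p: (p.send_tick, p.sender_rank))


if __name__ == "__main__":
    arrived = [Packet(5, 1, b"b"), Packet(5, 0, b"a"), Packet(3, 2, b"c")]
    for pkt in order_packets(arrived):
        print(pkt.send_tick, pkt.sender_rank, pkt.payload)
```

Two runs that receive the same packets in a different host-side order produce the same sequence, which is the determinism property both sides of this discussion are asking for.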
> >> >> >>>>>>
> >> >> >>>>>>Please read my previous comment. The most important part of
> >> >> >>>>>>multi/pd-gem5
> >> >> >>>>>extension is ensuring accurate and deterministic simulation.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>* Amount of changes
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 introduces different modes in etherlink just to provide
> >> >> >>>>accurate
> >> >> >>>>>>>timing for each component in the network subsystem (NIC, link,
> >> >> >>>>switch)
> >> >> >>>>>>as
> >> >> >>>>>>>well as capability of modeling different network topologies
> >> >>(mesh,
> >> >> >>>>>>ring,
> >> >> >>>>>>>fat tree, etc). To enable a simple functionality, like what
> >> >> >>>>multi-gem5
> >> >> >>>>>>>provides, the amount of changes in gem5 can be limited to
> >> >> >>>>time-stamping
> >> >> >>>>>>>packets and providing synchronization through python scripts.
> >> >> >>>>However,
> >> >> >>>>>>>multi-gem5 re-implements functionalities that are already in
> >>gem5.
> >> >> >>>>>>
> >> >> >>>>>>This argument holds only if both implementations are correct
> >> >> >>>>(robust).
> >> >> >>>>>>It
> >> >> >>>>>>still seems to me that pd-gem5 does not provide correctness for
> >> >>the
> >> >> >>>>>>synchronization/checkpointing parts.
> >> >> >>>>>>
> >> >> >>>>>>Again, please read my first comment for correctness of pd-gem5.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>* Integrating with gem5 mainstream:
> >> >> >>>>>>>
> >> >> >>>>>>>pd-gem5 launch script is written in python which is suited for
> >> >> >>>>>>integration
> >> >> >>>>>>>with gem5 python scripts. However multi-gem5 uses bash script.
> >> >> >>Also,
> >> >> >>>>>>all
> >> >> >>>>>>>source files in pd-gem5 are already parts of gem5 mainstream.
> >> >> >>>>However
> >> >> >>>>>>>multi-gem5 has tcp_server.cc/hh that is a standalone process
> >>and
> >> >> >>>>cannot
> >> >> >>>>>>be
> >> >> >>>>>>>part of gem5.
> >> >> >>>>>>
> >> >> >>>>>>The multi-gem5 launch script is simple enough to rely only on
> >>the
> >> >> >>>>>>shell. It
> >> >> >>>>>>can obviously be easily re-written in python if that added any
> >> >> >>value.
> >> >> >>>>>>The
> >> >> >>>>>>tcp_server component is only a utility (like the "m5" utility
> >>that
> >> >> >>is
> >> >> >>>>>>also
> >> >> >>>>>>part of gem5).
> >> >> >>>>>>
> >> >> >>>>>>The thing is that it's more likely that users want to add some
> >> >> >>>>>functionality to the run-script of multi/pd-gem5. E.g. pd-gem5
> >> >> >>>>run-script
> >> >> >>>>>supports launching simulations using a simulation pool
> >>management
> >> >> >>>>>software (
> >> >> >>>>>http://research.cs.wisc.edu/htcondor/). Using python enables
> >>users
> >> >>to
> >> >> >>>>>easily add these kind of supports.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>>
> >> >> >>>>>>Cheers,
> >> >> >>>>>>- Gabor
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>>On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham
> >> >> >>>><curtis.dun...@arm.com>
> >> >> >>>>>>>wrote:
> >> >> >>>>>>>
> >> >> >>>>>>>>Hello everyone,
> >> >> >>>>>>>>We have taken a look at how pd-gem5 compares with multi-gem5.
> >> >> >>>>While
> >> >> >>>>>>>>intending
> >> >> >>>>>>>>to deliver the same functionality, there are some crucial
> >> >> >>>>differences:
> >> >> >>>>>>>>
> >> >> >>>>>>>>*  Synchronization.
> >> >> >>>>>>>>
> >> >> >>>>>>>>    pd-gem5 implements this in Python (not a problem in
> >>itself;
> >> >> >>>>>>>>aesthetically
> >> >> >>>>>>>>    this is nice, but...).  The issue is that pd-gem5's data
> >> >> >>>>packets
> >> >> >>>>>>and
> >> >> >>>>>>>>    barrier messages travel over different sockets.  Since
> >> >> >>pd-gem5
> >> >> >>>>>>could
> >> >> >>>>>>>>see
> >> >> >>>>>>>>    data packets passing synchronization barriers, it could
> >> >> >>create
> >> >> >>>>an
> >> >> >>>>>>>>    inconsistent checkpoint.
> >> >> >>>>>>>>
> >> >> >>>>>>>>    multi-gem5's synchronization is implemented in C++ using
> >> >>sync
> >> >> >>>>>>events,
> >> >> >>>>>>>>but
> >> >> >>>>>>>>    more importantly, the messages queue up in the same
> >>stream
> >> >> >>and
> >> >> >>>>so
> >> >> >>>>>>>>cannot
> >> >> >>>>>>>>    have the issue just described.  (Event ordering is often
> >> >> >>>>crucial
> >> >> >>>>>>in
> >> >> >>>>>>>>    snapshot protocols.) Therefore we feel that multi-gem5
> >>is a
> >> >> >>>>more
> >> >> >>>>>>>>robust
> >> >> >>>>>>>>    solution in this respect.
> >> >> >>>>>>>>
> >> >> >>>>>>>>*  Packet handling.
> >> >> >>>>>>>>
> >> >> >>>>>>>>    pd-gem5 uses EtherTap for data packets but changed the
> >> >> >>polling
> >> >> >>>>>>>>mechanism
> >> >> >>>>>>>>    to go through the main event queue.  Since this rate is
> >> >> >>>>actually
> >> >> >>>>>>>>linked
> >> >> >>>>>>>>    with simulator progress, it cannot guarantee that the
> >> >>packets
> >> >> >>>>are
> >> >> >>>>>>>>serviced
> >> >> >>>>>>>>    at regular intervals of real time.  This can lead to
> >>packets
> >> >> >>>>>>>>queueing up
> >> >> >>>>>>>>    which would contribute to the synchronization issues
> >> >> >>mentioned
> >> >> >>>>>>above.
> >> >> >>>>>>>>
> >> >> >>>>>>>>    multi-gem5 uses plain sockets with separate receive
> >>threads
> >> >> >>>>and so
> >> >> >>>>>>>>does
> >> >> >>>>>>>>not
> >> >> >>>>>>>>    have this issue.
> >> >> >>>>>>>>
> >> >> >>>>>>>>* Checkpoint accuracy.
> >> >> >>>>>>>>
> >> >> >>>>>>>>   A user would like to have a checkpoint at precisely the
> >>time
> >> >> >>the
> >> >> >>>>>>>>   'm5 checkpoint' operation is executed so as to not miss
> >>any
> >> >>of
> >> >> >>>>the
> >> >> >>>>>>>>   area of interest in his application.
> >> >> >>>>>>>>
> >> >> >>>>>>>>   pd-gem5 requires that simulation finish the current
> >>quantum
> >> >> >>>>>>>>   before checkpointing, so it cannot provide this.
> >> >> >>>>>>>>
> >> >> >>>>>>>>   (Shortening the quantum can help, but usually the
> >>snapshot is
> >> >> >>>>being
> >> >> >>>>>>>>taken
> >> >> >>>>>>>>   while 'fast-forwarding', i.e. simulating as fast as
> >>possible,
> >> >> >>>>which
> >> >> >>>>>>>>would
> >> >> >>>>>>>>   motivate a longer quantum.)
> >> >> >>>>>>>>
> >> >> >>>>>>>>   multi-gem5 can enter the drain cycle immediately upon
> >> >> >>receiving
> >> >> >>>>a
> >> >> >>>>>>>>   checkpoint request.  We find this accuracy highly
> >>desirable.
> >> >> >>>>>>>>
> >> >> >>>>>>>>* Implementation of network topology.
> >> >> >>>>>>>>
> >> >> >>>>>>>>   pd-gem5 uses a separate gem5 process to act as a switch
> >> >> >>whereas
> >> >> >>>>>>>>multi-gem5
> >> >> >>>>>>>>   uses a standalone packet relay process.
> >> >> >>>>>>>>
> >> >> >>>>>>>>   We haven't measured the overhead of pd-gem5's simulated
> >> >>switch
> >> >> >>>>yet,
> >> >> >>>>>>>>but
> >> >> >>>>>>>>   we're confident that our approach is at least as fast and
> >> >>more
> >> >> >>>>>>>>scalable.
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>Thanks,
> >> >> >>>>>>>>Curtis
> >> >> >>>>>>>>________________________________________
> >> >> >>>>>>>>From: gem5-dev [gem5-dev-boun...@gem5.org] On Behalf Of
> >> Mohammad
> >> >> >>>>>>Alian [
> >> >> >>>>>>>>al...@wisc.edu]
> >> >> >>>>>>>>Sent: Friday, June 26, 2015 7:37 PM
> >> >> >>>>>>>>To: gem5 Developer List
> >> >> >>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a
> >> >> >>parallel/distributed
> >> >> >>>>>>>>system
> >> >> >>>>>>>>on multiple physical hosts
> >> >> >>>>>>>>
> >> >> >>>>>>>>Hi Anthony,
> >> >> >>>>>>>>
> >> >> >>>>>>>>I think that would be a good option, then I can add pd-gem5
> >> >> >>>>>>>>functionality
> >> >> >>>>>>>>on top of that. Right now I've simplified your
> >>implementation.
> >> >> >>>>Also, I
> >> >> >>>>>>>>think I had found some bugs in your patch that I cannot
> >>remember
> >> >> >>>>now.
> >> >> >>>>>>If
> >> >> >>>>>>>>you decided to ship EtherSwitch patch, let me know to give
> >>you a
> >> >> >>>>>>review
> >> >> >>>>>>>>on
> >> >> >>>>>>>>that.
> >> >> >>>>>>>>
> >> >> >>>>>>>>Thanks,
> >> >> >>>>>>>>Mohammad
> >> >> >>>>>>>>
> >> >> >>>>>>>>On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony <
> >> >> >>>>>>>>anthony.gutier...@amd.com> wrote:
> >> >> >>>>>>>>
> >> >> >>>>>>>>>Would it make sense for me to ship the EtherSwitch patch
> >>first,
> >> >> >>>>since
> >> >> >>>>>>>>it
> >> >> >>>>>>>>>has utility on its own, and then we can decide which of the
> >> >> >>>>>>>>"multi-gem5"
> >> >> >>>>>>>>>approaches is best, or if it's some combination of both?
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>The only reason I never shipped it was because Steve raised
> >>an
> >> >> >>>>issue
> >> >> >>>>>>>>that
> >> >> >>>>>>>>>I didn't have a good alternative for, and didn't have the
> >>time
> >> >> >>to
> >> >> >>>>>>look
> >> >> >>>>>>>>into
> >> >> >>>>>>>>>one at that time.
> >> >> >>>>>>>>>________________________________________
> >> >> >>>>>>>>>From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of
> >> >>Mohammad
> >> >> >>>>>>>>Alian [
> >> >> >>>>>>>>>al...@wisc.edu]
> >> >> >>>>>>>>>Sent: Wednesday, June 24, 2015 12:43 PM
> >> >> >>>>>>>>>To: gem5 Developer List
> >> >> >>>>>>>>>Subject: Re: [gem5-dev] pd-gem5: simulating a
> >> >> >>parallel/distributed
> >> >> >>>>>>>>system
> >> >> >>>>>>>>>on multiple physical hosts
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>Hi Andreas,
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>Thanks for the comment.
> >> >> >>>>>>>>>I think the checkpointing support in both works is the same.
> >> >> >>Here
> >> >> >>>>is
> >> >> >>>>>>>>how
> >> >> >>>>>>>>>checkpointing support is implemented in pd-gem5:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>Whenever one of the gem5 processes encounters an m5-checkpoint
> >> >>pseudo
> >> >> >>>>>>>>>instruction, it will send a "recv-ckpt" signal to the
> >> >> >>>>>>>>>"barrier" process. Then the "barrier" process sends a
> >> >> >>"take-ckpt"
> >> >> >>>>>>>>signal
> >> >> >>>>>>>>to
> >> >> >>>>>>>>>all the simulated nodes
> >> >> >>>>>>>>>(including the node that encountered m5-checkpoint) at the
> >>end
> >> >> >>of
> >> >> >>>>the
> >> >> >>>>>>>>>current simulation quantum. On the reception of
> >> >> >>>>>>>>>"take-ckpt" signal, gem5 processes start dumping
> >>check-points.
> >> >> >>>>This
> >> >> >>>>>>>>makes
> >> >> >>>>>>>>>each simulated node dump a checkpoint
> >> >> >>>>>>>>>at the same simulated time point while ensuring there is no
> >> >> >>>>in-flight
> >> >> >>>>>>>>>packets.
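
The barrier-side logic of the protocol just described might look roughly like the Python sketch below. It is an illustration only and not the pd-gem5 barrier script: the class name `Barrier`, its methods, and the "continue" reply are hypothetical, while the "recv-ckpt" and "take-ckpt" signals and the end-of-quantum rule come from the description above.

```python
class Barrier:
    """Toy model of a barrier process that coordinates checkpoints."""

    def __init__(self, quantum: int):
        self.quantum = quantum
        self.tick = 0                # last completed quantum boundary
        self.ckpt_requested = False  # set when a node sends "recv-ckpt"

    def recv_ckpt(self) -> None:
        # A node hit the m5-checkpoint pseudo instruction mid-quantum.
        self.ckpt_requested = True

    def end_of_quantum(self):
        """Called once every node has reached the quantum boundary.

        Returns the (signal, tick) pair broadcast to all nodes, so a
        pending request makes every node checkpoint at the same tick.
        """
        self.tick += self.quantum
        if self.ckpt_requested:
            self.ckpt_requested = False
            return ("take-ckpt", self.tick)
        return ("continue", self.tick)  # hypothetical "no checkpoint" reply


if __name__ == "__main__":
    barrier = Barrier(quantum=10)
    print(barrier.end_of_quantum())  # no request pending
    barrier.recv_ckpt()              # request arrives during the quantum
    print(barrier.end_of_quantum())  # broadcast at the next boundary
```

Because the broadcast happens only at a boundary, all nodes dump at the same simulated tick, at the cost of the delay between the request and the end of the quantum.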
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>I believe this is the same as multi-gem5 patch approach for
> >> >> >>>>>>checkpoint
> >> >> >>>>>>>>>support (based on the commit message of
> >> >> >>>>>>>>http://reviews.gem5.org/r/2865/
> >> >> >>>>>>>>).
> >> >> >>>>>>>>>Also, we have tested our mechanism with several benchmarks
> >>and
> >> >> >>it
> >> >> >>>>>>>>works.
> >> >> >>>>>>>>As
> >> >> >>>>>>>>>Steve suggested, I'll look into Curtis's patch and try to
> >> >>review
> >> >> >>>>it
> >> >> >>>>>>as
> >> >> >>>>>>>>>well.
> >> >> >>>>>>>>>But as Nilay also mentioned earlier, there is some code
> >> >> >>missing
> >> >> >>>>in
> >> >> >>>>>>>>>Curtis's patch. I prefer to first run multi-gem5 before
> >> >>starting
> >> >> >>>>to
> >> >> >>>>>>>>review
> >> >> >>>>>>>>>it.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>Thank you,
> >> >> >>>>>>>>>Mohammad
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson <
> >> >> >>>>>>>>andreas.hans...@arm.com>
> >> >> >>>>>>>>>wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>>Hi Steve,
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>Apologies for the confusion. We are on the same page. My
> >>point
> >> >> >>is
> >> >> >>>>>>>>that
> >> >> >>>>>>>>we
> >> >> >>>>>>>>>>cannot simply take a little bit of patch A and a little
> >>bit of
> >> >> >>>>>>>>patch B.
> >> >> >>>>>>>>>>This change involves a lot of code, and we need to approach
> >> >> >>this
> >> >> >>>>in
> >> >> >>>>>>>>a
> >> >> >>>>>>>>>>structured fashion. My proposal is to do it bottom up, and
> >> >> >>start
> >> >> >>>>by
> >> >> >>>>>>>>>>getting the basic support in place. Since
> >> >> >>>>>>>>>http://reviews.gem5.org/r/2826/
> >> >> >>>>>>>>>>has already been on the review board for a few months, I am
> >> >> >>>>merely
> >> >> >>>>>>>>>>suggesting that it would be a good start to relate the
> >> >> >>newly
> >> >> >>>>>>>>posted
> >> >> >>>>>>>>>>patches to what is already there.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>Andreas
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>On 24/06/2015 13:11, "gem5-dev on behalf of Steve
> >>Reinhardt"
> >> >> >>>>>>>>>><gem5-dev-boun...@gem5.org on behalf of ste...@gmail.com>
> >> >> >>wrote:
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>>Hi Andreas,
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>I'm a little confused by your email---you say you're
> >> >> >>>>fundamentally
> >> >> >>>>>>>>>opposed
> >> >> >>>>>>>>>>>to looking at both patches and picking the best features,
> >> >>then
> >> >> >>>>you
> >> >> >>>>>>>>point
> >> >> >>>>>>>>>>>out that the patches Curtis posted have the feature of
> >>better
> >> >> >>>>>>>>>>>checkpointing
> >> >> >>>>>>>>>>>support so we should pick that :).
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>Obviously we can't just pick patch A from Mohammad's set
> >>and
> >> >> >>>>patch
> >> >> >>>>>>>>B
> >> >> >>>>>>>>>from
> >> >> >>>>>>>>>>>Curtis's set and expect them to work together, but I think
> >> >> >>that
> >> >> >>>>>>>>having
> >> >> >>>>>>>>>>>both
> >> >> >>>>>>>>>>>sets of patches available and comparing and contrasting
> >>the
> >> >> >>two
> >> >> >>>>>>>>>>>implementations should enable us to get to a single
> >> >> >>>>implementation
> >> >> >>>>>>>>>that's
> >> >> >>>>>>>>>>>the best of both. Someone will have to make the effort of
> >> >> >>>>>>>>integrating
> >> >> >>>>>>>>>the
> >> >> >>>>>>>>>>>better ideas from one set into the other set to create a
> >>new
> >> >> >>>>>>>>unified
> >> >> >>>>>>>>set
> >> >> >>>>>>>>>>>of
> >> >> >>>>>>>>>>>patches; (or maybe we commit one set and then integrate
> >>the
> >> >> >>>>best of
> >> >> >>>>>>>>the
> >> >> >>>>>>>>>>>other set as patches on top of that), but the first step
> >>is
> >> >>to
> >> >> >>>>>>>>identify
> >> >> >>>>>>>>>>>what "the best of both" is.  Having Mohammad look at
> >>Curtis's
> >> >> >>>>>>>>patches,
> >> >> >>>>>>>>>and
> >> >> >>>>>>>>>>>Curtis (or someone else from ARM) closely examine
> >>Mohammad's
> >> >> >>>>>>>>patches
> >> >> >>>>>>>>>would
> >> >> >>>>>>>>>>>be a great start.  I intend to review them both, though
> >> >> >>>>>>>>unfortunately
> >> >> >>>>>>>>my
> >> >> >>>>>>>>>>>time has been scarce lately---I'm hoping to squeeze that
> >>in
> >> >> >>>>later
> >> >> >>>>>>>>this
> >> >> >>>>>>>>>>>week.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>Once we've had a few people look at both, we can discuss
> >>the
> >> >> >>>>pros
> >> >> >>>>>>>>and
> >> >> >>>>>>>>>cons
> >> >> >>>>>>>>>>>of each, then discuss the strategy for getting the best
> >> >> >>features
> >> >> >>>>>>>>in.
> >> >> >>>>>>>>So
> >> >> >>>>>>>>>>>far I've heard that Mohammad's patches have a better
> >>network
> >> >> >>>>model
> >> >> >>>>>>>>but
> >> >> >>>>>>>>>the
> >> >> >>>>>>>>>>>ARM patches have better checkpointing support; that seems
> >> >> >>like a
> >> >> >>>>>>>>good
> >> >> >>>>>>>>>>>start.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>Steve
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson <
> >> >> >>>>>>>>>andreas.hans...@arm.com
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>wrote:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>>Hi all,
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Great work. However, I fundamentally do not believe in
> >>the
> >> >> >>>>>>>>approach
> >> >> >>>>>>>>of
> >> >> >>>>>>>>>>>>'letting reviewers pick the best features'. There is no
> >>way
> >> >> >>we
> >> >> >>>>>>>>would
> >> >> >>>>>>>>>>>>ever
> >> >> >>>>>>>>>>>>get something working out of it. We need to get _one_
> >> >>working
> >> >> >>>>>>>>solution
> >> >> >>>>>>>>>>>>here, and figure out how to best get there. I would
> >>propose
> >> >> >>to
> >> >> >>>>>>>>do it
> >> >> >>>>>>>>>>>>bottom up, starting with the basic multi-simulator
> >>instance
> >> >> >>>>>>>>support,
> >> >> >>>>>>>>>>>>checkpointing support, and then move on to the network
> >> >> >>between
> >> >> >>>>>>>>the
> >> >> >>>>>>>>>>>>simulator instances.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Thus, I propose we go with the low-level plumbing and
> >> >> >>>>checkpoint
> >> >> >>>>>>>>>support
> >> >> >>>>>>>>>>>>from what Curtis has posted. I believe proper
> >>checkpointing
> >> >> >>>>>>>>support
> >> >> >>>>>>>>to
> >> >> >>>>>>>>>>>>be
> >> >> >>>>>>>>>>>>the most challenging, and from what I can tell this is
> >>far
> >> >> >>more
> >> >> >>>>>>>>>limited
> >> >> >>>>>>>>>>>>in
> >> >> >>>>>>>>>>>>what you just posted Mohammad. Could you perhaps review
> >> >> >>Curtis
> >> >> >>>>>>>>patches
> >> >> >>>>>>>>>>>>based on your insights, and we can try and get these
> >>patches
> >> >> >>in
> >> >> >>>>>>>>shape
> >> >> >>>>>>>>>>>>and
> >> >> >>>>>>>>>>>>committed asap.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Once we have the baseline functionality in place, then we
> >> >>can
> >> >> >>>>>>>>start
> >> >> >>>>>>>>>>>>looking at the more elaborate network models.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Does this sound reasonable?
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Thanks,
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>Andreas
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad
> >>Alian"
> >> >> >>>>>>>>>>>><gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu>
> >> >> >>wrote:
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>Hello All,
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>I have submitted a chain of patches which enables gem5
> >>to
> >> >> >>>>>>>>simulate
> >> >> >>>>>>>>a
> >> >> >>>>>>>>>>>>>cluster on multiple physical hosts:
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2909/
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2910/
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2912/
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2913/
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2914/
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>and a patch that contains run scripts for a simple
> >> >> >>experiment:
> >> >> >>>>>>>>>>>>>http://reviews.gem5.org/r/2915/
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>We have run several benchmarks using this infrastructure,
> >> >> >>>>>>>>>>>>>including NAS parallel benchmarks (MPI) and DCBench-hadoop
> >> >> >>>>>>>>>>>>>(http://prof.ict.ac.cn/DCBench/), and would be happy to share
> >> >> >>>>>>>>>>>>>scripts/diskimages.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>We call this *pd-gem5*. The functionality of *pd-gem5* is more
> >> >> >>>>>>>>>>>>>or less the same as Curtis's patch for *multi-gem5*. However,
> >> >> >>>>>>>>>>>>>I feel the *pd-gem5* network model is more thorough; it also
> >> >> >>>>>>>>>>>>>enables modeling different network topologies. Having both
> >> >> >>>>>>>>>>>>>sets of changes together lets reviewers pick the best features
> >> >> >>>>>>>>>>>>>from both works.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>Thank you,
> >> >> >>>>>>>>>>>>>Mohammad Alian
> >> >> >>>>>>>>>>>>>_______________________________________________
> >> >> >>>>>>>>>>>>>gem5-dev mailing list
> >> >> >>>>>>>>>>>>>gem5-dev@gem5.org
> >> >> >>>>>>>>>>>>>http://m5sim.org/mailman/listinfo/gem5-dev
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>-- IMPORTANT NOTICE: The contents of this email and any
> >> >> >>>>>>>>>>>>attachments are confidential and may also be privileged. If
> >> >> >>>>>>>>>>>>you are not the intended recipient, please notify the sender
> >> >> >>>>>>>>>>>>immediately and do not disclose the contents to any other
> >> >> >>>>>>>>>>>>person, use it for any purpose, or store or copy the
> >> >> >>>>>>>>>>>>information in any medium.  Thank you.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>ARM Limited, Registered office 110 Fulbourn Road, Cambridge
> >> >> >>>>>>>>>>>>CB1 9NJ, Registered in England & Wales, Company No:  2557590
> >> >> >>>>>>>>>>>>ARM Holdings plc, Registered office 110 Fulbourn Road,
> >> >> >>>>>>>>>>>>Cambridge CB1 9NJ, Registered in England & Wales,
> >> >> >>>>>>>>>>>>Company No:  2548782
