I agree, and I think points 1 and 3 are also non-negotiable. Given that, I 
think the multi-gem5 design is more robust and fits in with the overall gem5 
design philosophy. I've been slowly going over the code and see no major 
problems - certainly nothing to warrant keeping it out of the code base.

I was planning on giving it a ship it today, so I'll do that now.

-----Original Message-----
From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Andreas Hansson
Sent: Thursday, July 02, 2015 8:35 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on 
multiple physical hosts

Hi all,

I think we need to up-level this a bit. From our perspective (and I suspect in 
general):

1. Robustness is important. Having a design that _may_ break, however unlikely,
is simply not an option.

2. Performance and scaling are important. We can compare actual numbers here,
and I am fairly sure the two solutions are on par. Let’s quantify that though.

3. Checkpointing must not rely on synchronicity. It is vital for several 
workloads that we can checkpoint the various gem5 instances at different Ticks 
(due to the way the workloads are constructed).

Andreas

On 01/07/2015 21:41, "gem5-dev on behalf of Mohammad Alian"
<gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:

>Thanks Gabor for the reply.
>
>I feel this conversation is useful as we can find out pros/cons of each 
>design.
>Please find my response in-lined below.
>
>Thank you,
>Mohammad
>
>On Wed, Jul 1, 2015 at 6:44 AM, Gabor Dozsa <gabor.do...@arm.com> wrote:
>
>> Hi All,
>>
>> Sorry for the missing indentation in my previous e-mail! (This was my
>> first e-mail to the dev-list so I could not simply use "reply"). Below
>> is the same message, hopefully in more readable form.
>>
>> ====================================
>>
>> Hi  All,
>>
>> Thank you Mohammad for your elaboration on the issues!
>>
>> I have written most of the multi-gem5 patch so let me add some more
>> clarifications and answer your concerns. My comments are inline
>> below.
>>
>> Thanks,
>> - Gabor
>>
>> On 6/27/15, 10:20 AM, "Mohammad Alian" <al...@wisc.edu> wrote:
>>
>> >Hi All,
>> >
>> >Curtis, thank you for listing some of the differences. I was waiting
>> >for the completed multi-gem5 patch before sending my review. Please
>> >see my inline response below. I've addressed the concerns that you've
>> >raised. Also, I've added a bit more to the comparison.
>> >
>> >*  Synchronization.
>> >
>> >pd-gem5 implements this in Python (not a problem in itself; aesthetically
>> >this is nice, but...).  The issue is that pd-gem5's data packets and
>> >barrier messages travel over different sockets.  Since pd-gem5 could see
>> >data packets passing synchronization barriers, it could create an
>> >inconsistent checkpoint.
>> >
>> >multi-gem5's synchronization is implemented in C++ using sync events, but
>> >more importantly, the messages queue up in the same stream and so cannot
>> >have the issue just described.  (Event ordering is often crucial in
>> >snapshot protocols.) Therefore we feel that multi-gem5 is a more robust
>> >solution in this respect.
>> >
>> >Each packet in pd-gem5 has a time-stamp. So even if data packets
>> >pass synchronization barriers (in other words, data packets arrive
>> >early at the destination node), the destination node processes packets
>> >based on their timestamps. Actually, allowing data packets to pass sync
>> >barriers is a nice feature that can reduce the likelihood of late packet
>> >reception. The ordering of data messages that flow over pd-gem5 nodes is
>> >also preserved in the pd-gem5 implementation.
>>
>> This seems to be a misunderstanding. Maybe the wording was not
>> precise before. The problem is not a data packet "passing" a sync
>> barrier but the other way around: a sync barrier that can pass a data
>> packet (e.g. while the data packet is waiting in the host operating
>> system socket layer).  If that happens, the packet will arrive later
>> than it was supposed to and it may miss the computed receive tick.
>>
>> For instance, let’s assume that the quantum coincides with the
>> simulated Ether link delay. (This is the optimal choice of quantum to
>> minimize the number of sync barriers.)  If a data packet is sent
>> right at the beginning of a quantum then this packet must arrive at
>> the destination gem5 process within the same quantum in order not to
>> miss its receive tick at the very beginning of the next quantum. If
>> the sync barrier can pass the data packet then the data packet may
>> arrive only during the next quantum (or in extreme conditions even
>> later than that), so when it arrives the receiver gem5 may have
>> already passed the receive tick.
>>
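The late-arrival scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ticks are plain integers, the receive tick is taken to be the send tick plus the simulated link delay, and the names `receive_tick`, `is_late`, `QUANTUM` and `LINK_DELAY` are hypothetical and do not come from either patch.

```python
def receive_tick(send_tick: int, link_delay: int) -> int:
    """Simulated tick at which the destination is due to see the packet."""
    return send_tick + link_delay


def is_late(send_tick: int, link_delay: int, receiver_tick_on_arrival: int) -> bool:
    """True if the receiver has already simulated past the receive tick."""
    return receiver_tick_on_arrival > receive_tick(send_tick, link_delay)


QUANTUM = 1000        # hypothetical quantum length in ticks
LINK_DELAY = QUANTUM  # the case discussed above: quantum == link delay

# A packet sent at the very start of quantum 0 is due at the start of quantum 1.
due = receive_tick(0, LINK_DELAY)

# The host delivers it while the receiver is still inside quantum 0: on time.
on_time = is_late(0, LINK_DELAY, 600)

# The sync barrier overtook it and the receiver is already inside quantum 1: late.
late = is_late(0, LINK_DELAY, 1200)

print(due, on_time, late)
```

Whether the second case can occur in practice is exactly what the two sides of this thread disagree on; the sketch only shows what "missing the receive tick" means.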
>This argument makes more sense than the previous one. Note that gem5
>is a cycle-accurate simulator and it runs orders of magnitude slower than
>real hardware. So it's almost impossible that the flight time of a packet
>through the real network turns out to be more than the simulation time of
>one quantum. We ran a set of experiments just for this purpose: with
>quantum size equal to the etherlink delay, we never got any late arrival
>violation (what you described) for the full NAS benchmark suite (please
>refer to the paper).
>
>multi-gem5 is optimized for a case that almost never happens, and is
>sacrificing speedup for no gain.
>
>
>> Time-stamping does help with this issue. Also, if a data packet is
>> waiting in the host operating system socket layer when the simulation
>> thread exits to python to complete the next sync barrier then the
>> packet will not go into the checkpoint that may follow that sync
>> barrier.
>>
>That's a good point. The current pd-gem5 checkpointing mechanism might
>miss packets that were sent during the previous quantum and are waiting
>in the OS socket buffer. I should add some code inside the ethertap
>serialization function to drain the ethertap socket before writing a
>checkpoint. I will update the pd-gem5 patch accordingly.
>
>>
>> >What you mentioned as an advantage for multi-gem5 is actually a key
>> >disadvantage: buffering sync messages behind data packets can add
>> >to the synchronization overhead and slow down simulation
>> >significantly.
>>
>> The purpose of sync messages is to make sure that the data packets
>> arrive in time (in terms of simulated time) at the destination so
>> they can be scheduled for being received at the proper computed tick.
>> Sync messages also make sure that no data packets are in flight when
>> a sync barrier completes before we take a checkpoint.  They
>> definitely add overhead for the simulation but they are necessary for
>> the correctness of the simulation.
>>
>> The receive thread in multi-gem5 reads out packets from the socket in
>> parallel with the simulation thread so packets normally will not be
>> "queueing up" before a sync barrier message.  There is definitely
>> room for improvements in the current implementation for reducing the
>> synchronization overhead but that is likely true for pd-gem5, too.
>> The important thing here is that the solution must provide
>> correctness (robustness) first.
>>
>pd-gem5 provides correctness. Please read my previous comment. The
>whole purpose of multi/pd-gem5 is to parallelize simulation with minimal
>overhead and gain speedup. If you fail to do so, nobody will use your
>tool.
>
>
>> >Also, multi-gem5 sends huge messages (multiHeaderPkt) through the
>> >network at each synchronization point, which increases
>> >synchronization overhead further. In pd-gem5, we choose to send just
>> >one character as the sync message through a separate socket to reduce
>> >synchronization overhead.
>>
>> The TCP/IP message size is unlikely to be the bottleneck here. Multi-gem5
>> will send ~50 bytes more in a sync barrier message than pd-gem5 but
>> that bigger sync message still fits into a single ethernet frame on
>> the wire. The end-to-end latency overhead that is caused by 50 bytes
>> of extra payload for a small single-frame TCP/IP message is likely to
>> fall into the "noise" category if one tries to measure it in a real
>> cluster.
>>
>You should prove your hypothesis experimentally. Each gem5 process
>sends/receives sync messages at the end of every quantum. Say you are
>simulating an "N"-node computer cluster with "M" different configurations.
>Then you will have N*M gem5 processes that send/receive these 50 bytes (I
>think it's more) of extra data at the same time over the network ...
>
>Furthermore, multi-gem5 sends a header before each data message. By
>comparison, pd-gem5 just adds 12 bytes (each time-stamp is the 12 least
>significant digits of the Tick) to each data packet. I don't know exactly
>how large these "MultiHeaderPkt" are, but each has two Tick fields alone
>that are 64 bits each! Also, header packets are separate TCP packets, so
>you pay for sending two separate packets for each data packet. And worst
>of all, you serialize all of these with sync messages.
>
>
>> >
>> >*  Packet handling.
>> >
>> >pd-gem5 uses EtherTap for data packets but changed the polling mechanism
>> >to go through the main event queue.  Since this rate is actually linked
>> >with simulator progress, it cannot guarantee that the packets are serviced
>> >at regular intervals of real time.  This can lead to packets queueing up
>> >which would contribute to the synchronization issues mentioned above.
>> >
>> >multi-gem5 uses plain sockets with separate receive threads and so does not
>> >have this issue.
>> >
>> >I think again you are pointing to your first concern that I've explained
>> >above. Packets that have queued up in the EtherTap socket will be processed
>> >and delivered to the simulation environment at the beginning of the next
>> >simulation quantum.
>> >
>> >Please notice that multi-gem5 introduces a new simObject to
>> >interface the simulation environment with the real world, which is
>> >redundant. This functionality is already provided by EtherTap.
>>
>> Except that the EtherTap solution does not provide a correct (robust) 
>> solution for the synchronization problem.
>>
>Please read my first and second comments.
>
>
>> >
>> >* Checkpoint accuracy.
>> >
>> >A user would like to have a checkpoint at precisely the time the
>> >'m5 checkpoint' operation is executed so as to not miss any of the
>> >area of interest in his application.
>> >
>> >pd-gem5 requires that simulation finish the current quantum
>> >before checkpointing, so it cannot provide this.
>> >
>> >(Shortening the quantum can help, but usually the snapshot is being taken
>> >while 'fast-forwarding', i.e. simulating as fast as possible, which would
>> >motivate a longer quantum.)
>> >
>> >multi-gem5 can enter the drain cycle immediately upon receiving a
>> >checkpoint request.  We find this accuracy highly desirable.
>> >
>> >It's true that if you have a large quantum size then there would be some
>> >discrepancy between the m5_ckpt instruction tick and the actual dump tick.
>> >Based on the multi-gem5 code, my understanding is that you send an async
>> >checkpoint message as soon as one of the gem5 processes encounters the
>> >m5_ckpt instruction. But I'm not sure how you fix the aforementioned issue,
>> >because you have to sync all gem5 processes before you start dumping the
>> >checkpoint, which necessitates a global synchronization beforehand.
>>
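The discrepancy between the m5_ckpt tick and the dump tick described above is simple arithmetic, sketched here in Python. The sketch assumes that quanta start at tick 0 and are aligned to multiples of the quantum length, and that a request landing exactly on a boundary belongs to the quantum that starts there; neither assumption is stated in the thread, and the function names are hypothetical.

```python
def pd_checkpoint_tick(request_tick: int, quantum: int) -> int:
    """End of the quantum that contains request_tick (assumed boundary rule)."""
    return (request_tick // quantum + 1) * quantum


def discrepancy(request_tick: int, quantum: int) -> int:
    """Ticks simulated between the checkpoint request and the actual dump."""
    return pd_checkpoint_tick(request_tick, quantum) - request_tick


# Same request tick, two hypothetical quantum sizes.
long_quantum = discrepancy(1250, 1000)   # dump at tick 2000
short_quantum = discrepancy(1250, 100)   # dump at tick 1300

print(long_quantum, short_quantum)
```

A shorter quantum shrinks the gap but means more sync barriers, which is the trade-off both sides refer to when they mention fast-forwarding.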
>> In multi-gem5, the gem5 process that encounters the m5_ckpt
>> instruction sends out an async checkpoint notification for the peer
>> gem5 processes and then it starts the draining immediately (at the
>> same tick).  So the checkpoint will be taken at the exact tick from
>> the initiator process point of view. The global synchronisation with
>> the peer processes takes place while the initiator process is still
>> waiting at the same tick (i.e. the simulation thread is suspended).
>> However, the receiver thread continues reading out the socket -
>> while waiting for the global sync to complete - to make sure that
>> in-flight data packets from peer gem5 processes are stored properly
>> and saved into the checkpoint.
>>
>>
>So you mean multi-gem5 ends up with gem5 processes at different ticks
>after a checkpoint? In pd-gem5 we make sure that all gem5 processes
>start dumping the checkpoint at the same tick. Are you sure that it is
>correct to have each gem5 process dump its checkpoint at a different tick?
>
>I don't think this is a correct checkpointing design. However, if you feel
>it is correct, I can change a couple of lines in "Simulation.py" and the
>barrier scripts to implement the same functionality in pd-gem5. One
>thing that you are obsessed about is making sure that there are no
>in-flight packets when we start dumping a checkpoint, and you have all
>these complex mechanisms in place to ensure that! I think you can be
>99.99999% sure that there is no in-flight packet by waiting for 1
>second after all gem5 processes have finished their quantum simulation and
>then dumping the checkpoint. Do you really think that delivering a TCP
>packet would take more than 1 second in today's systems!?
>Always go for simple solutions ...
>
>
>
>> >
>> >By the way, we have a fix for this issue by introducing a new m5 
>> >pseudo instruction.
>>
>> I fail to see how a new pseudo instruction can solve the problem of 
>> completing the full quantum in pd-gem5 before a checkpoint can be taken.
>> Could you please elaborate on that?
>>
>As we take checkpoints while fast-forwarding, and it is likely that we
>relax synchronization for speedup purposes, a new pseudo instruction that
>can set the quantum size (m5_qset) can be helpful. So, one can insert
>m5_qset in the benchmark source code before entering the ROI that contains
>m5_ckpt to decrease the quantum size beforehand and reduce the discrepancy
>between the m5_ckpt tick and the actual checkpoint tick. This is not
>included in the pd-gem5 patch right now.
>
>
>> >
>> >* Implementation of network topology.
>> >
>> >pd-gem5 uses a separate gem5 process to act as a switch whereas multi-gem5
>> >uses a standalone packet relay process.
>> >
>> >We haven't measured the overhead of pd-gem5's simulated switch yet, but
>> >we're confident that our approach is at least as fast and more scalable.
>> >
>> >There is flexibility in pd-gem5 to simulate a switch box
>> >alongside one of the other gem5 processes. However, it might make
>> >that gem5 process the simulation bottleneck. One of the advantages of
>> >pd-gem5 over multi-gem5 is that we use gem5 to simulate a switch box,
>> >which allows us to model any network topology by instantiating several
>> >Switch simObjects and interconnecting them with EtherLink in an arbitrary
>> >fashion. A standalone TCP server can only provide switch functionality
>> >(forwarding packets to destinations) and model a star network topology.
>> >Furthermore, it cannot model various network timings such as queueing
>> >delay, congestion, and routing latency. It also has some accuracy issues
>> >that I will point out next.
>>
>> I agree with the complex topology argument. We already mentioned that
>> before as an advantage for pd-gem5 from the point of view of future
>> extensions. However, I do not agree that multi-gem5 cannot model
>> queueing delays and congestion. For a simple crossbar switch, it can
>> model queueing delays and congestion, but the receive queues are
>> distributed among the gem5 processes.
>>
>It's true that you can model the queueing delay of a simple crossbar by
>distributing queues across gem5 processes (end points). But to be able
>to do so you have to ensure the ordering of packets that you enqueue in
>the distributed queues. That is almost impossible without a synchronized
>switch box. You would need a reorder queue that reorders packets
>dynamically and updates the timing parameters for each packet as well. I
>don't know how much progress you have made on an ordering scheme in
>multi-gem5, but you may have already realized how complex and error
>prone it can be. This argument is also related to my next argument for
>"Broken network timing".
>
>
>> >
>> >* Broken network timing:
>> >
>> >Forwarding packets between gem5 processes using a standalone TCP
>> >server can cause reordering between packets that have different
>> >sources but the same destination. It causes inaccurate network timing
>> >and, worst of all, non-deterministic simulation. pd-gem5 resolves this
>> >by reordering packets at the Switch process and then sending them to
>> >their destination (it's possible as the switch is synchronized with
>> >the rest of the nodes).
>>
>> In multi-gem5, there is always a HeaderPkt that contains some meta
>> information for each data packet. The meta information includes the
>> send tick and the sender rank (i.e. a unique ID of the sender gem5
>> process). We use that information to define a well-defined ordering of
>> packets even if packets are arriving at the same receiver from
>> different senders. This packet ordering scheme is still being tested so
>> the corresponding patch is not on the RB yet.
>>
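One way such an ordering could work is sketched below in Python. The thread does not spell out the actual rule (the scheme is described as still being tested), so this assumes a plain lexicographic order on (send tick, sender rank); `PacketMeta` and `delivery_order` are hypothetical names, not identifiers from the multi-gem5 patch.

```python
from typing import List, NamedTuple


class PacketMeta(NamedTuple):
    """Hypothetical stand-in for the per-packet meta information."""
    send_tick: int
    sender_rank: int
    payload: str


def delivery_order(arrivals: List[PacketMeta]) -> List[PacketMeta]:
    """Order packets by (send tick, sender rank), ignoring host arrival order."""
    return sorted(arrivals, key=lambda p: (p.send_tick, p.sender_rank))


a = PacketMeta(send_tick=100, sender_rank=2, payload="from node 2")
b = PacketMeta(send_tick=100, sender_rank=0, payload="from node 0")
c = PacketMeta(send_tick=90, sender_rank=1, payload="from node 1")

# Two different host-level arrival orders yield the same simulated order.
run1 = delivery_order([a, b, c])
run2 = delivery_order([c, a, b])

print([p.payload for p in run1])
```

Because the key depends only on simulated quantities, the result does not change with host network timing, which is the determinism property being argued about here.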
>Please read my previous comment. The most important part of the
>multi/pd-gem5 extension is ensuring accurate and deterministic simulation.
>
>
>> >
>> >* Amount of changes
>> >
>> >pd-gem5 introduces different modes in etherlink just to provide
>> >accurate timing for each component in the network subsystem (NIC,
>> >link, switch) as well as the capability of modeling different network
>> >topologies (mesh, ring, fat tree, etc). To enable a simple
>> >functionality, like what multi-gem5 provides, the amount of changes in
>> >gem5 can be limited to time-stamping packets and providing
>> >synchronization through python scripts. However, multi-gem5
>> >re-implements functionalities that are already in gem5.
>>
>> This argument holds only if both implementations are correct (robust).
>> It still seems to me that pd-gem5 does not provide correctness for the
>> synchronization/checkpointing parts.
>>
>Again, please read my first comment for correctness of pd-gem5.
>
>
>> >
>> >* Integrating with gem5 mainstream:
>> >
>> >The pd-gem5 launch script is written in python, which is suited for
>> >integration with gem5 python scripts. However, multi-gem5 uses a bash
>> >script. Also, all source files in pd-gem5 are already parts of gem5
>> >mainstream. However, multi-gem5 has tcp_server.cc/hh, which is a
>> >standalone process and cannot be part of gem5.
>>
>> The multi-gem5 launch script is simple enough to rely only on the
>> shell. It can obviously be easily re-written in python if that added
>> any value. The tcp_server component is only a utility (like the "m5"
>> utility that is also part of gem5).
>>
>The thing is that it's more likely that users want to add some
>functionality to the run-script of multi/pd-gem5. E.g. the pd-gem5
>run-script supports launching simulations using a simulation pool
>management software ( http://research.cs.wisc.edu/htcondor/). Using
>python enables users to easily add this kind of support.
>
>
>>
>> Cheers,
>> - Gabor
>>
>>
>> >On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham 
>> ><curtis.dun...@arm.com>
>> >wrote:
>> >
>> >>Hello everyone,
>> >>We have taken a look at how pd-gem5 compares with multi-gem5.  
>> >>While intending to deliver the same functionality, there are some 
>> >>crucial differences:
>> >>
>> >>*  Synchronization.
>> >>
>> >>    pd-gem5 implements this in Python (not a problem in itself;
>> >>    aesthetically this is nice, but...).  The issue is that pd-gem5's
>> >>    data packets and barrier messages travel over different sockets.
>> >>    Since pd-gem5 could see data packets passing synchronization
>> >>    barriers, it could create an inconsistent checkpoint.
>> >>
>> >>    multi-gem5's synchronization is implemented in C++ using sync
>> >>    events, but more importantly, the messages queue up in the same
>> >>    stream and so cannot have the issue just described.  (Event
>> >>    ordering is often crucial in snapshot protocols.) Therefore we
>> >>    feel that multi-gem5 is a more robust solution in this respect.
>> >>
>> >>*  Packet handling.
>> >>
>> >>    pd-gem5 uses EtherTap for data packets but changed the polling
>> >>    mechanism to go through the main event queue.  Since this rate is
>> >>    actually linked with simulator progress, it cannot guarantee that
>> >>    the packets are serviced at regular intervals of real time.  This
>> >>    can lead to packets queueing up which would contribute to the
>> >>    synchronization issues mentioned above.
>> >>
>> >>    multi-gem5 uses plain sockets with separate receive threads and
>> >>    so does not have this issue.
>> >>
>> >>* Checkpoint accuracy.
>> >>
>> >>    A user would like to have a checkpoint at precisely the time the
>> >>    'm5 checkpoint' operation is executed so as to not miss any of the
>> >>    area of interest in his application.
>> >>
>> >>    pd-gem5 requires that simulation finish the current quantum
>> >>    before checkpointing, so it cannot provide this.
>> >>
>> >>    (Shortening the quantum can help, but usually the snapshot is
>> >>    being taken while 'fast-forwarding', i.e. simulating as fast as
>> >>    possible, which would motivate a longer quantum.)
>> >>
>> >>    multi-gem5 can enter the drain cycle immediately upon receiving a
>> >>    checkpoint request.  We find this accuracy highly desirable.
>> >>
>> >>* Implementation of network topology.
>> >>
>> >>    pd-gem5 uses a separate gem5 process to act as a switch whereas
>> >>    multi-gem5 uses a standalone packet relay process.
>> >>
>> >>    We haven't measured the overhead of pd-gem5's simulated switch
>> >>    yet, but we're confident that our approach is at least as fast and
>> >>    more scalable.
>> >>
>> >>Thanks,
>> >>Curtis
>> >>________________________________________
>> >>From: gem5-dev [gem5-dev-boun...@gem5.org] On Behalf Of Mohammad Alian
>> >>[al...@wisc.edu]
>> >>Sent: Friday, June 26, 2015 7:37 PM
>> >>To: gem5 Developer List
>> >>Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed
>> >>system on multiple physical hosts
>> >>
>> >>Hi Anthony,
>> >>
>> >>I think that would be a good option; then I can add pd-gem5
>> >>functionality on top of that. Right now I've simplified your
>> >>implementation. Also, I think I had found some bugs in your patch
>> >>that I cannot remember now. If you decide to ship the EtherSwitch
>> >>patch, let me know so I can give you a review on that.
>> >>
>> >>Thanks,
>> >>Mohammad
>> >>
>> >>On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony < 
>> >>anthony.gutier...@amd.com> wrote:
>> >>
>> >>>Would it make sense for me to ship the EtherSwitch patch first,
>> >>>since it has utility on its own, and then we can decide which of the
>> >>>"multi-gem5" approaches is best, or if it's some combination of both?
>> >>>
>> >>>The only reason I never shipped it was because Steve raised an
>> >>>issue that I didn't have a good alternative for, and didn't have the
>> >>>time to look into one at that time.
>> >>>________________________________________
>> >>>From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of Mohammad
>> >>>Alian [al...@wisc.edu]
>> >>>Sent: Wednesday, June 24, 2015 12:43 PM
>> >>>To: gem5 Developer List
>> >>>Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed
>> >>>system on multiple physical hosts
>> >>>
>> >>>Hi Andreas,
>> >>>
>> >>>Thanks for the comment.
>> >>>I think the checkpointing support in both works is the same. Here
>> >>>is how checkpointing support is implemented in pd-gem5:
>> >>>
>> >>>Whenever one of the gem5 processes encounters an m5-checkpoint pseudo
>> >>>instruction, it will send a "recv-ckpt" signal to the "barrier"
>> >>>process. Then the "barrier" process sends a "take-ckpt" signal to
>> >>>all the simulated nodes (including the node that encountered
>> >>>m5-checkpoint) at the end of the current simulation quantum. On
>> >>>reception of the "take-ckpt" signal, gem5 processes start dumping
>> >>>checkpoints. This makes each simulated node dump a checkpoint at the
>> >>>same simulated time point while ensuring there are no in-flight
>> >>>packets.
>> >>>
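The signalling described above can be illustrated with a toy Python model. It is a sketch only: the real barrier is a separate process talking over sockets, whereas this models it as an in-memory object, and the `Barrier` class, its method names, and the "continue" signal are all hypothetical. Only the "recv-ckpt" and "take-ckpt" signal names come from the description.

```python
from typing import List


class Barrier:
    """Toy stand-in for the pd-gem5 "barrier" process (names hypothetical)."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.ckpt_requested = False

    def recv_ckpt(self) -> None:
        """A node hit an m5-checkpoint pseudo instruction mid-quantum."""
        self.ckpt_requested = True

    def end_of_quantum(self) -> List[str]:
        """Signal sent to every node once the current quantum has ended."""
        signal = "take-ckpt" if self.ckpt_requested else "continue"
        self.ckpt_requested = False
        return [signal] * self.num_nodes


barrier = Barrier(num_nodes=3)
barrier.recv_ckpt()                 # request arrives during the quantum
first = barrier.end_of_quantum()    # every node is told to checkpoint
second = barrier.end_of_quantum()   # the next quantum proceeds normally

print(first, second)
```

The request is only acted on at the quantum boundary, so every node dumps its checkpoint at the same simulated tick, at the cost of the delay between the request and the end of the quantum.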
>> >>>I believe this is the same as the multi-gem5 patch approach for
>> >>>checkpoint support (based on the commit message of
>> >>>http://reviews.gem5.org/r/2865/ ).
>> >>>Also, we have tested our mechanism with several benchmarks and it
>> >>>works. As Steve suggested, I'll look into Curtis's patch and try to
>> >>>review it as well.
>> >>>But as Nilay also mentioned earlier, there is some code missing
>> >>>in Curtis's patch. I prefer to first run multi-gem5 before
>> >>>starting to review it.
>> >>>
>> >>>Thank you,
>> >>>Mohammad
>> >>>
>> >>>On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson <
>> >>andreas.hans...@arm.com>
>> >>>wrote:
>> >>>
>> >>>>Hi Steve,
>> >>>>
>> >>>>Apologies for the confusion. We are on the same page. My point is
>> >>>>that we cannot simply take a little bit of patch A and a little bit
>> >>>>of patch B. This change involves a lot of code, and we need to
>> >>>>approach this in a structured fashion. My proposal is to do it
>> >>>>bottom up, and start by getting the basic support in place. Since
>> >>>>http://reviews.gem5.org/r/2826/ has already been on the review board
>> >>>>for a few months, I am merely suggesting that it would be a good
>> >>>>start to relate the newly posted patches to what is already there.
>> >>>>
>> >>>>Andreas
>> >>>>
>> >>>>
>> >>>>
>> >>>>On 24/06/2015 13:11, "gem5-dev on behalf of Steve Reinhardt"
>> >>>><gem5-dev-boun...@gem5.org on behalf of ste...@gmail.com> wrote:
>> >>>>
>> >>>>>Hi Andreas,
>> >>>>>
>> >>>>>I'm a little confused by your email---you say you're fundamentally
>> >>>>>opposed to looking at both patches and picking the best features,
>> >>>>>then you point out that the patches Curtis posted have the feature
>> >>>>>of better checkpointing support so we should pick that :).
>> >>>>>
>> >>>>>Obviously we can't just pick patch A from Mohammad's set and patch B
>> >>>>>from Curtis's set and expect them to work together, but I think that
>> >>>>>having both sets of patches available and comparing and contrasting
>> >>>>>the two implementations should enable us to get to a single
>> >>>>>implementation that's the best of both. Someone will have to make
>> >>>>>the effort of integrating the better ideas from one set into the
>> >>>>>other set to create a new unified set of patches (or maybe we commit
>> >>>>>one set and then integrate the best of the other set as patches on
>> >>>>>top of that), but the first step is to identify what "the best of
>> >>>>>both" is.  Having Mohammad look at Curtis's patches, and Curtis (or
>> >>>>>someone else from ARM) closely examine Mohammad's patches would be a
>> >>>>>great start.  I intend to review them both, though unfortunately my
>> >>>>>time has been scarce lately---I'm hoping to squeeze that in later
>> >>>>>this week.
>> >>>>>
>> >>>>>Once we've had a few people look at both, we can discuss the pros
>> >>>>>and cons of each, then discuss the strategy for getting the best
>> >>>>>features in. So far I've heard that Mohammad's patches have a better
>> >>>>>network model but the ARM patches have better checkpointing support;
>> >>>>>that seems like a good start.
>> >>>>>
>> >>>>>Steve
>> >>>>>
>> >>>>>On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson <
>> >>>andreas.hans...@arm.com
>> >>>>>
>> >>>>>wrote:
>> >>>>>
>> >>>>>>Hi all,
>> >>>>>>
>> >>>>>>Great work. However, I fundamentally do not believe in the
>> >>>>>>approach of 'letting reviewers pick the best features'. There is
>> >>>>>>no way we would ever get something working out of it. We need to
>> >>>>>>get _one_ working solution here, and figure out how to best get
>> >>>>>>there. I would propose to do it bottom up, starting with the basic
>> >>>>>>multi-simulator instance support, checkpointing support, and then
>> >>>>>>move on to the network between the simulator instances.
>> >>>>>>
>> >>>>>>Thus, I propose we go with the low-level plumbing and checkpoint
>> >>>>>>support from what Curtis has posted. I believe proper checkpointing
>> >>>>>>support to be the most challenging, and from what I can tell this
>> >>>>>>is far more limited in what you just posted, Mohammad. Could you
>> >>>>>>perhaps review Curtis's patches based on your insights, and we can
>> >>>>>>try and get these patches in shape and committed asap.
>> >>>>>>
>> >>>>>>Once we have the baseline functionality in place, then we can
>> >>>>>>start looking at the more elaborate network models.
>> >>>>>>
>> >>>>>>Does this sound reasonable?
>> >>>>>>
>> >>>>>>Thanks,
>> >>>>>>
>> >>>>>>Andreas
>> >>>>>>
>> >>>>>>On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad Alian"
>> >>>>>><gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:
>> >>>>>>
>> >>>>>>>Hello All,
>> >>>>>>>
>> >>>>>>>I have submitted a chain of patches which enables gem5 to
>> >>>>>>>simulate a cluster on multiple physical hosts:
>> >>>>>>>
>> >>>>>>>http://reviews.gem5.org/r/2909/
>> >>>>>>>http://reviews.gem5.org/r/2910/
>> >>>>>>>http://reviews.gem5.org/r/2912/
>> >>>>>>>http://reviews.gem5.org/r/2913/
>> >>>>>>>http://reviews.gem5.org/r/2914/
>> >>>>>>>
>> >>>>>>>and a patch that contains run scripts for a simple experiment:
>> >>>>>>>http://reviews.gem5.org/r/2915/
>> >>>>>>>
>> >>>>>>>We have run several benchmarks using this infrastructure,
>> >>>>>>>including NAS parallel benchmarks (MPI) and DCBench-hadoop
>> >>>>>>>(http://prof.ict.ac.cn/DCBench/),
>> >>>>>>>and would be happy to share scripts/diskimages.
>> >>>>>>>
>> >>>>>>>We call this *pd-gem5*. *pd-gem5* functionality is more or less
>> >>>>>>>the same as Curtis's patch for *multi-gem5*. However, I feel the
>> >>>>>>>*pd-gem5* network model is more thorough; it also enables modeling
>> >>>>>>>different network topologies. Having both sets of changes together
>> >>>>>>>lets reviewers pick the best features from both works.
>> >>>>>>>
>> >>>>>>>Thank you,
>> >>>>>>>Mohammad Alian
>> >>>>>>>_______________________________________________
>> >>>>>>>gem5-dev mailing list
>> >>>>>>>gem5-dev@gem5.org
>> >>>>>>>http://m5sim.org/mailman/listinfo/gem5-dev
>> >>>>>>
>> >>>>>>
>> >>>>>>-- IMPORTANT NOTICE: The contents of this email and any attachments
>> >>>>>>are confidential and may also be privileged. If you are not the
>> >>>>>>intended recipient, please notify the sender immediately and do not
>> >>>>>>disclose the contents to any other person, use it for any purpose,
>> >>>>>>or store or copy the information in any medium.  Thank you.
>> >>>>>>
>> >>>>>>ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>> >>>>>>Registered in England & Wales, Company No:  2557590
>> >>>>>>ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
>> >>>>>>9NJ, Registered in England & Wales, Company No:  2548782

