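As before, you need to look at all of the log.* files to find out why the simulation is being killed. Don't look only at log.switch. If any one of the gem5 processes aborts, the entire dist-gem5 simulation is killed. The "Killed by signal 15" line in your log.switch is a SIGTERM, which suggests the switch was terminated after another process exited rather than failing on its own.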
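To narrow down which process aborted first, something like the following may help. It is only a sketch: it assumes the run directory holds log.switch plus one log.* file per node, and the search patterns are my guess at typical failure markers (gem5 prefixes its own errors with "fatal:" or "panic:", and a Python config error shows up as a traceback). The scan_logs name is made up for this example.

```shell
# Hypothetical helper: classify each dist-gem5 log in a run directory.
# Assumes the logs are named log.* (log.switch plus one per node).
scan_logs() {
    dir="${1:-.}"
    for f in "$dir"/log.*; do
        [ -f "$f" ] || continue
        if grep -Eqi 'fatal|panic|error|traceback' "$f"; then
            # Likely the process that actually failed.
            echo "SUSPECT $f"
        elif grep -q 'Killed by signal' "$f"; then
            # Probably terminated by the launcher after another process exited.
            echo "KILLED $f"
        else
            echo "CLEAN $f"
        fi
    done
}
```

Run it as `scan_logs /path/to/run/dir` and then read the tail of every file reported as SUSPECT; those are the ones worth posting here.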
On Wed, Dec 6, 2017 at 1:50 PM, Vitorio Cargnini (lcargnini) <[email protected]> wrote:

> Hi Mohammad,
>
> Thank you for the prompt response. I checked log.switch. The first error
> I fixed was the path: the script needs full paths to work. Once I fixed
> that and tried again, it ran and failed a little later.
>
> Got the following output:
>
> launch switch gem5 process on node0 ...
> waiting for switch to start ..
> node #switch started
> START Wed Dec 6 12:36:04 MST 2017
> starting gem5 on node0...
> starting gem5 on node0...
> starting gem5 on node1...
> starting gem5 on node1...
> starting gem5 on node2 ...
> starting gem5 on node2 ...
> starting gem5 on node3 ...
> starting gem5 on node3 ...
> (I) (some) gem5 process(es) exited
> KILLED Wed Dec 6 12:37:35 MST 2017
> ABORT Wed Dec 6 12:37:35 MST 2017
>
> The log.switch had the following:
>
> command line: /wada/wada/gem5/build/ARM/gem5.opt -d
> /wada/wada/gem5/m5out.switch --debug-flags=DistEthernet
> /wada/wada/gem5/configs/dist/sw.py
> --checkpoint-dir=/wada/wada/gem5/m5out.switch
> --is-switch --dist-size=8 --dist-server-port=2200
>
> info: Standard input is not a terminal, disabling listeners.
> Global frequency set at 1000000000000 ticks per second
> 0: system.portlink0: DistEtherLink::DistEtherLink() link
> delay:10000000 ticksPerByte:800
> 0: global: DistIface() ctor rank:0
> info: tcp_iface listening on port 2200
> Killed by signal 15.
>
> *From:* gem5-users [mailto:[email protected]] *On Behalf Of* Mohammad Alian
> *Sent:* Tuesday, December 5, 2017 9:18 PM
> *To:* gem5 users mailing list <[email protected]>
> *Subject:* [EXT] Re: [gem5-users] Running Dist-gem5
>
> Hi Vitorio,
>
> You should check the content of log.switch to see why the gem5 process
> simulating the switch cannot start. There are many reasons a gem5 process
> can fail to run. If you post the content of log.switch here, I can help.
>
> Regarding "distributed run", you first need to set up passwordless ssh
> between your simulation (physical) hosts and then use the "LSB_MCPU_HOSTS"
> env variable to assign gem5 processes to physical hosts. E.g. if your
> simulated cluster size is 8 and you want to run 4 gem5 processes on
> host_name0 and 4 on host_name1, then your LSB_MCPU_HOSTS looks like this:
>
> export LSB_MCPU_HOSTS="host_name0 4 host_name1 4"
>
> Best,
> Mohammad
>
> On Tue, Dec 5, 2017 at 6:03 PM, Vitorio Cargnini (lcargnini) <[email protected]> wrote:
>
> Hello,
>
> Please, what exactly do I need to run dist-gem5 with --dist?
>
> I’m trying, however it fails with “Failed ot start switch”.
>
> Also, what do I need in place for it to start distributed across nodes,
> instead of launching multiple/parallel runs on ‘localhost’?
>
> Regards,
> Vitorio.
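For the distributed setup described in the quoted message, a quick sanity check before launching might look like the sketch below. The host names and the 4+4 split come from the quoted example; the count_slots and check_ssh helpers and the DIST_SIZE variable are made up for illustration, and the rule that the per-host counts should cover the simulated cluster size is my assumption based on that example.

```shell
# Sum the process counts in an LSB_MCPU_HOSTS-style string ("host n host n ...").
# Hypothetical helper, not part of gem5.
count_slots() {
    total=0
    set -- $1               # split the string into host/count pairs
    while [ "$#" -ge 2 ]; do
        total=$((total + $2))
        shift 2
    done
    echo "$total"
}

# Return success only if key-based (passwordless) ssh to the host works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

export LSB_MCPU_HOSTS="host_name0 4 host_name1 4"
DIST_SIZE=8   # assumed simulated cluster size, as in the quoted example

if [ "$(count_slots "$LSB_MCPU_HOSTS")" -lt "$DIST_SIZE" ]; then
    echo "LSB_MCPU_HOSTS provides fewer than $DIST_SIZE slots" >&2
fi
```

Running `check_ssh host_name0` (and likewise for every other host) from the machine that launches the simulation should succeed without any prompt; if it does not, the passwordless ssh setup is not complete yet.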
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
