Hi Richard,

You wrote below:

“As noted earlier, mpirun will fail in gem5 without a list of hosts.”

This should not happen. Without a list of hosts, mpirun should launch all the MPI processes on ‘localhost’ (i.e. the machine where mpirun is running).

mpirun uses ssh to start new processes, so make sure that ssh is working before you try mpirun. For example, run ‘ssh localhost ls’ from your rcS script to check that ssh works locally, and ‘ssh 10.0.0.X ls’ to check that ssh can launch a new process on a remote host. If ssh does not work, check whether sshd is started from the Linux init.

- Gabor

From: gem5-users <gem5-users-boun...@gem5.org> on behalf of "Afoakwa, Richard" <rafoa...@ur.rochester.edu>
Reply-To: gem5 users mailing list <gem5-users@gem5.org>
Date: Friday, 7 September 2018 at 20:10
To: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] dist-gem5 panic - No 32bit reads implemented for this device.

Gabor,

Thanks very much for your response. Using the vanilla version, it appears to me that the error message was due to the fact that I was not including the host IP address(es) in my mpirun calls. Thanks again.

As a secondary question, I am trying to understand the basic framework of dist-gem5. From what I infer, the "gem5-dist.sh" script launches gem5 FS processes (using the same *.rcS script and Linux image) onto dedicated machines. Using the *.rcS script, each gem5 process updates the network configuration of the "image". For example, in the tutorials, this is done using the line:

/sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED} 10.0.0.${MY_ADDR}

Subsequently, the base gem5 process (the one with RANK = 0) can ping the other processes (as is evident in the tutorial screenshot).

Assuming all this works without error and I am trying to run an MPI application, the RANK0 gem5 process needs a list of hosts to execute mpirun. As noted earlier, mpirun will fail in gem5 without a list of hosts. For this purpose, I pass the list of host IP addresses, 10.0.0.${MY_ADDR}, to mpirun.
But I keep receiving "connection refused" error messages after mpirun starts. Trying different ports does not work either. I would be grateful if anyone could provide some direction on this.

Thanks.
Richard

________________________________
From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Gabor Dozsa <gabor.do...@arm.com>
Sent: Thursday, August 30, 2018 12:20:53 PM
To: gem5 users mailing list
Subject: Re: [gem5-users] dist-gem5 panic - No 32bit reads implemented for this device.

Hi Richard,

I would suggest that you first try running the same MPI app on a single simulated system, to see whether the issue is specific to dist-gem5. Simply use vanilla gem5 instead of dist-gem5 with exactly the same configuration (e.g. gem5 flags, kernel, disk image, etc.). You will need to remove the dist-gem5 and Ethernet config commands from the boot script, but the mpirun command line should just work as it is.

- Gabor

From: gem5-users <gem5-users-boun...@gem5.org> on behalf of "Afoakwa, Richard" <rafoa...@ur.rochester.edu>
Reply-To: gem5 users mailing list <gem5-users@gem5.org>
Date: Thursday, 30 August 2018 at 15:57
To: "gem5-users@gem5.org" <gem5-users@gem5.org>
Subject: [gem5-users] dist-gem5 panic - No 32bit reads implemented for this device.

Hi all,

This is my first time using dist-gem5, but I have a working knowledge of gem5. I think I have everything set up correctly, but I keep getting the following panic message: "No 32bit reads implemented for this device. Offset 0x44", and I have run out of ideas to fix or work around it.

The testsys.terminal outputs suggest that the images are all loaded correctly and that things run fine until the application is executed. I have updated the image to include the MPI libraries so that I can call mpirun (armv8-linux-gnueabi-mpirun). When I boot the image in a VM, I can run the application just fine with mpirun, but I keep getting this panic message when it is run inside dist-gem5.

I am using an arm64 setup.
The image is aarch64-ubuntu-trusty-headless.img, the kernel is vmlinux.aarch64.20140821, and the dtb is vexpress.aarch64.20140821.dtb. Here are the text outputs:

***** rcS *****
# --------------------------------------------
# ------ Start your tests below ... ---------
# --------------------------------------------
## Start workload
NUM_CORES=$(/sbin/m5 initparam num-cpus)
echo "Num-Cores: $NUM_CORES"
echo "[RKA] Load modules and set omp threads..."
export OMP_NUM_THREADS=$NUM_CORES #Number of threads to use
echo "[RKA] Start work..."
if [ "$MY_RANK" == "0" ]
then
    echo "[RKA] Stats dump and rest..."
    /sbin/m5 dumpstats
    /sbin/m5 resetstats
    echo "[RKA] Starting workload..."
    cd /benchmarks/lulesh
    mpirun -np ${MY_SIZE} ./lulesh2.0 -s 5 -i 10
    /sbin/m5 exit 1
else
    printf "Wait for main to finish ...\n"
    while /bin/true
    do
        sleep 5
        printf "."
    done
fi

***** m5out.0/testsys.terminal *****
[RKA] bootscript.rcS running
[RKA] Rank: 0
[RKA] Size: 2
[RKA] Address: 02
[RKA] Set ethernet config...
[ 3.600382] CPU3: failed to come online
[RKA] Display updated config...
eth0      Link encap:Ethernet  HWaddr 00:90:00:00:00:02
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Preparing hosts for mpirun. Rank: 0 of 2
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.003 ms

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.003/0.003/0.003/0.000 ms
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=997 ms

--- 192.168.0.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 997.900/997.900/997.900/0.000 ms
Num-Cores: 2
[RKA] Load modules and set omp threads...
[RKA] Start work...
[RKA] Stats dump and rest...
[RKA] Starting workload...
[ 4.620381] CPU2: failed to come online

***** m5out.1/testsys.terminal *****
[RKA] bootscript.rcS is running
[RKA] Rank: 1
[RKA] Size: 2
[RKA] Address: 03
[RKA] Set ethernet config...
[ 3.600382] CPU3: failed to come online
[RKA] Display updated config...
eth0      Link encap:Ethernet  HWaddr 00:90:00:00:00:03
          inet addr:192.168.0.3  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Preparing hosts for mpirun. Rank: 1 of 2
Num-Cores: 2
[RKA] Load modules and set omp threads...
[RKA] Start work...
Wait for main to finish ...
[ 4.620382] CPU2: failed to come online

***** Log.0 *****
command line: gem5-dist/000.init/util/dist/test/./../../../build/ARM/gem5.opt -d gem5-dist/000.init/util/dist/test/m5out.0 --debug-flags=EthernetAll,DistEthernetAll gem5-dist/000.init/util/dist/test/./../../../configs/example/fs.py --cpu-type=AtomicSimpleCPU --num-cpus=2 --machine-type=VExpress_EMM64 --disk-image=aarch64-ubuntu-trusty-headless.img --kernel=vmlinux.aarch64.20140821 --dtb-filename=vexpress.aarch64.20140821.dtb --script=gem5-dist/000.init/util/dist/test/./../../../util/dist/test/bootscript.rcS --checkpoint-dir=gem5-dist/000.init/util/dist/test/m5out.0 --dist --dist-rank=0 --dist-size=2 --dist-server-name=bhx0062 --dist-server-port=2200

info: Standard input is not a terminal, disabling listeners.
Global frequency set at 1000000000000 ticks per second
0: etherlink: Switch Link created. Delay: 10000000, Speed: 800
0: global: DistIface() ctor rank:0
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
info: kernel located at: gem5-dist/full_system/binaries/vmlinux.aarch64.20140821
warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match.
warn: Sockets disabled, not accepting vnc client connections
warn: Sockets disabled, not accepting terminal connections
0: etherlink: DistEtherLink::init() called
… … …
18290945047000: testsys.realview.ethernet: Checking interrupts icr: 0 imr: 0x9d
18290945047000: testsys.realview.ethernet: Mask cleaned all interrupts
18290945047000: testsys.realview.ethernet: ITR = 0XC3 itr.interval = 0XC3
panic: No 32bit reads implemented for this device. Offset 0x44
Memory Usage: 1243356 KBytes
Program aborted at tick 18372912712000
--- BEGIN LIBC BACKTRACE ---

***** log.1 *****
command line: gem5-dist/000.init/util/dist/test/./../../../build/ARM/gem5.opt -d gem5-dist/000.init/util/dist/test/m5out.1 --debug-flags=EthernetAll,DistEthernetAll gem5-dist/000.init/util/dist/test/./../../../configs/example/fs.py --cpu-type=AtomicSimpleCPU --num-cpus=2 --machine-type=VExpress_EMM64 --disk-image=aarch64-ubuntu-trusty-headless.img --kernel=vmlinux.aarch64.20140821 --dtb-filename=vexpress.aarch64.20140821.dtb --script=gem5-dist/000.init/util/dist/test/./../../../util/dist/test/bootscript.rcS --checkpoint-dir=gem5-dist/000.init/util/dist/test/m5out.1 --dist --dist-rank=1 --dist-size=2 --dist-server-name=bhx0062 --dist-server-port=2200

info: Standard input is not a terminal, disabling listeners.
Global frequency set at 1000000000000 ticks per second
0: etherlink: Switch Link created. Delay: 10000000, Speed: 800
0: global: DistIface() ctor rank:1
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
info: kernel located at: gem5-dist/full_system/binaries/vmlinux.aarch64.20140821
warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match.
warn: Sockets disabled, not accepting vnc client connections
warn: Sockets disabled, not accepting terminal connections
0: etherlink: DistEtherLink::init() called
… … …
18290981199500: testsys.realview.ethernet: ITR = 0XCD itr.interval = 0XCD
18290982340500: testsys.realview.ethernet: Checking interrupts icr: 0 imr: 0x9d
18290982340500: testsys.realview.ethernet: Mask cleaned all interrupts
18290982340500: testsys.realview.ethernet: ITR = 0XC3 itr.interval = 0XC3
info: recv(): Connection closed
Exiting @ tick 18372920000000 because connection to gem5 peer got closed

Any help would be appreciated.
Thanks,
Richard

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
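
The ssh checks Gabor describes at the top of the thread, and the host list Richard passes to mpirun, could be scripted in the rcS boot script roughly as follows. This is only a sketch under stated assumptions, not something posted in the thread: the function names build_host_list, check_ssh and check_all_hosts are hypothetical; node addresses are assumed to be consecutive and to start at .2 for rank 0, as the terminal output above suggests; BatchMode and ConnectTimeout are OpenSSH client options; and the --host option in the commented usage assumes Open MPI's mpirun, which the thread does not confirm.

```shell
#!/bin/bash
# Sketch only. Function names are hypothetical, not part of gem5 or MPI.

# Print a comma-separated list of node IPs.
# Usage: build_host_list PREFIX FIRST COUNT
# e.g. "build_host_list 10.0.0 2 2" covers 10.0.0.2 and 10.0.0.3.
build_host_list() {
    local prefix=$1 first=$2 count=$3
    local hosts="" i
    for ((i = 0; i < count; i++)); do
        # Prepend a comma to every entry except the first.
        hosts+="${hosts:+,}${prefix}.$((first + i))"
    done
    printf '%s\n' "$hosts"
}

# Return 0 if "ls" can be run on HOST over ssh without a password prompt.
# BatchMode and ConnectTimeout are OpenSSH client options.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" ls >/dev/null 2>&1
}

# Check localhost and every node in a comma-separated list; report failures.
# Returns non-zero if any host could not be reached.
check_all_hosts() {
    local host failed=0
    for host in localhost ${1//,/ }; do
        if ! check_ssh "$host"; then
            echo "ssh to $host failed; is sshd started from init?"
            failed=1
        fi
    done
    return "$failed"
}

# Possible use on rank 0 in the rcS script (assumes addresses start at .2,
# as in the terminal output above, and Open MPI's --host option):
#   HOSTS=$(build_host_list 10.0.0 2 "$MY_SIZE")
#   check_all_hosts "$HOSTS" && mpirun -np "$MY_SIZE" --host "$HOSTS" ./lulesh2.0 -s 5 -i 10
```

Running the checks before mpirun separates an ssh or sshd problem (which would be consistent with "connection refused") from an MPI or dist-gem5 problem. The prefix is a parameter because the tutorial uses 10.0.0.x while the terminal output above shows 192.168.0.x.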
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users