On Thu, Jan 26, 2012 at 10:38 AM, Friedley, Andrew W. <[email protected]> wrote:
> Hi,
>
> I'm getting started with Marssx86 doing some simulation of MPI codes in a
> single node with large numbers of cores. When doing some initial runs, I
> ran into a segfault in the simulator (marss 0.2.1)
>
> (qemu) simconfig -run -quiet -machine shared_l2
> simulation options received:-run -quiet -machine shared_l2
> (qemu)
> root@ubuntu:~/svn/afriedle/miniMD-0.1# mpirun -np 1 ./md
> qemu-system-x86_64: ptlsim/build/core/ooo-core/ooo.cpp:1301: bool
> ooo::ThreadContext::handle_exception(): Assertion `ctx.page_fault_addr !=
> 0' failed.
> Aborted (core dumped)
>
> I'll paste the gdb backtrace at the end of the email. This occurs with
> just 1 MPI rank and appears to die early in the program. I can run w/o the
> cycle accurate simulation, and everything works. I compiled marss with c=4
> cores and using the provided splash.img, though I updated ubuntu to lucid
> and am using their stock kernel and Open MPI. I can reproduce the segfault
> with marss 0.2.1 and the current git.
>
> Any idea what is going on?
>

We have never run Open MPI, so I have no idea why this is crashing. From the
backtrace, it seems like some code is trying to access memory at address 0.
This is a bug, but I don't know why it is getting address 0.

Can you generate the log with loglevel set to 10 (compile with 'debug=1' to
enable logging) and send me the logfile? It will be easier to find the issue
from the logfile.

- Avadh

> Thanks,
>
> Andrew
>
> Program terminated with signal 6, Aborted.
> #0  0x00002aaaadc4e265 in raise (sig=<value optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
> (gdb) bt
> #0  0x00002aaaadc4e265 in raise (sig=<value optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00002aaaadc4fd10 in abort () at abort.c:88
> #2  0x00002aaaadc476e6 in __assert_fail (
>     assertion=0x7b223a "ctx.page_fault_addr != 0",
>     file=0x7b2cf0 "ptlsim/build/core/ooo-core/ooo.cpp", line=1301,
>     function=0x7b3900 "bool ooo::ThreadContext::handle_exception()")
>     at assert.c:78
> #3  0x00000000005c1913 in ooo::ThreadContext::handle_exception
> (this=0x2f38a60)
>     at ptlsim/build/core/ooo-core/ooo.cpp:1301
> #4  0x00000000005c28ba in ooo::OooCore::runcycle (this=0x2f23bd0)
>     at ptlsim/build/core/ooo-core/ooo.cpp:749
> #5  0x000000000065ea6b in BaseMachine::run (this=0x123bfc0, config=...)
>     at ptlsim/build/sim/machine.cpp:164
> #6  0x000000000066b337 in ptl_simulate () at
> ptlsim/build/sim/ptlsim.cpp:1244
> #7  0x000000000057bfc8 in sim_cpu_exec () at qemu/cpu-exec.c:310
> #8  0x000000000041a96e in main_loop (argc=10, argv=<value optimized out>,
>     envp=<value optimized out>) at qemu/vl.c:1451
> #9  main (argc=10, argv=<value optimized out>, envp=<value optimized out>)
>     at qemu/vl.c:3186
>
>
> _______________________________________________
> http://www.marss86.org
> Marss86-Devel mailing list
> [email protected]
> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>
