On Thu, Jan 26, 2012 at 10:38 AM, Friedley, Andrew W.
<[email protected]> wrote:

> Hi,
>
> I'm getting started with Marssx86 doing some simulation of MPI codes in a
> single node with large numbers of cores.  When doing some initial runs, I
> ran into a segfault in the simulator (marss 0.2.1)
>
> (qemu) simconfig -run -quiet -machine shared_l2
> simulation options received:-run -quiet -machine shared_l2
> (qemu)
> root@ubuntu:~/svn/afriedle/miniMD-0.1# mpirun -np 1 ./md
> qemu-system-x86_64: ptlsim/build/core/ooo-core/ooo.cpp:1301: bool
> ooo::ThreadContext::handle_exception(): Assertion `ctx.page_fault_addr !=
> 0' failed.
> Aborted (core dumped)
>
> I'll paste the gdb backtrace at the end of the email.  This occurs with
> just 1 MPI rank and appears to die early in the program.  I can run w/o the
> cycle accurate simulation, and everything works.  I compiled marss with c=4
> cores and using the provided splash.img, though I updated ubuntu to lucid
> and am using their stock kernel and Open MPI.  I can reproduce the segfault
> with marss 0.2.1 and the current git.
>
> Any idea what is going on?
>
We have never run Open MPI, so I don't know why this is crashing. From the
backtrace, it looks like some code is trying to access memory at address
0. This is a bug, but I don't know why it is getting address 0. Can you
generate a log with loglevel set to 10 (compile with 'debug=1' to enable
logging) and send me the logfile? It will be easier to find the issue
from the logfile.

- Avadh

> Thanks,
>
> Andrew
>
> Program terminated with signal 6, Aborted.
> #0  0x00002aaaadc4e265 in raise (sig=<value optimized out>)
>    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
> (gdb) bt
> #0  0x00002aaaadc4e265 in raise (sig=<value optimized out>)
>    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00002aaaadc4fd10 in abort () at abort.c:88
> #2  0x00002aaaadc476e6 in __assert_fail (
>    assertion=0x7b223a "ctx.page_fault_addr != 0",
>    file=0x7b2cf0 "ptlsim/build/core/ooo-core/ooo.cpp", line=1301,
>    function=0x7b3900 "bool ooo::ThreadContext::handle_exception()")
>    at assert.c:78
> #3  0x00000000005c1913 in ooo::ThreadContext::handle_exception
> (this=0x2f38a60)
>    at ptlsim/build/core/ooo-core/ooo.cpp:1301
> #4  0x00000000005c28ba in ooo::OooCore::runcycle (this=0x2f23bd0)
>    at ptlsim/build/core/ooo-core/ooo.cpp:749
> #5  0x000000000065ea6b in BaseMachine::run (this=0x123bfc0, config=...)
>    at ptlsim/build/sim/machine.cpp:164
> #6  0x000000000066b337 in ptl_simulate () at
> ptlsim/build/sim/ptlsim.cpp:1244
> #7  0x000000000057bfc8 in sim_cpu_exec () at qemu/cpu-exec.c:310
> #8  0x000000000041a96e in main_loop (argc=10, argv=<value optimized out>,
>    envp=<value optimized out>) at qemu/vl.c:1451
> #9  main (argc=10, argv=<value optimized out>, envp=<value optimized out>)
>    at qemu/vl.c:3186
>
>
> _______________________________________________
> http://www.marss86.org
> Marss86-Devel mailing list
> [email protected]
> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>