FWIW: we fixed this recently in the openib BTL by ensuring that all
registered memory is freed during the BTL finalize (rather than during
the mpool finalize).
This is a new issue because the mpool finalize was only recently
expanded to unregister all of its memory as part of the NIC-restart
effort (and that will likely also be needed for checkpoint/restart...?).
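A minimal C sketch of why the ordering matters, under heavy simplifying assumptions. The types and functions below (`toy_btl_t`, `toy_mpool_t`, `toy_btl_finalize`, `toy_mpool_finalize`, `toy_shutdown`) are hypothetical and do not mirror Open MPI's real mpool or BTL interfaces. Unloading a component is modeled with a flag rather than an actual `dlclose()`, and a call into an "unloaded" BTL returns an error instead of segfaulting. The shutdown order follows Tim's report: the BTLs are closed and unloaded before the mpool finalize runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the Open MPI mpool/BTL API.
 * "Unloading" the BTL is modeled with a flag instead of dlclose(). */

#define TOY_MAX_REGS 8

typedef struct toy_btl {
    int loaded;      /* 1 while the BTL's code is still mapped */
    int dereg_calls; /* successful deregistrations performed */
} toy_btl_t;

typedef struct toy_mpool {
    int regs[TOY_MAX_REGS]; /* stand-ins for registration handles */
    size_t nregs;           /* number of cached registrations */
    toy_btl_t *owner;       /* BTL whose code must deregister them */
} toy_mpool_t;

/* Stand-in for the BTL-provided deregistration function. Returns -1 when
 * called after the BTL is gone; in the real code that call would jump
 * into unmapped memory and segfault. */
static int toy_btl_deregister(toy_btl_t *btl, int reg) {
    (void)reg;
    if (!btl->loaded) {
        return -1;
    }
    btl->dereg_calls++;
    return 0;
}

/* The fix: the BTL releases everything it registered during its own
 * finalize, while its code is still loaded. */
static void toy_btl_finalize(toy_btl_t *btl, toy_mpool_t *mpool) {
    assert(btl->loaded); /* must run before the component is unloaded */
    while (mpool->nregs > 0) {
        mpool->nregs--;
        toy_btl_deregister(btl, mpool->regs[mpool->nregs]);
    }
}

/* mpool finalize: deregisters whatever is still cached. Returns how many
 * of those calls went into an already-unloaded BTL. */
static int toy_mpool_finalize(toy_mpool_t *mpool) {
    int bad_calls = 0;
    while (mpool->nregs > 0) {
        mpool->nregs--;
        if (toy_btl_deregister(mpool->owner, mpool->regs[mpool->nregs]) != 0) {
            bad_calls++;
        }
    }
    return bad_calls;
}

/* Shutdown with the BTL unloaded before the mpool finalize runs.
 * btl_frees_first selects whether the BTL finalize drains its own
 * registrations. Returns the number of calls into an unloaded BTL. */
int toy_shutdown(int btl_frees_first) {
    toy_btl_t btl = { 1, 0 };
    toy_mpool_t mpool = { { 10, 11, 12 }, 3, &btl };

    if (btl_frees_first) {
        toy_btl_finalize(&btl, &mpool);
    }
    btl.loaded = 0;                    /* BTL component unloaded */
    return toy_mpool_finalize(&mpool); /* mpool finalize runs last */
}
```

In this model, `toy_shutdown(0)` returns 3 (one bad call per cached registration, the analogue of the segfault), while `toy_shutdown(1)` returns 0 because the mpool finalize has nothing left to deregister. Tim's workaround of never unloading the BTLs would correspond to leaving `loaded` set.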
On Aug 13, 2007, at 9:11 AM, Tim Prins wrote:
Hi folks,
I have run into a problem with mca_mpool_rdma_finalize as implemented in
r15557. With the t_win onesided test, running over gm, it segfaults. What
appears to be happening is that some memory is registered with gm, and then
gets freed by mca_mpool_rdma_finalize. But the free function that it is
using is in the gm btl, and the btls are unloaded before the mpool is shut
down. So the function call segfaults.
If I change the code so we never unload the btls (and we don't free the gm
port), it works fine.
Note that the openib btl works just fine.
Forgive me if this is a known problem; I am trying to catch up after my
vacation...
Tim
---
If anyone cares, here is the callstack:
(gdb) bt
#0 0x404de825 in ?? () from /lib/libgcc_s.so.1
#1 0x4048081a in mca_mpool_rdma_finalize (mpool=0x925b690)
at mpool_rdma_module.c:431
#2 0x400caca9 in mca_mpool_base_close () at base/mpool_base_close.c:57
#3 0x40060094 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:304
#4 0x4009a4c9 in PMPI_Finalize () at pfinalize.c:44
#5 0x08049946 in main (argc=1, argv=0xbfe16924) at t_win.c:214
(gdb)
gdb shows that at this point the gm btl is no longer loaded.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems