>(gdb) bt
>#0  0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
>#1  0x000000000068db20 in ?? ()
>#2  0x0000000060040a0a in ?? ()
>#3  0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
>#4  0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at src/cma.c:403
>#5  0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425
>#6  0x0000000000423ef9 in ib_finalize_rdma_cm ()
>#7  0x00000000004183f6 in MPIDI_CH3I_CM_Finalize ()
>#8  0x000000000044b03b in MPIDI_CH3_Finalize ()
>#9  0x000000000043169e in MPID_Finalize ()
>#10 0x000000000040c3ef in PMPI_Finalize ()
>#11 0x0000000000403af4 in main ()
>(gdb)
>
>I'm not sure I fully believe this stack trace, because
>ucma_destroy_kern_id() doesn't call pthread_cond_destroy(). However,
>rdma_destroy_id() does. So I'm thinking that ucma_destroy_id() has
>already been executed, rdma_destroy_id() is freeing the cm_id, and we
>get stuck in pthread_cond_destroy() destroying the pthread condition object.
>
>I'm wondering if y'all have ever seen this kind of hang? I can kill the
>process and it exits, so I don't think we're stuck down in the
>kernel IWCM or anything.
>
>Any thoughts?
I haven't seen any hangs like this, but I will perform a code inspection to see if any issues can be found.

- Sean
