Hi Folks,

I'm again seeing a hang (yes, I'm going to start using timeout) in a two-process run on the IU Jenkins server for master. This is the --disable-dlopen Jenkins project on that server.
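A timeout wrapper along the following lines could keep a hung run from blocking the Jenkins job. This is only a sketch: `run_with_timeout` and its `grace_s` parameter are hypothetical names, not part of the Jenkins setup or of Open MPI, and it assumes the test is launched as an ordinary child process. It sends a terminate signal first so that a launcher has a chance to clean up its own children before being killed.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, grace_s=5.0):
    """Run cmd (a list of arguments) with a wall-clock limit.

    Returns (timed_out, returncode); returncode is None if the run timed out.
    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the process finishes within the limit.
        return False, proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Ask politely first so a launcher can tear down its child processes.
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: force it down and reap it.
            proc.kill()
            proc.wait()
        return True, None
```

For example, `run_with_timeout(["mpirun", "-np", "2", "./a.out"], 600)` would report `(True, None)` if the run is still going after ten minutes; the command line here is a placeholder. Attaching a debugger to the hung ranks before terminating them would still be needed to capture a backtrace like the one in this report.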
I attached to the hanging processes and get this for a backtrace:

#0  0x00007fdd4ca7ae94 in recv () from /lib64/libpthread.so.0
#1  0x00007fdd4bab622a in opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, data=0x7fff9342fb78 "&", size=4) at src/usock/usock.c:157
#2  0x00007fdd4babad69 in recv_connect_ack (sd=13) at src/client/pmix_client.c:777
#3  0x00007fdd4babbc59 in usock_connect (addr=0x7fff9342fe80) at src/client/pmix_client.c:1026
#4  0x00007fdd4bab88ae in connect_to_server (address=0x7fff9342fe80, cbdata=0x7fff9342fc30) at src/client/pmix_client.c:177
#5  0x00007fdd4bab90f7 in OPAL_PMIX_PMIX1XX_PMIx_Init (proc=0x7fdd4c2e9820 <myproc>) at src/client/pmix_client.c:329
#6  0x00007fdd4bff1892 in pmix1_client_init () at pmix1_client.c:58
#7  0x00007fdd4c37ce1d in pmi_component_query (module=0x7fff9342ffd0, priority=0x7fff9342ffcc) at ess_pmi_component.c:89
#8  0x00007fdd4bf54c38 in mca_base_select (type_name=0x7fdd4c45e5b9 "ess", output_id=-1, components_available=0x7fdd4c6b21d0 <orte_ess_base_framework+80>, best_module=0x7fff93430000, best_component=0x7fff93430008) at mca_base_components_select.c:73
#9  0x00007fdd4c373f0d in orte_ess_base_select () at base/ess_base_select.c:39
#10 0x00007fdd4c312fed in orte_init (pargc=0x0, pargv=0x0, flags=32) at runtime/orte_init.c:221
#11 0x00007fdd4d788e26 in ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fff934300fc) at runtime/ompi_mpi_init.c:468
#12 0x00007fdd4d7be27a in PMPI_Init (argc=0x7fff93430138, argv=0x7fff93430130) at pinit.c:84
#13 0x00007fdd4dce515e in ompi_init_f (ierr=0x7fff9343043c) at pinit_f.c:82
#14 0x0000000000400dff in MAIN__ ()
#15 0x0000000000400f38 in main ()

This seems to only happen periodically. Any suggestions on how to further analyze?

Howard