John,
OMPI_LAZY_WAIT_FOR_COMPLETION(active)
is a simple loop that periodically checks the (volatile) "active"
condition, which is expected to be updated by another thread.
So if you set your breakpoint too early, and **all** threads are stopped
when that breakpoint is hit, you might experience what looks like a
race condition.
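For reference, such a lazy wait boils down to something like the sketch
below. This is an illustration of the pattern, not the literal macro from
the Open MPI sources, and the sleep interval is just for illustration:

    #include <stdbool.h>
    #include <unistd.h>

    /* Illustration of a lazy-wait loop: poll a volatile flag that some
     * other thread is expected to clear, sleeping between checks so we
     * don't burn CPU. If every thread is stopped in the debugger before
     * the flag is cleared, this loop never exits and the process looks
     * hung. */
    static void lazy_wait_for_completion(volatile bool *active)
    {
        while (*active) {
            usleep(100);   /* polling interval chosen for illustration */
        }
    }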
I guess a similar scenario can occur if the breakpoint is set in
mpirun/orted too early and prevents the pmix (or oob/tcp) thread
from sending the message to all MPI tasks.
Ralph,
does the v4.0.x branch still need the oob/tcp progress thread running
inside the MPI app? Or are we missing some commits (since all
interactions with mpirun/orted are handled by PMIx, at least in the
master branch)?
Cheers,
Gilles
On 11/12/2019 9:27 AM, Ralph Castain via devel wrote:
Hi John
Sorry to say, but there is no way to really answer your question as
the OMPI community doesn't actively test MPIR support. I haven't seen
any reports of hangs during MPI_Init from any release series,
including 4.x. My guess is that it may have something to do with the
debugger interactions as opposed to being a true race condition.
Ralph
On Nov 8, 2019, at 11:27 AM, John DelSignore via devel
<devel@lists.open-mpi.org> wrote:
Hi,
An LLNL TotalView user on a Mac reported that their MPI job was
hanging inside MPI_Init() when started under the control of
TotalView. They were using Open MPI 4.0.1, and TotalView was using
the MPIR Interface (sorry, we don't support the PMIx debugging hooks
yet).
I was able to reproduce the hang on my own Linux system with my own
build of Open MPI 4.0.1, which I built with debug symbols. As far as
I can tell, there is some sort of race inside of Open MPI 4.0.1,
because if I placed breakpoints at certain points in the Open MPI
code, and thus changed the timing slightly, that was enough to avoid
the hang.
When the code hangs, it appears as if one or more MPI processes are
waiting inside ompi_mpi_init() at ompi_mpi_init.c#904 for a fence to
be released. In one of the runs, rank 0 was the only one that was
hanging there (though I have seen runs where two ranks were hung
there).
Here's a backtrace of the first thread in the rank 0 process in the
case where one rank was hung:
d1.<> f 10.1 w
>  0 __nanosleep_nocancel PC=0x7ffff74e2efd, FP=0x7fffffffd1e0 [/lib64/libc.so.6]
   1 usleep PC=0x7ffff7513b2f, FP=0x7fffffffd200 [/lib64/libc.so.6]
   2 ompi_mpi_init PC=0x7ffff7a64009, FP=0x7fffffffd350 [/home/jdelsign/src/tools-external/openmpi-4.0.1/ompi/runtime/ompi_mpi_init.c#904]
   3 PMPI_Init PC=0x7ffff7ab0be4, FP=0x7fffffffd390 [/home/jdelsign/src/tools-external/openmpi-4.0.1-lid/ompi/mpi/c/profile/pinit.c#67]
   4 main PC=0x00400c5e, FP=0x7fffffffd550 [/home/jdelsign/cpi.c#27]
   5 __libc_start_main PC=0x7ffff7446b13, FP=0x7fffffffd610 [/lib64/libc.so.6]
   6 _start PC=0x00400b04, FP=0x7fffffffd618 [/amd/home/jdelsign/cpi]
Here's the block of code where the thread is hung:
    /* if we executed the above fence in the background, then
     * we have to wait here for it to complete. However, there
     * is no reason to do two barriers! */
    if (background_fence) {
        OMPI_LAZY_WAIT_FOR_COMPLETION(active);
    } else if (!ompi_async_mpi_init) {
        /* wait for everyone to reach this point - this is a hard
         * barrier requirement at this time, though we hope to relax
         * it at a later point */
        if (NULL != opal_pmix.fence_nb) {
            active = true;
            OPAL_POST_OBJECT(&active);
            if (OMPI_SUCCESS != (ret = opal_pmix.fence_nb(NULL, false,
                                       fence_release, (void*)&active))) {
                error = "opal_pmix.fence_nb() failed";
                goto error;
            }
            OMPI_LAZY_WAIT_FOR_COMPLETION(active);  <<<<----- STUCK HERE WAITING FOR THE FENCE TO BE RELEASED
        } else {
            if (OMPI_SUCCESS != (ret = opal_pmix.fence(NULL, false))) {
                error = "opal_pmix.fence() failed";
                goto error;
            }
        }
    }
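For context, the fence_release callback passed to opal_pmix.fence_nb()
above is what should clear that flag and release the wait. A rough sketch
of the pattern follows; the exact signature and any memory-barrier macros
in the 4.0.1 sources may differ:

    /* Sketch of the completion callback: when the PMIx fence finishes,
     * it clears the volatile flag that OMPI_LAZY_WAIT_FOR_COMPLETION(active)
     * is polling on. If the callback never fires (e.g. the progress thread
     * never receives the fence-complete message), the polling thread spins
     * forever -- which matches the hang described below. */
    static void fence_release(int status, void *cbdata)
    {
        volatile bool *act = (volatile bool *)cbdata;
        *act = false;
    }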
And here is an aggregated backtrace of all of the processes and
threads in the job:
d1.<> f g w -g f+l
+/
+__clone : 5:12[0-3.2-3, p1.2-5]
|+start_thread
| +listen_thread@oob_tcp_listener.c#705 : 1:1[p1.5]
| |+__select_nocancel
| +listen_thread@ptl_base_listener.c#214 : 1:1[p1.3]
| |+__select_nocancel
| +progress_engine@opal_progress_threads.c#105 : 5:5[0-3.2, p1.4]
| |+opal_libevent2022_event_base_loop@event.c#1632
| | +poll_dispatch@poll.c#167
| | +__poll_nocancel
| +progress_engine@pmix_progress_threads.c#108 : 5:5[0-3.3, p1.2]
| +opal_libevent2022_event_base_loop@event.c#1632
| +epoll_dispatch@epoll.c#409
| +__epoll_wait_nocancel
+_start : 5:5[0-3.1, p1.1]
+__libc_start_main
+main@cpi.c#27 : 4:4[0-3.1]
|+PMPI_Init@pinit.c#67
| +ompi_mpi_init@ompi_mpi_init.c#890 : 3:3[1-3.1]  <<<<---- THE 3 OTHER MPI PROCS MADE IT PAST FENCE
| |+ompi_rte_wait_for_debugger@rte_orte_module.c#196
| | +opal_progress@opal_progress.c#251
| | +opal_progress_events@opal_progress.c#191
| | +opal_libevent2022_event_base_loop@event.c#1632
| | +poll_dispatch@poll.c#167
| | +__poll_nocancel
| +ompi_mpi_init@ompi_mpi_init.c#904 : 1:1[0.1]  <<<<---- THE THREAD THAT IS STUCK
| +usleep
| +__nanosleep_nocancel
+main@main.c#14 : 1:1[p1.1]
+orterun@orterun.c#200
+opal_libevent2022_event_base_loop@event.c#1632
+poll_dispatch@poll.c#167
+__poll_nocancel
d1.<>
I have tested Open MPI 4.0.2 dozens of times, and the hang does not
seem to happen. My concern is that if the problem is indeed a race,
then it's /possible/ (but perhaps not likely) that the same race
exists in Open MPI 4.0.2, but the timing could be slightly different
such that it doesn't hang using my simple test setup. In other words,
maybe I've just been "lucky" with my testing of Open MPI 4.0.2 and
have failed to provoke the hang yet.
My question is: Was this a known problem in Open MPI 4.0.1 that was
fixed in Open MPI 4.0.2?
Thanks, John D.