Short answer: we need the "extra" decrement at the end of MPI_INIT.
Long answer: OK, so I was somewhat wrong :). The count of users is
initialized to 0. If it's greater than zero, the event library is polled
every time opal_progress() is called, which kills latency (I'm surprised
this didn't show up in testing). It's really quite pointless for a runtime
or portability library not to poll the event library on every call
(particularly since the primary communication mechanisms in the runtime use
the event library), so opal_init() increments the counter to 1. By the time
anything interesting in MPI_INIT happens, the counter is set to 1, and
every call to opal_progress() results in a call to the event library.

The decrement in MPI_INIT was there to "undo" the initialization increment,
so that things would run fast from the end of MPI_INIT to the start of
MPI_FINALIZE unless some other piece of OMPI knew it needed fast run-time
interactions (such as the DPM or the TCP-based BTLs). Of course, during
MPI_FINALIZE we need to "undo" the go-fast options we changed at the end of
MPI_INIT, which is why there's an increment early in finalize.

Brian

On Nov 22, 2010, at 12:27 PM, Jeff Squyres wrote:

> On Nov 22, 2010, at 11:35 AM, Barrett, Brian W wrote:
>
>> Um, the counter starts initialized at one.
>
> Does that mean that we should or should not leave that extra
> _decrement() in there?
>
>> Brian
>>
>> On Nov 22, 2010, at 9:32 AM, Jeff Squyres wrote:
>>
>>> A user noticed a specific change that we made between 1.4.2 and 1.4.3:
>>>
>>> https://svn.open-mpi.org/trac/ompi/changeset/23448
>>>
>>> which is from CMR https://svn.open-mpi.org/trac/ompi/ticket/2489, and
>>> originally from trunk https://svn.open-mpi.org/trac/ompi/changeset/23434.
>>> I removed the opal_progress_event_users_decrement() from ompi_mpi_init()
>>> because the ORTE DPM does its own _increment() and _decrement().
>>>
>>> However, it seems that there was an unintended consequence of this --
>>> look at the annotated Ganglia graph that the user sent (see attached
>>> openmpi143.jpeg). In 1.4.2, all of the idle time was "user" CPU usage.
>>> In 1.4.3, it's split between user and system CPU usage. The application
>>> that he used to test is basically an init/finalize test (with some
>>> additional MPI middleware). See:
>>>
>>> http://www.open-mpi.org/community/lists/users/2010/11/14773.php
>>>
>>> Can anyone think of why this occurs, and/or if it's a Bad Thing?
>>>
>>> If removing this decrement enabled a bunch more system CPU time, that
>>> would seem to imply that we're calling libevent more frequently than we
>>> used to (vs. polling the opal event callbacks), and therefore that there
>>> might now be an unmatched increment somewhere.
>>>
>>> Right...?

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
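
A minimal sketch of the counter-gated polling described above -- all names
here (event_users, progress(), poll_event_library()) are illustrative
stand-ins, not the actual OPAL symbols:

/*
 * Sketch of the user-counter mechanism described in the thread.
 * Names are hypothetical; this is not the real OPAL implementation.
 */
#include <stdio.h>

/* Number of components that need the event library polled on every
 * progress call.  Starts at 0; opal_init() bumps it to 1. */
static int event_users = 0;

static void poll_event_library(void)
{
    /* Stand-in for the (comparatively expensive) libevent poll. */
    printf("  polled event library\n");
}

/* Sketch of opal_progress(): only poll the event library when some
 * component has registered a need for it. */
static void progress(void)
{
    if (event_users > 0) {
        poll_event_library();
    }
    /* ... fast-path progress of other devices would go here ... */
}

int main(void)
{
    event_users++;   /* opal_init(): the runtime needs libevent */
    printf("during init (counter=%d):\n", event_users);
    progress();      /* polls: counter is 1 */

    event_users--;   /* end of MPI_INIT: the "extra" decrement */
    printf("steady state (counter=%d):\n", event_users);
    progress();      /* go-fast path: no libevent poll */

    event_users++;   /* early in MPI_FINALIZE: undo the go-fast change */
    printf("during finalize (counter=%d):\n", event_users);
    progress();      /* polls again for shutdown traffic */

    return 0;
}

With the decrement removed (as in 1.4.3), the counter would stay at 1 after
MPI_INIT, so the steady-state progress() call would hit the event library
on every iteration -- consistent with the extra system CPU time in the
user's Ganglia graph.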