Gleb Natapov wrote:
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote:
Gleb,
in the scenario which you describe in the comment to the patch, what
should happen is, that the communicator with the cid which started
already the allreduce will basically 'hang' until the other processes
'allow' the lower cids to continue. It should basically be blocked in
the allreduce.
Why? Two threads are allowed to run allreduce simultaneously for different
communicators. Are they?
They are, but they might never agree on the cid. This is simply how the
algorithm was designed originally -- which does not mean, that it has to
remain this way, just to explain its behavior and the intent. See the
design doc for that in ompi-docs in the January 2004 repository.
Lets assume, that we have n procs with 2 threads, and both threads do a
comm_create at the same time, input cid 1 and cid 2. N-1 processes let
cid 1 start because that's the lower number. However, one process let
cid 2 start because the other thread was late. What would happen in the
algorithm is nobody responds to cid 2, so it would hang. As soon as the
other thread with cid 1 enters the comm_create, it would be allowed to
run, this operation would finish. The other threads would then allow cid
2 to enter, the 'hanging' process would be released.
However, here is something, where we might have problems with the sun
thread tests (and we discussed that with Terry already): the cid
allocation algorithm as implemented in Open MPI assumes ( -- this was/is
my/our understanding of the standard --) that a communicator creation is
a collective operation. This means, you can not have a comm_create and
another allreduce of the same communicator running in different threads,
because these allreduces will mix up and produce non-sense results. We
fixed the case, if all collective operations are comm_creates, but if
some of the threads are in a comm_create and some are in allreduce on
the same communicator, it won't work.
Correct, but this is not what happens with mt_coll test. mt_coll calls
commdup on the same communicator in different threads concurrently, but
we handle this case inside ompi_comm_nextcid().
Gleb Natapov wrote:
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
Gleb,
This patch is not correct. The code preventing the registration of the same
communicator twice is later in the code (same file in the function
ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid
I saw this code and the comment. The problem is not with the same
communicator but with different communicators.
is called, we know that each communicator only handle one "communicator
creation" function at the same time. Therefore, we want to give priority to
the smallest com_id, which is what happens in the code you removed.
The code I removed was doing it wrongly. I.e the algorithm sometimes is executed
for different communicators simultaneously by different threads. Think
about the case where the function is running for cid 1 and then another
thread runs it for cid 0. cid 0 will proceed although the function is
executed on another CPU. And this is not something theoretical, that
is happening with sun's thread test suit mpi_coll test case.
Without the condition in the ompi_comm_register_cid (each communicator only
get registered once) your comment make sense. However, with the condition
your patch allow a dead end situation, while 2 processes try to create
communicators in multiple threads, and they will never succeed, simply
because they will not order the creation based on the com_id.
If the algorithm is really prone to deadlock in case it is concurrently
executed for several different communicators (I haven't check this),
then we may want to fix original code to really prevent two threads to
enter the function, but then I don't see the reason for all those
complications with ompi_comm_register_cid()/ompi_comm_unregister_cid()
The algorithm described here:
http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
in section 5.3 works without it and we can do something similar.
george.
On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
Author: gleb
Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
New Revision: 16088
URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
Log:
The code tries to prevent itself from running for more then one
communicator
simultaneously, but is doing it incorrectly. If the function is running
already
for one communicator and it is called from another thread for other
communicator
with lower cid the check comm->c_contextid != ompi_comm_lowest_cid()
will fail and the function will be executed for two different
communicators by
two threads simultaneously. There is nothing in the algorithm that prevent
it
from been running simultaneously for different communicators as far as I
can see,
but ompi_comm_unregister_cid() assumes that it is always called for a
communicator
with the lowest cid and this is not always the case. This patch removes
bogus
lowest cid check and fix ompi_comm_register_cid() to properly remove cid
from
the list.
Text files modified:
trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)
Modified: trunk/ompi/communicator/comm_cid.c
==============================================================================
--- trunk/ompi/communicator/comm_cid.c (original)
+++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11
Sep 2007)
@@ -11,6 +11,7 @@
* All rights reserved.
* Copyright (c) 2006-2007 University of Houston. All rights reserved.
* Copyright (c) 2007 Cisco, Inc. All rights reserved.
+ * Copyright (c) 2007 Voltaire All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -170,15 +171,6 @@
* This is the real algorithm described in the doc
*/
- OPAL_THREAD_LOCK(&ompi_cid_lock);
- if (comm->c_contextid != ompi_comm_lowest_cid() ) {
- /* if not lowest cid, we do not continue, but sleep and
try again */
- OPAL_THREAD_UNLOCK(&ompi_cid_lock);
- continue;
- }
- OPAL_THREAD_UNLOCK(&ompi_cid_lock);
-
-
for (i=start; i < mca_pml.pml_max_contextid ; i++) {
flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators,
i, comm);
@@ -365,10 +357,18 @@
static int ompi_comm_unregister_cid (uint32_t cid)
{
- ompi_comm_reg_t *regcom=NULL;
- opal_list_item_t
*item=opal_list_remove_first(&ompi_registered_comms);
+ ompi_comm_reg_t *regcom;
+ opal_list_item_t *item;
- regcom = (ompi_comm_reg_t *) item;
+ for (item = opal_list_get_first(&ompi_registered_comms);
+ item != opal_list_get_end(&ompi_registered_comms);
+ item = opal_list_get_next(item)) {
+ regcom = (ompi_comm_reg_t *)item;
+ if(regcom->cid == cid) {
+ opal_list_remove_item(&ompi_registered_comms, item);
+ break;
+ }
+ }
OBJ_RELEASE(regcom);
return OMPI_SUCCESS;
}
_______________________________________________
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Gleb.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Gleb.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335