Hmmm...puzzling. It is working fine for me on TM machines and on my Mac.
However, Galen reports it borked on alps as well.

I'll have to dig a little to check this out and see if there is something
missing on those PLMs. Will get back shortly.

Sorry for the problem.


On 3/27/08 10:28 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

> Unfortunately now with r17988 I cannot run any mpi programs, they seem
> to hang in the modex.
> 
> Tim
> 
> Ralph H Castain wrote:
>> Thanks Tim - I found the problem and will commit a fix shortly.
>> 
>> Appreciate your testing and reporting!
>> 
>> 
>> On 3/27/08 8:24 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>> 
>>> This commit breaks things for me. Running on 3 nodes of odin:
>>> 
>>> mpirun -mca btl tcp,sm,self examples/ring_c
>>> 
>>> causes a hang. All of the processes are stuck in
>>> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
>>> and the ring program does not hang all the time, but fairly often.
>>> 
>>> Tim
>>> 
>>> r...@osl.iu.edu wrote:
>>>> Author: rhc
>>>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>>>> New Revision: 17941
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>>>> 
>>>> Log:
>>>> Fix the allgather and allgather_list functions to avoid deadlocks at
>>>> large node/proc counts. We violated the RML rules here: we received
>>>> the allgather buffer and then did an xcast, which causes a send to go
>>>> out that is subsequently received by the sender. This fix breaks that
>>>> pattern by forcing the recv to complete outside of the function
>>>> itself; thus, the allgather and allgather_list always complete their
>>>> recvs before returning or sending.
>>>> 
>>>> Reorganize the grpcomm code a little to provide support for
>>>> soon-to-come new grpcomm components. The revised organization puts
>>>> what will be common code elements in the base to avoid duplication,
>>>> while allowing components that don't need those functions to ignore
>>>> them.
>>>> 
>>>> Added:
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
>>>> Text files modified:
>>>>    trunk/orte/mca/grpcomm/base/Makefile.am                |     5
>>>>    trunk/orte/mca/grpcomm/base/base.h                     |    23 +
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_close.c       |     4
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_open.c        |     1
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_select.c      |   121 ++---
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic.h           |    16
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |    30 -
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c    |   845 ++-------------------------------------
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h             |     8
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   |     8
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c      |    21
>>>>    trunk/orte/mca/grpcomm/grpcomm.h                       |    45 +
>>>>    trunk/orte/mca/rml/rml_types.h                         |    31
>>>>    trunk/orte/orted/orted_comm.c                          |    27 +
>>>>    14 files changed, 226 insertions(+), 959 deletions(-)
>>>> 
>>>> 
>>>> Diff not shown due to size (92619 bytes).
>>>> To see the diff, run the following command:
>>>> 
>>>> svn diff -r 17940:17941 --no-diff-deleted
>>>> 
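For readers following the thread, the RML deadlock pattern the commit log
above describes can be sketched as below. This is a minimal, self-contained
illustration only, not the actual ORTE code: the type and helper names
(buffer_t, rml_recv_blocking, rml_send_to_all, TAG_ALLGATHER) are
hypothetical stand-ins for the real RML/grpcomm calls.

    #include <stdio.h>

    typedef struct { int len; } buffer_t;   /* stand-in for an RML buffer */
    #define TAG_ALLGATHER 42

    /* Stubs standing in for blocking RML operations. */
    static void rml_recv_blocking(int tag, buffer_t *buf) {
        printf("recv on tag %d (buf len %d)\n", tag, buf->len);
    }
    static void rml_send_to_all(int tag, const buffer_t *buf) { /* an xcast */
        printf("xcast on tag %d (buf len %d)\n", tag, buf->len);
    }

    /* Broken pattern (pre-r17941): the xcast is issued from inside the
     * allgather, so a send goes out while the sender can still receive
     * that same message back. At large node/proc counts this violates
     * the RML ordering rules and deadlocks. */
    static void allgather_broken(buffer_t *result) {
        rml_recv_blocking(TAG_ALLGATHER, result);
        rml_send_to_all(TAG_ALLGATHER, result); /* send inside the recv
                                                   path: unsafe */
    }

    /* Fixed pattern (r17941): the allgather always completes its recv
     * and returns before any send goes out; the xcast moves up to the
     * caller. */
    static void allgather_fixed(buffer_t *result) {
        rml_recv_blocking(TAG_ALLGATHER, result); /* recv completes here */
    }

    int main(void) {
        buffer_t result = { 0 };
        (void)allgather_broken;                 /* shown for contrast only */
        allgather_fixed(&result);
        rml_send_to_all(TAG_ALLGATHER, &result); /* send happens only after
                                                    the recv has returned */
        return 0;
    }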
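Similarly, the base/component reorganization the log mentions follows the
usual MCA shape: shared implementations live in the base, and each
component's function table either points at them or supplies its own. Again
a hypothetical sketch under that assumption; the struct and function names
below are illustrative, not the real grpcomm interface.

    /* A module is just a table of function pointers. */
    typedef struct {
        int (*allgather)(void);
        int (*barrier)(void);
        int (*modex)(void);
    } grpcomm_module_t;

    /* Common implementations provided once, in the base. */
    static int base_allgather(void) { return 0; }
    static int base_barrier(void)   { return 0; }
    static int base_modex(void)     { return 0; }

    /* The "basic" component reuses the common base code wholesale... */
    static grpcomm_module_t basic_module = {
        base_allgather, base_barrier, base_modex
    };

    /* ...while a component that doesn't need those functions (e.g. cnos)
     * supplies its own trivial versions and ignores the base entirely. */
    static int cnos_noop(void) { return 0; }
    static grpcomm_module_t cnos_module = {
        cnos_noop, cnos_noop, cnos_noop
    };

    int main(void) {
        /* Component selection picks exactly one table at startup. */
        grpcomm_module_t *selected = &basic_module;
        (void)cnos_module;
        return selected->barrier();
    }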

