Actually, I think Damien's analysis is correct. On an 8-node cluster,

mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

does work, while 

mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

doesn't. As soon as I drop the hier grpcomm module (i.e., use bad instead),
everything works as expected.

I just committed a patch (r24534) to the TCP BTL to output more information, and
here is what I get when I add --mca btl_base_verbose 100 to the mpirun command
line:

[node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
[node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026

The "-npernode 2" option places 2 processes per node, so vpids 0 and 1 will be
on node01 and vpids 2 and 3 will be on node02. Looking at the TCP BTL connection
attempts, one can clearly see that process 01565 on node02 thinks that both
vpid 0 and vpid 1 can be reached at address 192.168.3.1 on port 1024, which is
obviously wrong. The processes on node01 make the same mistake for vpids 2 and 3
(192.168.3.2, port 1026).
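
To make the check above mechanical, here is a minimal sketch that scans the verbose output and reports every (address, port) pair that is advertised for more than one peer. The regular expression is an assumption based only on the five log lines shown above, and the name shared_endpoints is made up for this example; it is not part of Open MPI.

```python
import re
from collections import defaultdict

# Log lines copied from the btl_base_verbose output above,
# with each entry joined back onto a single line.
LOG = """\
[node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
[node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026
"""

# Assumed line format, inferred from the sample above only.
PATTERN = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] btl: tcp: attempting to connect\(\) to "
    r"\[\[(?P<jobfam>\d+),(?P<job>\d+)\],(?P<vpid>\d+)\] "
    r"address (?P<addr>\S+) on port (?P<port>\d+)"
)


def shared_endpoints(log_text):
    """Return {(address, port): [vpids]} for endpoints used for 2+ distinct peers."""
    peers = defaultdict(set)
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            endpoint = (match["addr"], int(match["port"]))
            peers[endpoint].add(int(match["vpid"]))
    return {ep: sorted(vpids) for ep, vpids in peers.items() if len(vpids) > 1}


if __name__ == "__main__":
    for (addr, port), vpids in sorted(shared_endpoints(LOG).items()):
        print(f"{addr}:{port} is advertised for vpids {vpids}")
```

On the output above this flags both 192.168.3.1:1024 (vpids 0 and 1) and 192.168.3.2:1026 (vpids 2 and 3).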

Since removing the hier grpcomm module solves the problem, I would expect that
the issue is not in the TCP BTL.
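
As an illustration of the failure mode Damien describes in the quoted message below, here is a toy model, not Open MPI code. It assumes that contact information is published per process but, in the faulty case, looked up per node, so that the first rank on the node always wins. Every name in it (PUBLISHED, lookup_per_process, lookup_per_node) is hypothetical, and the ports for vpids 1 and 3 are invented; only the entries for vpids 0 and 2 come from the log above.

```python
# Hypothetical per-process contact info: vpid -> (node, address, port).
# vpid 0 and 2 match the log output; vpid 1 and 3 are invented for illustration.
PUBLISHED = {
    0: ("node01", "192.168.3.1", 1024),
    1: ("node01", "192.168.3.1", 1025),
    2: ("node02", "192.168.3.2", 1026),
    3: ("node02", "192.168.3.2", 1027),
}


def lookup_per_process(vpid):
    """Correct lookup: each peer resolves to its own address and port."""
    _, addr, port = PUBLISHED[vpid]
    return addr, port


def lookup_per_node(vpid):
    """Faulty lookup: return the data of the lowest vpid on the peer's node."""
    node = PUBLISHED[vpid][0]
    first = min(v for v, (n, _, _) in PUBLISHED.items() if n == node)
    _, addr, port = PUBLISHED[first]
    return addr, port


if __name__ == "__main__":
    for vpid in sorted(PUBLISHED):
        print(vpid, lookup_per_process(vpid), lookup_per_node(vpid))
```

With the faulty lookup, a connection meant for vpid 1 lands on vpid 0's socket, which would be consistent with both the duplicate endpoints in the log and the "received unexpected process identifier" errors quoted below.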

  george.


On Mar 16, 2011, at 15:16 , Ralph Castain wrote:

> I suspect something else is wrong - the grpcomm system never has any 
> visibility as to what data goes into the modex, or how that data is used. In 
> other words, if the tcp btl isn't providing adequate info, then it would fail 
> regardless of which grpcomm module was in use. So your statement about the 
> hier module not distinguishing between peers on the same node doesn't make 
> sense - the hier module has no idea that a tcp btl even exists, let alone 
> have anything to do with the modex data.
> 
> You might take a look at how the tcp btl is picking its sockets. The srun 
> direct launch method may be setting envars that confuse it, perhaps causing 
> the procs to all pick the same socket.
> 
> 
> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
> 
>> Hi all
>> 
>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The 
>> "grpcomm:hier" module is important because the "srun" launch protocol can't 
>> use any other "grpcomm" module.
>> You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you 
>> create a ring (e.g., IMB Sendrecv):
>> 
>> $>salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>> salloc: Granted job allocation 2979
>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],0]
>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],2]
>> ^C
>> $>
>> 
>> This error message shows that "btl:tcp" has created a connection to a peer, 
>> but not the right one (the peer identity is checked with the "ack").
>> To create a connection between two peers with "btl:tcp":
>> - Each peer broadcasts its IP parameters with ompi_modex_send().
>> - The IP parameters of the selected peer are received with ompi_modex_recv().
>> 
>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and 
>> "orte_grpcomm.get_proc_attr()" to exchange data. The problem is that 
>> "grpcomm:hier" doesn't distinguish between two peers on the same node. 
>> From my tests, the IP parameters of the first rank on the selected node are 
>> always returned.
>> 
>> 
>> Is "grpcomm:hier" restricted to "btl:sm" and "btl:openib"?
>> 
>> 
>> --------
>> 
>> One easy way to fix this problem is to add rank information to the 
>> "name" variable in
>> -    ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>> -    ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>> but I dislike it.
>> 
>> Does anyone have a better solution?
>> 
>> 
>> Thank you,
>> Damien
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 

"To preserve the freedom of the human mind then and freedom of the press, every 
spirit should be ready to devote itself to martyrdom; for as long as we may 
think as we will, and speak as we think, the condition of man will proceed in 
improvement."
  -- Thomas Jefferson, 1799

