Actually I think that Damien's analysis is correct. On an 8-node cluster,

  mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

does work, while

  mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv

doesn't. As soon as I remove the grpcomm hier setting (i.e. use the bad module instead), everything works as expected.

I just committed a patch (r24534) to the TCP BTL to output more information, and here is what I get when I add --mca btl_base_verbose 100 to the mpirun command line:

[node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
[node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
[node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026

The "-npernode 2" option places 2 processes per node, so vpids 0 and 1 will be on node01 and vpids 2 and 3 will be on node02. Looking at the BTL TCP connection attempts, one can clearly see that process 01565 on node02 thinks that both vpid 0 and vpid 1 can be reached at address 192.168.3.1 on port 1024, which is obviously wrong.

Since removing grpcomm hier solves the problem, I would expect that the issue is not in the TCP BTL.

  george.

On Mar 16, 2011, at 15:16 , Ralph Castain wrote:

> I suspect something else is wrong - the grpcomm system never has any
> visibility as to what data goes into the modex, or how that data is used. In
> other words, if the tcp btl isn't providing adequate info, then it would fail
> regardless of which grpcomm module was in use. So your statement about the
> hier module not distinguishing between peers on the same node doesn't make
> sense - the hier module has no idea that a tcp btl even exists, let alone
> have anything to do with the modex data.
>
> You might take a look at how the tcp btl is picking its sockets.
> The srun direct launch method may be setting envars that confuse it, perhaps
> causing the procs to all pick the same socket.
>
>
> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
>
>> Hi all,
>>
>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The
>> "grpcomm:hier" module is important because the "srun" launch protocol can't
>> use any other "grpcomm" module.
>> You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you
>> create a ring (like IMB Sendrecv):
>>
>> $>salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp
>> ./IMB-MPI1 Sendrecv
>> salloc: Granted job allocation 2979
>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
>> received unexpected process identifier [[59536,1],0]
>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
>> received unexpected process identifier [[59536,1],2]
>> ^C
>> $>
>>
>> This error message shows that "btl:tcp" has created a connection to a peer,
>> but not the right one (the peer identity is checked with the "ack").
>> To create a connection between two peers with "btl:tcp":
>> - Each peer broadcasts its IP parameters with ompi_modex_send().
>> - The IP parameters of the selected peer are received with ompi_modex_recv().
>>
>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and
>> "orte_grpcomm.get_proc_attr()" to exchange data. The problem is that
>> "grpcomm:hier" doesn't distinguish between two peers on the same node.
>> From my tests, the IP parameters of the first rank on the selected node are
>> always returned.
>>
>>
>> Is "grpcomm:hier" restricted to "btl:sm" and "btl:openib"?
>>
>>
>> --------
>>
>> One easy solution to fix this problem is to add rank information to the
>> "name" variable in
>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>> but I dislike it.
>>
>> Does someone have a better solution?
>>
>>
>> Thank you,
>> Damien
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

"To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
  -- Thomas Jefferson, 1799
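
As a toy illustration of the symptom discussed in this thread, the sketch below models what happens if a peer's connection data is looked up per node rather than per process. It is written in Python for brevity and is not Open MPI code: every name in it (NODE_OF, PUBLISHED, resolve_per_node, resolve_per_process) is hypothetical, and it takes no position on where in the real stack the collapse occurs. The addresses and ports 1024 and 1026 are taken from the verbose log above; ports 1025 and 1027 are invented so that each process has a distinct endpoint.

```python
# Toy model of the symptom reported in this thread. None of these names are
# Open MPI APIs; the layout mirrors "-npernode 2 -np 4" on two nodes.

# vpid -> node, as placed by "-npernode 2".
NODE_OF = {0: "node01", 1: "node01", 2: "node02", 3: "node02"}

# vpid -> (address, port) that each process would publish. The addresses and
# ports 1024/1026 come from the verbose log; ports 1025 and 1027 are invented.
PUBLISHED = {
    0: ("192.168.3.1", 1024),
    1: ("192.168.3.1", 1025),
    2: ("192.168.3.2", 1026),
    3: ("192.168.3.2", 1027),
}


def resolve_per_node(published, node_of):
    """Look peers up by node: the lowest vpid on a node shadows the others."""
    first_on_node = {}
    for vpid in sorted(published):
        # setdefault keeps the entry of the first (lowest) vpid seen per node.
        first_on_node.setdefault(node_of[vpid], published[vpid])
    return {vpid: first_on_node[node_of[vpid]] for vpid in published}


def resolve_per_process(published):
    """Look peers up by process: every vpid keeps its own endpoint."""
    return dict(published)


by_node = resolve_per_node(PUBLISHED, NODE_OF)
by_proc = resolve_per_process(PUBLISHED)

# Print, for each vpid, the endpoint seen with each lookup scheme.
for vpid in sorted(PUBLISHED):
    print(vpid, by_node[vpid], by_proc[vpid])
```

With the per-node lookup, vpids 0 and 1 both resolve to 192.168.3.1 port 1024 and vpids 2 and 3 both resolve to 192.168.3.2 port 1026, which matches the pattern in the connection attempts above. With the per-process lookup, all four endpoints stay distinct.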