Thanks for the answer. So if I understand correctly, the connection order is
decided dynamically, depending on when each peer has messages to send and how
the upper level load-balances them across the rails. There shouldn't be
anything preventing (1) and (2) from happening at the same time, then. And I
wonder why I always see 1,2,3,4 with MX (using IMB) but not with Open-MX...
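To check my understanding of the interleaving, here is a small self-contained
simulation of what I think happens. It uses plain pthreads; connect_rail() and
poll_rail() are made-up stand-ins for the real handshake, not the MX/Open-MX
API. The only real call referenced is mx_connect(): the point is that Open-MX's
blocking connect needs the peer's library to poll that rail, whereas MX answers
the handshake without software-level help from the peer.

/* Hypothetical sketch, not the real MX/Open-MX code: two threads stand
 * in for the two processes, each "rail" is a mailbox, and connect_rail()
 * blocks until the peer has polled that rail -- mimicking Open-MX, where
 * the connect handshake needs peer-side progress (unlike mx_connect() in
 * MX, which completes without a software answer from the peer). */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int pending[2];   /* pending[p]: process p posted a connect request */
    int acked[2];     /* acked[p]: the peer answered p's request */
} rail_t;

static rail_t rails[2] = {
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
};

/* blocking connect: returns only once the peer has polled this rail */
static void connect_rail(int me, int rail)
{
    rail_t *r = &rails[rail];
    pthread_mutex_lock(&r->mtx);
    r->pending[me] = 1;
    pthread_cond_broadcast(&r->cv);
    while (!r->acked[me])
        pthread_cond_wait(&r->cv, &r->mtx);
    pthread_mutex_unlock(&r->mtx);
    printf("p%d: connect on rail %d completed\n", me, rail);
}

/* progress: wait for the peer's request on this rail and answer it */
static void poll_rail(int me, int rail)
{
    rail_t *r = &rails[rail];
    int peer = 1 - me;
    pthread_mutex_lock(&r->mtx);
    while (!r->pending[peer])
        pthread_cond_wait(&r->cv, &r->mtx);
    r->acked[peer] = 1;
    pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->mtx);
}

static void *p0(void *arg)
{
    (void)arg;
    connect_rail(0, 0);  /* (1) */
    connect_rail(0, 1);  /* (3) started before p1 finished (2):
                            blocks until somebody polls rail 1 */
    return NULL;
}

static void *p1(void *arg)
{
    (void)arg;
    poll_rail(1, 0);     /* answers (1) */
    connect_rail(1, 0);  /* (2): blocks, since p0 never polls rail 0 */
    poll_rail(1, 1);     /* never reached, so (3) never completes */
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, p0, NULL);
    pthread_create(&t1, NULL, p1, NULL);
    sleep(1);            /* watchdog: (2) and (3) are deadlocked */
    fprintf(stderr, "still blocked: (2) waits on rail 0, (3) on rail 1\n");
    return 0;
}

This prints the rail-0 completion and then sits in (2)/(3) until the watchdog
fires, which is the same shape as the hang I am debugging.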
Brice

George Bosilca wrote:
> Brice,
>
> The connection mechanism in the MX BTL suffers from a big problem on
> multi-rail (if all NICs are identical). If the rails are connected
> using the same mapper, they will have identical IDs. Unfortunately,
> these IDs are supposed to be unique in order to guarantee the
> connection ordering (0 to 0, 1 to 1, and so on, based on the mapper's
> MAC). However, the outcome I saw in the past in this case is not a
> deadlock but a poor distribution of the data over the two NICs: one
> will be overloaded while the other will not be used at all.
>
> There is no answer from a peer when we connect the MX BTLs. If the
> steps are the ones you described in your email, then I guess both
> peers try to connect to the other simultaneously. Now, when you have
> multiple rails, we treat them at the upper level as independent
> devices, and we will try to load-balance the messages over all of
> them. Step (3) seems to indicate that another (MPI) message has been
> sent, and because of the load-balancing scheme we try to connect the
> second device (rail in this context). In MX this works because we use
> the blocking function (mx_connect).
>
> george.
>
> On Jun 17, 2009, at 08:23, Brice Goglin wrote:
>
>> Hello,
>>
>> I am debugging some sort of deadlock when doing multirail over Open-MX.
>> What I am seeing with 2 processes and 2 boards per node with *MX* is:
>> 1) process 0 rail 0 connects to process 1 rail 0
>> 2) p1r0 connects back to p0r0
>> 3) p0 rail 1 connects to p1 rail 1
>> 4) p1r1 connects back to p0r1
>> For some reason, with *Open-MX*, process 0 seems to start (3) before
>> process 1 has finished (2). This probably causes a deadlock because p1
>> is polling on rail 0 for (2), while (3) needs somebody to poll on
>> rail 1 for the connect handshake.
>>
>> So, the question is: is there anything in OMPI (1.3) guaranteeing that
>> the above 4 steps will occur in some specified order? If so, Open-MX
>> is probably doing something wrong and breaking the order. If not,
>> adding a progression thread to Open-MX might be the only solution...
>>
>> thanks,
>> Brice
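PS: for the archives, a loose illustration of the rail-ordering point George
makes above. This is not the actual MX BTL code; struct board and order_rails()
are made up. The idea is that each peer orders its boards by the mapper's MAC
so that "rail i" refers to the same fabric on both sides; when both rails were
mapped by the same mapper the IDs tie, the order becomes arbitrary, and the
pairing can degenerate into the skewed traffic George describes.

/* Hypothetical sketch of why the MX BTL wants unique mapper IDs;
 * not the actual Open MPI code. */
#include <stdint.h>
#include <stdlib.h>

struct board {
    uint64_t mapper_mac;   /* MAC of the mapper that mapped this fabric */
    int      local_index;  /* index of the NIC on this host */
};

static int by_mapper(const void *a, const void *b)
{
    const struct board *x = a, *y = b;
    if (x->mapper_mac < y->mapper_mac) return -1;
    if (x->mapper_mac > y->mapper_mac) return  1;
    return 0;  /* identical IDs: the rail order is arbitrary, so the
                  two peers may pair the rails inconsistently */
}

/* sort boards[] so boards[0] is "rail 0", boards[1] is "rail 1", ... */
void order_rails(struct board *boards, size_t n)
{
    qsort(boards, n, sizeof(*boards), by_mapper);
}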