Sean Hefty wrote:
>> Well then the rdma-cm needs to know which devices support hw loopback.
>> Cuz on a T3-only system, no hwloop...
>
> The problem sounds like it's more than just whether 127.0.0.1 is usable. That
> check may fix openmpi, but it sounds more like the app needs to know whether the
> device can actually support loopback, regardless of what addresses are used. Is
> this correct?
>
> What would openmpi do if there were two addresses assigned to the T3 device?

It would use them and might even create two connections.

> Does openmpi simply bypass RDMA for all connections on the local machine?

OpenMPI can be run to use hw loopback if it's available. For T3 clusters, OMPI
is run in a mode that uses shared memory for intra-node communications.

> Basically, I'm not sure that this is *just* an rdma_cm issue. Although it
> definitely appears that some sort of change needs to be made to the rdma_cm.

I think the OpenMPI rdmacm code needs to skip 127.0.0.1 in this particular
case. Prior to ofed-1.5.1 the bind would fail, and thus OpenMPI would not
advertise 127.0.0.1 to its peer. I will work to get that change done.

But let's also add a device attribute so the rdmacm can know if a device
supports loopback. Clearly, if the rdma-cm allows binds to T3, loopback
connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback support?

Steve.
_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
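For illustration only, here is a rough sketch of the kind of check discussed above: skip 127.0.0.1 when the device cannot do hardware loopback. This is not the actual OpenMPI or rdma-cm code. The `dev_caps` struct, its `hw_loopback` field, and both helper names are hypothetical; the device attribute is only a proposal in this thread, so the flag stands in for whatever form it might eventually take.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical per-device capability. No such attribute exists yet;
 * it models the proposed "hw-loopback support" device attribute. */
struct dev_caps {
    bool hw_loopback;
};

/* Returns true only for the IPv4 address 127.0.0.1. */
static bool is_ipv4_localhost(const struct sockaddr *sa)
{
    if (sa->sa_family != AF_INET)
        return false;
    const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
    return sin->sin_addr.s_addr == htonl(INADDR_LOOPBACK);
}

/* Decide whether an address should be advertised to peers:
 * 127.0.0.1 is skipped unless the device reports hw loopback. */
static bool should_advertise(const struct sockaddr *sa,
                             const struct dev_caps *caps)
{
    if (is_ipv4_localhost(sa) && !caps->hw_loopback)
        return false;
    return true;
}
```

With a check like this, a T3-only node (no hw loopback) would never advertise 127.0.0.1, while a device that does support hw loopback could still use it.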