This is an open question to OMPI developers...
It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen
is activated. This IP interface is only used to communicate with the local Xen
instance(s); it is not used to communicate over the real network.
In one case that I saw, the interface was created, set to "up", and given an IP
address in the 192.168.1.x range. This was done by default -- all the user had
done was either say "yes, I want Xen enabled", or not say that he wanted it
*disabled* (I'm not sure which).
This causes a problem if you have Xen enabled on multiple machines in an OMPI
job. OMPI will see the 192.168.1.x address and see that it's "up", so it'll
add it to the eligible subnets that can be used. When OMPI sees that its peer
processes also have 192.168.1.x, it'll try to use that network for OOB/BTL
traffic -- which will fail, because these are local-only interfaces.
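To make the failure mode concrete, here is a rough sketch of the kind of
subnet comparison described above. It is not OMPI's actual code; the node
tables, the eth0 addresses, the /24 masks, and the exact virbr0 address are all
made up for illustration.

```python
import ipaddress

# Hypothetical interface tables for two nodes. Only the 192.168.1.x range
# comes from the report; everything else here is invented.
node_a = {"eth0": "10.0.0.11/24", "virbr0": "192.168.1.1/24"}
node_b = {"eth0": "10.0.0.12/24", "virbr0": "192.168.1.1/24"}


def subnets(ifaces):
    """Map each interface name to the IP network it sits on."""
    return {name: ipaddress.ip_interface(addr).network
            for name, addr in ifaces.items()}


def shared_subnets(a, b):
    """Return (iface_a, iface_b, network) for every subnet both nodes report."""
    nets_a, nets_b = subnets(a), subnets(b)
    return sorted(
        (name_a, name_b, str(net_a))
        for name_a, net_a in nets_a.items()
        for name_b, net_b in nets_b.items()
        if net_a == net_b
    )


matches = shared_subnets(node_a, node_b)
for name_a, name_b, net in matches:
    print(f"{name_a} <-> {name_b} look connected via {net}")
```

Both pairs are reported as matching, but only the eth0 pair is really
reachable. The two virbr0 bridges are separate private networks that happen to
use the same address range, so a comparison based on addresses alone cannot
tell them apart.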
Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
Or is there another way to detect that an interface is local-only and should
not be used for OOB/BTL communication?
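For the first option, the effect of a longer exclude list can be sketched as
a simple name filter. This is only an illustration of the idea, not how OMPI
parses the parameter; the function name and the interface list are
hypothetical.

```python
def eligible_interfaces(interfaces, if_exclude):
    """Drop every interface whose name appears in a comma-separated exclude list."""
    excluded = {name.strip() for name in if_exclude.split(",") if name.strip()}
    return [name for name in interfaces if name not in excluded]


# Hypothetical set of "up" interfaces on a node with Xen enabled.
interfaces = ["lo", "eth0", "virbr0"]

print(eligible_interfaces(interfaces, "lo"))         # virbr0 is still eligible
print(eligible_interfaces(interfaces, "lo,virbr0"))  # only eth0 remains
```

Until the default changes, a user should be able to get the same effect by
hand with something like "mpirun --mca btl_tcp_if_exclude lo,virbr0 --mca
oob_tcp_if_exclude lo,virbr0 ...". Note that setting the parameter replaces
the default value rather than adding to it, so the loopback interface has to
be listed explicitly.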
See this post on the users list:
http://www.open-mpi.org/community/lists/users/2012/02/18432.php
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/