On 2/10/2012 11:50 AM, Jeff Squyres wrote:
This is an open question to OMPI developers...

It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen 
is activated.  This IP interface is only used to communicate with the local Xen 
instance(s); it is not used to communicate over the real network.

In a case that I saw, the interface is created, set to "up", and is given an IP address 
in the 192.168.1.x range.  This was done by default -- all the user had done was either say 
"yes, I want Xen enabled", or he didn't say he wanted it *disabled* (I'm not sure which).
I've done the latter and hit the same problem. I found instructions somewhere on the web that explained how to disable virbr0.

This causes a problem if you have Xen enabled on multiple machines in an OMPI job.  OMPI 
will see the 192.168.1.x address and see that it's "up", so it'll add it to the 
eligible subnets that can be used.  When OMPI sees that its peer processes also have 
192.168.1.x, it'll try to use that network for OOB/BTL traffic -- which will fail, 
because these are local-only interfaces.
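To make the failure mode concrete, here is a minimal sketch of subnet matching between two hosts. It is not OMPI's actual selection code; the function name shared_subnets, the interface tables, and the addresses are all hypothetical, chosen only to mirror the 192.168.1.x case described above.

```python
import ipaddress


def shared_subnets(local_ifaces, peer_ifaces, exclude=()):
    """Return the subnets that both hosts appear to share.

    local_ifaces / peer_ifaces map an interface name to an "address/prefix"
    string. Interfaces whose name is in `exclude` are ignored, which is
    roughly the effect an if_exclude list is meant to have.
    """
    def subnets(ifaces):
        return {
            ipaddress.ip_interface(addr).network
            for name, addr in ifaces.items()
            if name not in exclude
        }

    return subnets(local_ifaces) & subnets(peer_ifaces)


# Hypothetical hosts: each has a real NIC plus a local-only virbr0 bridge.
node_a = {"eth0": "10.0.0.1/24", "virbr0": "192.168.1.1/24"}
node_b = {"eth0": "10.0.0.2/24", "virbr0": "192.168.1.1/24"}

# Without an exclude list, the bridge subnet looks like a usable common network.
naive = shared_subnets(node_a, node_b)

# Excluding virbr0 by name leaves only the real network.
filtered = shared_subnets(node_a, node_b, exclude={"virbr0"})
```

In this sketch the unfiltered result contains 192.168.1.0/24 even though no traffic can flow between the two bridges, which is the mismatch that makes the connection attempt fail.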

Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
What happens to that default value if you then set btl_tcp_if_exclude to some value on the mpirun command line?

This brings me to something that has annoyed me for a while. It would be nice to have a conf file where you can list interface names to exclude, without that list being interpreted as a btl_tcp_if_exclude option. For example, a long time ago certain Sun machines had interfaces that went to the diagnostic processor and caused an issue similar to the virbr0 one. We started delivering a conf file that set btl_tcp_if_exclude, but that precluded anyone from setting btl_tcp_if_include. If we had a file that specified the set of interfaces to use or exclude, and the user's own settings then operated on the result of that set, that would be nice.
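One possible reading of that idea, as a rough sketch: a site-level exclusion list is applied first, and the user's include or exclude setting then operates on whatever is left. Nothing here is existing OMPI behavior; select_interfaces, its parameters, and the interface names are hypothetical.

```python
def select_interfaces(available, site_excluded, user_include=None, user_exclude=None):
    """Apply a site-level exclusion list, then the user's own filter.

    available     -- interface names found on the host, in discovery order
    site_excluded -- names listed in the (hypothetical) site conf file
    user_include  -- names the user asked for, or None
    user_exclude  -- names the user asked to skip, or None
    """
    # Keep include and exclude mutually exclusive at the user level.
    if user_include and user_exclude:
        raise ValueError("include and exclude cannot both be set")

    # Step 1: the site file removes known-bad interfaces unconditionally.
    base = [name for name in available if name not in site_excluded]

    # Step 2: the user's setting operates on the result of step 1.
    if user_include:
        return [name for name in base if name in user_include]
    if user_exclude:
        return [name for name in base if name not in user_exclude]
    return base


# Hypothetical host with a loopback, two real networks, and the Xen bridge.
available = ["lo", "eth0", "ib0", "virbr0"]
site_excluded = {"lo", "virbr0"}

default_choice = select_interfaces(available, site_excluded)
include_choice = select_interfaces(available, site_excluded, user_include={"ib0"})
exclude_choice = select_interfaces(available, site_excluded, user_exclude={"ib0"})
```

With this layering, the site file no longer occupies the exclude setting, so a user can still pass an include list and never has to repeat the site's exclusions.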

--td

Or is there another way to detect that an interface is local-only and should 
not be used for OOB/BTL communication?

See this post on the user's list:

     http://www.open-mpi.org/community/lists/users/2012/02/18432.php


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


