I think that in this case one *could* add logic that would disqualify the
subnet because every compute node in the job has the SAME address.  In
fact, any subnet on which two or more compute nodes have the same address
must be suspect.  If this logic were introduced, the 127.0.0.1 loopback
address wouldn't need to be a special case.
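
To make the idea concrete, here is a minimal Python sketch of such a check.
It is only an illustration, not Open MPI code: the function name
suspect_subnets and the input format (a dict mapping each hostname to its
"address/prefix" strings) are assumptions of mine, and real interface
discovery would look quite different.

```python
import ipaddress
from collections import defaultdict


def suspect_subnets(node_addrs):
    """Return the set of subnets on which two or more nodes share an address.

    node_addrs is a hypothetical input format: a dict mapping each hostname
    to a list of "address/prefixlen" strings, one per local interface.
    """
    # (subnet, address) -> hostnames that report exactly this address
    owners = defaultdict(set)
    for host, cidrs in node_addrs.items():
        for cidr in cidrs:
            iface = ipaddress.ip_interface(cidr)
            owners[(iface.network, iface.ip)].add(host)

    # A subnet is suspect as soon as any single address on it is claimed
    # by more than one node.
    return {net for (net, _ip), hosts in owners.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Addresses modeled on the report quoted below: both nodes carry the
    # same br0 address, plus loopback and a distinct address on a shared net.
    nodes = {
        "node0": ["127.0.0.1/8", "10.0.0.1/24", "192.168.255.1/24"],
        "node1": ["127.0.0.1/8", "10.0.0.2/24", "192.168.255.1/24"],
    }
    for net in sorted(suspect_subnets(nodes), key=str):
        print("suspect:", net)
```

With this rule the loopback network is flagged for the same reason as the
duplicated bridge address, with no special case for 127.0.0.1.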

This is just an observation, not a feature request (at least not on my
part).

-Paul


On Wed, Aug 13, 2014 at 7:55 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I think this is expected behavior.
>
> If you have networks that you need Open MPI to ignore (e.g., a private
> network that *looks* reachable between multiple servers -- because the
> interfaces are on the same subnet -- but actually *isn't*), then the
> include/exclude mechanism is the right way to exclude them.
>
> That being said, I'm not sure why the behavior is different between trunk
> and v1.8.
>
>
> On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>
> > Folks,
> >
> > I noticed that mpirun (trunk) hangs when running any MPI program on two
> > nodes when each node has a private network with the same IP address
> > (in my case, each node has a private network to a MIC).
> >
> > To reproduce the problem, simply run the following (as root) on the
> > two compute nodes:
> > brctl addbr br0
> > ifconfig br0 192.168.255.1 netmask 255.255.255.0
> >
> > mpirun will then hang.
> >
> > A workaround is to add --mca btl_tcp_if_include eth0 to the mpirun command line.
> >
> > v1.8 does not hang in this case.
> >
> > Cheers,
> >
> > Gilles
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15631.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
