Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread George Bosilca
The trunk is [almost] right. It has nice error handling, and a bunch of
other features.

However, part of this bug report is troubling. We might want to check why
it doesn't exhaust all possible addresses before giving up on an endpoint.
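
The exhaust-all-addresses policy in question can be sketched as follows. This is an illustrative Python sketch, not Open MPI code; `REACHABLE` and `try_connect` are invented stand-ins for real TCP connection attempts:

```python
# Illustrative sketch (not Open MPI code) of the policy in question:
# declare an endpoint unreachable only after every candidate address
# has been tried. REACHABLE and try_connect are stand-ins for real
# TCP connection attempts.
REACHABLE = {"10.1.0.23"}

def try_connect(addr):
    return addr in REACHABLE

def connect_endpoint(candidate_addrs):
    for addr in candidate_addrs:
        if try_connect(addr):
            return addr   # success: use this address for the endpoint
    return None           # all addresses exhausted; only now give up
```

With this policy, an endpoint whose first (bogus, on-host) address fails still gets connected via its second address instead of being declared dead.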

  George.


PS: I'm not saying that we should back-port this to the 1.8 series ...


On Wed, Aug 13, 2014 at 3:33 PM, Jeff Squyres (jsquyres)  wrote:

> On Aug 13, 2014, at 12:52 PM, George Bosilca  wrote:
>
> > There are many differences between the trunk and 1.8 regarding the TCP
> BTL. The major one I remember is that the trunk's TCP BTL reports
> errors to the upper level via the callbacks attached to fragments, while
> the 1.8 TCP BTL doesn't.
> >
> > So, I guess that once a connection to a particular endpoint fails, the
> trunk gets the errors reported via the callback and then takes some
> drastic measure. In 1.8 we might fall back and try another IP address
> before giving up.
>
> Does that have any effect on performance?
>
> I.e., should we bring this change to v1.8?
>
> Or, put simply: which way is Right?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15638.php
>


Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Jeff Squyres (jsquyres)
Paul: I think this is a slippery slope.

As I understand it, these private/on-host IP addresses are generated somewhat 
randomly (e.g., for on-host VM networking -- I don't know whether the IPs for Phi 
on-host networking are pseudo-random or [effectively] fixed).  So you might end 
up in a situation like this:

server A: has br0 on-host IP address 10.0.0.23/8 ***same as server C
server B: has br0 on-host IP address 10.0.0.25/8
server C: has br0 on-host IP address 10.0.0.23/8 ***same as server A
server D: has br0 on-host IP address 10.0.0.107/8

In this case, servers A and C will detect that they have the same IP.  "Ah ha!" 
they say. "I'll just not use br0, because clearly this is erroneous."

But how will servers B and D know this?

You'll likely get the same "hang" behavior that we currently have, because B 
may try to send to A on 10.0.0.23/8.

Hence, the additional logic may not actually solve the problem.

I'm thinking that this is a human-configuration issue -- there may not be a 
good way to detect this automatically.

...unless there's a bit in Linux interfaces that says "this is an on-host 
network".  Does that exist?  Because that would be a better way to disqualify 
Linux IP interfaces.
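
For what it's worth: Linux does expose a per-interface flags word (readable from sysfs, e.g. /sys/class/net/lo/flags), but the only case it reliably marks is loopback; there is no generic "on-host network" bit. A hedged sketch, with flag values taken from linux/if.h:

```python
# There is no generic "on-host network" bit, but Linux does expose a
# per-interface flags word (e.g. in /sys/class/net/lo/flags), and the
# loopback case is flagged. Flag values below are from linux/if.h.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8

def is_loopback(flags: int) -> bool:
    """True if the interface's flags word has the loopback bit set."""
    return bool(flags & IFF_LOOPBACK)

# "lo" typically reports 0x9 (UP|LOOPBACK); a wired NIC typically
# reports 0x1003 (UP|BROADCAST|MULTICAST). A bridge like br0 looks
# like an ordinary NIC here, which is exactly why this check does
# not catch the case in question.
```

On a live system the flags word can be read with, e.g., `int(open("/sys/class/net/lo/flags").read(), 16)`.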


On Aug 13, 2014, at 1:57 PM, Paul Hargrove  wrote:

> I think that in this case one *could* add logic that would disqualify the 
> subnet because every compute node in the job has the SAME address.  In fact, 
> any subnet on which two or more compute nodes have the same address must be 
> suspect.  If this logic were introduced, the 127.0.0.1 loopback address 
> wouldn't need to be a special case.
> 
> This is just an observation, not a feature request (at least not on my part).
> 
> -Paul
> 
> 
> On Wed, Aug 13, 2014 at 7:55 AM, Jeff Squyres (jsquyres)  
> wrote:
> I think this is expected behavior.
> 
> If you have networks that you need Open MPI to ignore (e.g., a private 
> network that *looks* reachable between multiple servers -- because the 
> interfaces are on the same subnet -- but actually *isn't*), then the 
> include/exclude mechanism is the right way to exclude them.
> 
> That being said, I'm not sure why the behavior is different between trunk and 
> v1.8.
> 
> 
> On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet 
>  wrote:
> 
> > Folks,
> >
> > I noticed mpirun (trunk) hangs when running any MPI program on two nodes
> > *and* each node has a private network with the same IP
> > (in my case, each node has a private network to a MIC)
> >
> > in order to reproduce the problem, you can simply run (as root) on the
> > two compute nodes
> > brctl addbr br0
> > ifconfig br0 192.168.255.1 netmask 255.255.255.0
> >
> > mpirun will hang
> >
> > a workaround is to add --mca btl_tcp_if_include eth0
> >
> > v1.8 does not hang in this case
> >
> > Cheers,
> >
> > Gilles
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15631.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15636.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Jeff Squyres (jsquyres)
On Aug 13, 2014, at 12:52 PM, George Bosilca  wrote:

> There are many differences between the trunk and 1.8 regarding the TCP BTL. 
> The major one I remember is that the trunk's TCP BTL reports errors 
> to the upper level via the callbacks attached to fragments, while the 1.8 TCP 
> BTL doesn't.
> 
> So, I guess that once a connection to a particular endpoint fails, the trunk 
> gets the errors reported via the callback and then takes some drastic 
> measure. In 1.8 we might fall back and try another IP address before 
> giving up.

Does that have any effect on performance?

I.e., should we bring this change to v1.8?

Or, put simply: which way is Right?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Paul Hargrove
I think that in this case one *could* add logic that would disqualify the
subnet because every compute node in the job has the SAME address.  In
fact, any subnet on which two or more compute nodes have the same address
must be suspect.  If this logic were introduced, the 127.0.0.1 loopback
address wouldn't need to be a special case.

This is just an observation, not a feature request (at least not on my
part).
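> 
> The check could be sketched roughly like this (data structures here are
> hypothetical, not Open MPI's; it presumes each node's addresses are
> exchanged globally before BTL setup, so every node sees the clash, not
> just the two nodes that collide):
> 
> ```python
> # Illustrative sketch of the check (hypothetical data structures,
> # not Open MPI's): gather every node's (subnet, address) pairs,
> # then disqualify any subnet on which two or more nodes report
> # the same address.
> from collections import defaultdict
> 
> def disqualified_subnets(node_addrs):
>     """node_addrs maps node name -> list of (subnet, address) pairs."""
>     seen = defaultdict(set)  # (subnet, address) -> nodes reporting it
>     for node, pairs in node_addrs.items():
>         for pair in pairs:
>             seen[pair].add(node)
>     return {subnet for (subnet, _), nodes in seen.items() if len(nodes) > 1}
> ```
> 
> By the same rule every node's 127.0.0.1 clashes with every other node's,
> so loopback falls out without a special case.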

-Paul


On Wed, Aug 13, 2014 at 7:55 AM, Jeff Squyres (jsquyres)  wrote:

> I think this is expected behavior.
>
> If you have networks that you need Open MPI to ignore (e.g., a private
> network that *looks* reachable between multiple servers -- because the
> interfaces are on the same subnet -- but actually *isn't*), then the
> include/exclude mechanism is the right way to exclude them.
>
> That being said, I'm not sure why the behavior is different between trunk
> and v1.8.
>
>
> On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
> > Folks,
> >
> > I noticed mpirun (trunk) hangs when running any MPI program on two nodes
> > *and* each node has a private network with the same IP
> > (in my case, each node has a private network to a MIC)
> >
> > in order to reproduce the problem, you can simply run (as root) on the
> > two compute nodes
> > brctl addbr br0
> > ifconfig br0 192.168.255.1 netmask 255.255.255.0
> >
> > mpirun will hang
> >
> > a workaround is to add --mca btl_tcp_if_include eth0
> >
> > v1.8 does not hang in this case
> >
> > Cheers,
> >
> > Gilles
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15631.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread George Bosilca
There are many differences between the trunk and 1.8 regarding the TCP BTL.
The major one I remember is that the trunk's TCP BTL reports errors
to the upper level via the callbacks attached to fragments, while the 1.8
TCP BTL doesn't.

So, I guess that once a connection to a particular endpoint fails, the
trunk gets the errors reported via the callback and then takes some drastic
measure. In 1.8 we might fall back and try another IP address before
giving up.
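
The callback difference might be illustrated like this; the names and signatures below are invented for illustration and are not the real BTL interface:

```python
# Hypothetical illustration (invented names, not the real BTL
# interface): when a fragment's connection fails, a trunk-style BTL
# still fires the completion callback, with an error status, so the
# upper layer learns of the failure instead of waiting forever.
def send_fragment(frag, connect_ok, on_complete):
    if connect_ok:
        on_complete(frag, status=0)    # delivered normally
    else:
        on_complete(frag, status=-1)   # error reported upward via the callback

errors = []
send_fragment("frag0", connect_ok=False,
              on_complete=lambda f, status: errors.append((f, status)))
# errors now holds ("frag0", -1): the upper layer saw the failure.
```

Without the error-status callback, the failed fragment simply never completes, which matches the observed hang.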

  George.



On Wed, Aug 13, 2014 at 10:55 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> I think this is expected behavior.
>
> If you have networks that you need Open MPI to ignore (e.g., a private
> network that *looks* reachable between multiple servers -- because the
> interfaces are on the same subnet -- but actually *isn't*), then the
> include/exclude mechanism is the right way to exclude them.
>
> That being said, I'm not sure why the behavior is different between trunk
> and v1.8.
>
>
> On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
> > Folks,
> >
> > I noticed mpirun (trunk) hangs when running any MPI program on two nodes
> > *and* each node has a private network with the same IP
> > (in my case, each node has a private network to a MIC)
> >
> > in order to reproduce the problem, you can simply run (as root) on the
> > two compute nodes
> > brctl addbr br0
> > ifconfig br0 192.168.255.1 netmask 255.255.255.0
> >
> > mpirun will hang
> >
> > a workaround is to add --mca btl_tcp_if_include eth0
> >
> > v1.8 does not hang in this case
> >
> > Cheers,
> >
> > Gilles
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15631.php
>


Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Jeff Squyres (jsquyres)
I think this is expected behavior.

If you have networks that you need Open MPI to ignore (e.g., a private network 
that *looks* reachable between multiple servers -- because the interfaces are 
on the same subnet -- but actually *isn't*), then the include/exclude mechanism 
is the right way to exclude them.

That being said, I'm not sure why the behavior is different between trunk and 
v1.8.


On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet 
 wrote:

> Folks,
> 
> I noticed mpirun (trunk) hangs when running any MPI program on two nodes
> *and* each node has a private network with the same IP
> (in my case, each node has a private network to a MIC)
> 
> in order to reproduce the problem, you can simply run (as root) on the
> two compute nodes
> brctl addbr br0
> ifconfig br0 192.168.255.1 netmask 255.255.255.0
> 
> mpirun will hang
> 
> a workaround is to add --mca btl_tcp_if_include eth0
> 
> v1.8 does not hang in this case
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Ralph Castain
Afraid I can't get to this until next week, but will look at it then



On Tue, Aug 12, 2014 at 10:41 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Folks,
>
> I noticed mpirun (trunk) hangs when running any MPI program on two nodes
> *and* each node has a private network with the same IP
> (in my case, each node has a private network to a MIC)
>
> in order to reproduce the problem, you can simply run (as root) on the
> two compute nodes
> brctl addbr br0
> ifconfig br0 192.168.255.1 netmask 255.255.255.0
>
> mpirun will hang
>
> a workaround is to add --mca btl_tcp_if_include eth0
>
> v1.8 does not hang in this case
>
> Cheers,
>
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
>


[OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Gilles Gouaillardet
Folks,

I noticed mpirun (trunk) hangs when running any MPI program on two nodes
*and* each node has a private network with the same IP
(in my case, each node has a private network to a MIC)

in order to reproduce the problem, you can simply run (as root) on the
two compute nodes
brctl addbr br0
ifconfig br0 192.168.255.1 netmask 255.255.255.0

mpirun will hang

a workaround is to add --mca btl_tcp_if_include eth0

v1.8 does not hang in this case

Cheers,

Gilles