Re: [OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Ralph Castain

On Jun 6, 2014, at 12:50 PM, Rolf vandeVaart  wrote:

> Thanks for trying, Ralph.  Looks like my issue has to do with the coll ml 
> interaction.  If I exclude coll ml, then all my tests pass.  Do you know if 
> there is a bug filed for this issue?

There is a known issue with coll ml for intercomm_create - Nathan is working on 
a fix. It was reported by Gilles (yesterday?).

> If so, then I can run my nightly tests with coll ml disabled and wait for the 
> bug to be fixed.
>  
> Also, where do simple_spawn and spawn_multiple live?

I have a copy in my orte/test/mpi directory that I use - that's where these 
came from. Note that I left coll ml "on" for those as they weren't having 
trouble.


>   I was running “spawn” and “spawn_multiple” from the ibm/dynamic test suite. 
> Your output for spawn_multiple looks different than mine.

Re: [OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Rolf vandeVaart
Thanks for trying, Ralph.  Looks like my issue has to do with the coll ml 
interaction.  If I exclude coll ml, then all my tests pass.  Do you know if 
there is a bug filed for this issue?
If so, then I can run my nightly tests with coll ml disabled and wait for the 
bug to be fixed.

Also, where do simple_spawn and spawn_multiple live?  I was running "spawn" 
and "spawn_multiple" from the ibm/dynamic test suite.
Your output for spawn_multiple looks different than mine.


Re: [OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Ralph Castain
Works fine for me:

[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./simple_spawn
[pid 22777] starting up!
[pid 22778] starting up!
[pid 22779] starting up!
1 completed MPI_Init
Parent [pid 22778] about to spawn!
2 completed MPI_Init
Parent [pid 22779] about to spawn!
0 completed MPI_Init
Parent [pid 22777] about to spawn!
[pid 22783] starting up!
[pid 22784] starting up!
Parent done with spawn
Parent sending message to child
Parent done with spawn
Parent done with spawn
0 completed MPI_Init
Hello from the child 0 of 2 on host bend001 pid 22783
Child 0 received msg: 38
1 completed MPI_Init
Hello from the child 1 of 2 on host bend001 pid 22784
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
22784: exiting
22778: exiting
22779: exiting
22777: exiting
22783: exiting
[rhc@bend001 mpi]$ make spawn_multiple
mpicc -g --openmpi:linkall spawn_multiple.c -o spawn_multiple
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./spawn_multiple
Parent [pid 22797] about to spawn!
Parent [pid 22798] about to spawn!
Parent [pid 22799] about to spawn!
Parent done with spawn
Parent done with spawn
Parent sending message to children
Parent done with spawn
Hello from the child 0 of 2 on host bend001 pid 22803: argv[1] = foo
Child 0 received msg: 38
Hello from the child 1 of 2 on host bend001 pid 22804: argv[1] = bar
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 -mca coll ^ml ./intercomm_create
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 3]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 3]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
a: intercomm_merge(0) (0) [rank 2]
c: intercomm_merge(0) (0) [rank 8]
a: intercomm_merge(0) (0) [rank 0]
a: intercomm_merge(0) (0) [rank 1]
c: intercomm_merge(0) (0) [rank 7]
b: intercomm_merge(1) (0) [rank 4]
b: intercomm_merge(1) (0) [rank 5]
c: intercomm_merge(0) (0) [rank 6]
b: intercomm_merge(1) (0) [rank 3]
a: barrier (0)
b: barrier (0)
c: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 1
dpm_base_disconnect_init: error -12 in isend to process 3
[rhc@bend001 mpi]$ 



[OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Rolf vandeVaart
I am seeing an interesting failure on trunk.  intercomm_create, spawn, and 
spawn_multiple from the IBM tests hang if I explicitly list the hostnames to 
run on.  For example:

Good:
$ mpirun -np 2 --mca btl self,sm,tcp spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (0 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (0 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
$ 

Bad:
$ mpirun -np 2 --mca btl self,sm,tcp -host drossetti-ivy0,drossetti-ivy0 
spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (1 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (1 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
[..and we are hung here...]

I see the exact same behavior for spawn and spawn_multiple.  Ralph, any 
thoughts?  Open MPI 1.8 is fine.  I can provide more information if needed, but 
I assume this is reproducible. 
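
For reference, here is a minimal, hypothetical sketch of the
MPI_Comm_spawn_multiple call pattern these tests exercise. It assumes the
binary re-launches itself for the children and is not the actual ibm/dynamic
spawn_multiple test:

/* Minimal sketch of an MPI_Comm_spawn_multiple parent/child pair.  This is
 * illustrative only: it assumes the binary re-launches itself, whereas the
 * real ibm/dynamic test differs in its details. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    if (MPI_COMM_NULL == parent) {
        /* Parent side: spawn two "jobs" of two processes each. */
        char *cmds[2]     = { argv[0], argv[0] };
        int maxprocs[2]   = { 2, 2 };
        MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

        printf("Parent: %d of %d, %s\n", rank, size, host);
        MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                                0, MPI_COMM_WORLD, &intercomm,
                                MPI_ERRCODES_IGNORE);
    } else {
        /* Child side: all four children share one spawned MPI_COMM_WORLD. */
        printf("Child: %d of %d, %s\n", rank, size, host);
        intercomm = parent;
    }

    /* Both sides synchronize across the intercommunicator and disconnect. */
    MPI_Barrier(intercomm);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

The 2-parent / 4-child layout matches the output shown above.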

Thanks,
Rolf


[OMPI devel] iallgather failures with coll ml

2014-06-06 Thread Rolf vandeVaart
On the trunk, I am seeing failures of the ibm tests iallgather and 
iallgather_in_place.  Is this a known issue?

$ mpirun --mca btl self,sm,tcp --mca coll ml,basic,libnbc --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 iallgather
[**ERROR**]: MPI_COMM_WORLD rank 0, file iallgather.c:77:
bad answer (0) at index 1 of 4 (should be 1)
[**ERROR**]: MPI_COMM_WORLD rank 1, file iallgather.c:77:
bad answer (0) at index 1 of 4 (should be 1)

Interestingly, there is an MCA parameter that disables allgather in coll ml, 
which allows the test to pass.

$ mpirun --mca coll_ml_disable_allgather 1 --mca btl self,sm,tcp --mca coll 
ml,basic,libnbc --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 iallgather
$ echo $?
0
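
For context, the failing check appears to be the usual allgather verification:
every rank contributes its own rank number and then verifies that slot i of the
gathered buffer holds i, so "bad answer (0) at index 1" means a 0 showed up
where rank 1's contribution was expected. A minimal sketch of that kind of
MPI_Iallgather test (not the actual ibm iallgather source; the one-int-per-rank
buffer and message wording are illustrative) looks like:

/* Minimal MPI_Iallgather correctness check: each rank contributes its rank
 * number and slot i of the result must contain i.  Sketch only; the ibm
 * test differs in its details. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, errs = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendval = rank;
    int *recvbuf = malloc(size * sizeof(int));
    MPI_Request req;

    /* Non-blocking allgather followed by a wait. */
    MPI_Iallgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT,
                   MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* Verify: slot i must hold the value contributed by rank i. */
    for (i = 0; i < size; i++) {
        if (recvbuf[i] != i) {
            fprintf(stderr,
                    "rank %d: bad answer (%d) at index %d of %d (should be %d)\n",
                    rank, recvbuf[i], i, size, i);
            errs++;
        }
    }

    free(recvbuf);
    MPI_Finalize();
    return errs ? 1 : 0;
}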






Re: [OMPI devel] Intermittent hangs when exiting with error

2014-06-06 Thread Ralph Castain

On Jun 6, 2014, at 7:11 AM, Jeff Squyres (jsquyres)  wrote:

> Looks like Ralph's simpler solution fit the bill.

Yeah, but I still am unhappy with it. It's about the stupidest connection model 
you can imagine. What happens is this:

* a process constructs its URI - this is done by creating a string with the 
IP:PORT for each subnet the proc is listening on. The URI is constructed in 
alphabetical order (well, actually in kernel index order - but that tends to 
follow the alphabetical order of the interface names). This then gets passed to 
the other process.

* the sender breaks the URI into its component parts and creates a list of 
addresses for the target. This list gets created in the order of the components 
- i.e., we take the first IP:PORT out of the URI, and that is our first address.

* when the sender initiates a connection, it takes the first address in the 
list (which means the alphabetically first name in the target's list of 
interfaces) and initiates the connection on that subnet. If it succeeds, then 
that is the subnet we use for all subsequent messages.

So if the first subnet can reach the target, even if it means bouncing all over 
the Internet, we will use it - even though the second subnet in the URI might 
have provided a direct connection!

It solves Gilles' problem because "ib" comes after "eth", and it matches what 
was done in the original OOB (before my rewrite) - but it sure sounds to me 
like a bad, inefficient solution for general use.
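
In other words, the logic boils down to something like the sketch below. This
is illustrative only - the types, the URI format shown, and the
parse_uri/try_tcp_connect helpers are invented for the example and are not the
actual orte oob/tcp code:

/* Illustrative sketch of the connection model described above; the real
 * oob/tcp component differs in structure and naming. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

typedef struct { char ip[64]; int port; } oob_addr_t;

/* The peer's URI "ip1:port1;ip2:port2;..." was built in kernel-index (roughly
 * alphabetical interface-name) order, so the parsed list keeps that order. */
int parse_uri(const char *uri, oob_addr_t *addrs, int max)
{
    int n = 0;
    for (const char *p = uri; p && *p && n < max; ) {
        if (sscanf(p, "%63[^:]:%d", addrs[n].ip, &addrs[n].port) == 2)
            n++;
        p = strchr(p, ';');
        if (p) p++;
    }
    return n;
}

int try_tcp_connect(const char *ip, int port)
{
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port   = htons((unsigned short)port) };
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) return -1;
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) return -1;
    if (connect(sd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* The FIRST address that connects is used for all subsequent messages, even
 * if a later address in the list would have been a more direct route. */
int connect_to_peer(const oob_addr_t *addrs, int naddrs)
{
    for (int i = 0; i < naddrs; i++) {
        int sd = try_tcp_connect(addrs[i].ip, addrs[i].port);
        if (sd >= 0)
            return sd;    /* first success wins */
    }
    return -1;
}

With that model, whichever interface sorts first in the peer's URI wins, which
is exactly the eth-before-ib behavior described above.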





Re: [OMPI devel] Intermittent hangs when exiting with error

2014-06-06 Thread Jeff Squyres (jsquyres)
On Jun 5, 2014, at 9:16 PM, Gilles Gouaillardet  
wrote:

> i work on a 4k+ nodes cluster with a very decent gigabit ethernet
> network (reasonable oversubscription + switches
> from a reputable vendor you are familiar with ;-) )
> my experience is that IPoIB can be very slow at establishing a
> connection, especially if the arp table is not populated
> (as far as i understand, this involves the subnet manager and
> performance can be very random especially if all nodes issue
> arp requests at the same time)
> on the other hand, performance is much more stable when using the
> subnetted IP network.

Got it.

>> As a simple solution, there could be an TCP oob MCA param that says 
>> "regardless of peer IP address, I can connect to them" (i.e., assume IP 
>> routing will make everything work out ok).
> +1 and/or an option to tell oob mca "do not discard the interface simply
> because the peer IP is not in the same subnet"

Looks like Ralph's simpler solution fit the bill.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] MPI_Comm_spawn affinity and coll/ml

2014-06-06 Thread Ralph Castain
I fixed the binding algorithm so it shifts the location to be more of what you 
expected. However, we still won't bind the final spawn if there aren't enough 
free cores to support those procs.


On Jun 5, 2014, at 7:12 AM, Hjelm, Nathan T  wrote:

> Coll/ml does disqualify itself if processes are not bound. The problem here 
> is there is an inconsistency between the two sides of the intercommunicator. 
> I can write a quick fix for 1.8.2.
> 
> -Nathan
> 
> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
> [gilles.gouaillar...@gmail.com]
> Sent: Thursday, June 05, 2014 1:20 AM
> To: Open MPI Developers
> Subject: [OMPI devel] MPI_Comm_spawn affinity and coll/ml
> 
> Folks,
> 
> On my single-socket, four-core VM (no batch manager), I am running the 
> intercomm_create test from the ibm test suite.
> 
> mpirun -np 1 ./intercomm_create
> => OK
> 
> mpirun -np 2 ./intercomm_create
> => HANG :-(
> 
> mpirun -np 2 --mca coll ^ml  ./intercomm_create
> => OK
> 
> basically, the first two tasks will call MPI_Comm_spawn(2 tasks) twice, 
> followed by MPI_Intercomm_merge, and the 4 spawned tasks will call 
> MPI_Intercomm_merge followed by MPI_Intercomm_create (see the sketch below)
> 
> I dug a bit into that issue and found two distinct issues:
> 
> 1) binding:
> tasks [0-1] (launched with mpirun) are bound on cores [0-1] => OK
> tasks[2-3] (first spawn) are bound on cores [0-1] => ODD, I would have 
> expected [2-3]
> tasks[4-5] (second spawn) are not bound at all => ODD again, could have made 
> sense if tasks[2-3] were bound on cores [2-3]
> I observe the same behaviour with the --oversubscribe mpirun parameter
> 
> 2) coll/ml
> coll/ml hangs when -np 2 (total 6 tasks, including 2 unbound tasks)
> I suspect coll/ml is unable to handle unbound tasks.
> If I am correct, should coll/ml detect this and simply automatically 
> disqualify itself?
> 
> Cheers,
> 
> Gilles
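
For readers not familiar with the dynamic-process calls involved, here is a
rough, hypothetical sketch of the call pattern Gilles describes. It is not the
ibm intercomm_create test; the group roles ("a"/"b"/"c"), the tag 201, and the
choice to bridge through the second merged communicator are assumptions made
for the illustration:

/* Hypothetical sketch: two initial tasks spawn two child jobs of two tasks
 * each, merge each intercommunicator, and then an inter-communicator is
 * created between the (parents + first children) group and the second
 * children.  Not the ibm test - just the sequence of calls involved. */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, ab_intra, ac_intra, inter;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* Group "a": the two tasks started by mpirun. */
        MPI_Comm inter_b, inter_c;
        char *args_b[] = { "b", NULL }, *args_c[] = { "c", NULL };

        MPI_Comm_spawn(argv[0], args_b, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &inter_b, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter_b, 0, &ab_intra);  /* a: ranks 0-1, b: 2-3 */

        MPI_Comm_spawn(argv[0], args_c, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &inter_c, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter_c, 0, &ac_intra);  /* a: ranks 0-1, c: 2-3 */

        /* "a" sits on the (a+b) side; the bridge is ac_intra, whose rank 2
         * is the leader of group "c". */
        MPI_Intercomm_create(ab_intra, 0, ac_intra, 2, 201, &inter);
    } else if (argc > 1 && 0 == strcmp(argv[1], "b")) {
        /* Group "b" (first spawn): merge with the parents, then join the
         * (a+b) side; peer_comm/remote_leader only matter at the local
         * leader, which is an "a" task. */
        MPI_Intercomm_merge(parent, 1, &ab_intra);
        MPI_Intercomm_create(ab_intra, 0, MPI_COMM_NULL, 0, 201, &inter);
    } else {
        /* Group "c" (second spawn): merge with the parents, then bridge to
         * the (a+b) leader, which is rank 0 of ac_intra. */
        MPI_Intercomm_merge(parent, 1, &ac_intra);
        MPI_Intercomm_create(MPI_COMM_WORLD, 0, ac_intra, 0, 201, &inter);
    }

    MPI_Barrier(inter);     /* collective across the new inter-communicator */
    MPI_Comm_free(&inter);
    MPI_Finalize();
    return 0;
}

Launched with mpirun -np 2, this gives the same 2 initial + 4 spawned task
layout as the hanging case described above.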



Re: [OMPI devel] Intermittent hangs when exiting with error

2014-06-06 Thread Ralph Castain
Kewl - thanks!




Re: [OMPI devel] Intermittent hangs when exiting with error

2014-06-06 Thread Gilles Gouaillardet
Ralph,

Sorry for my poor understanding ...

I tried r31956 and it solved both issues:
- MPI_Abort no longer hangs if nodes are on different eth0 subnets
- MPI_Init no longer hangs if hosts have different numbers of IB ports

This likely explains why you are having trouble replicating it ;-)

Thanks a lot!

Gilles


On Fri, Jun 6, 2014 at 11:45 AM, Ralph Castain  wrote:

> I keep explaining that we don't "discard" anything, but there really isn't
> any point to continuing trying to explain the system. With the announced
> intention of completing the move of the BTLs to OPAL, I no longer need the
> multi-module complexity in the OOB/TCP. So I have removed it and gone back
> to the single module that connects to everything.
>
> Try r31956 - hopefully will resolve your connectivity issues.
>
> Still looking at the MPI_Abort hang as I'm having trouble replicating it.
>
>
> On Jun 5, 2014, at 7:16 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
> > Jeff,
> >
> > as pointed by Ralph, i do wish using eth0 for oob messages.
> >
> > i work on a 4k+ nodes cluster with a very decent gigabit ethernet
> > network (reasonable oversubscription + switches
> > from a reputable vendor you are familiar with ;-) )
> > my experience is that IPoIB can be very slow at establishing a
> > connection, especially if the arp table is not populated
> > (as far as i understand, this involves the subnet manager and
> > performance can be very random especially if all nodes issue
> > arp requests at the same time)
> > on the other hand, performance is much more stable when using the
> > subnetted IP network.
> >
> > as Ralph also pointed, i can imagine some architects neglect their
> > ethernet network (e.g. highly oversubscribed + low end switches)
> > and in this case ib0 is a best fit for oob messages.
> >
> >> As a simple solution, there could be an TCP oob MCA param that says
> "regardless of peer IP address, I can connect to them" (i.e., assume IP
> routing will make everything work out ok).
> > +1 and/or an option to tell oob mca "do not discard the interface simply
> > because the peer IP is not in the same subnet"
> >
> > Cheers,
> >
> > Gilles
> >
> > On 2014/06/05 23:01, Ralph Castain wrote:
> >> Because Gilles wants to avoid using IB for TCP messages, and using eth0
> also solves the problem (the messages just route)
> >>
> >> On Jun 5, 2014, at 5:00 AM, Jeff Squyres (jsquyres) 
> wrote:
> >>
> >>> Another random thought for Gilles situation: why not
> oob-TCP-if-include ib0?  (And not eth0)
> >>>